Invited Talks

(in alphabetical order)


  • Speaker: Kevyn Collins-Thompson
  • Title: Enhancing Document Representations Using Analysis of Content Difficulty: Models, Applications, and Insights
  • Abstract: This talk will discuss how enhancing document representations with analysis of language complexity and difficulty can lead to a surprisingly wide range of new applications and insights into how people interact with content in both business and educational settings. Analyzing the difficulty of language has a history going back to the ancient Greeks, who understood that a legal argument or analysis was of little persuasive value if its audience could not understand it. Classic 20th-century text readability formulas, such as Flesch-Kincaid, combined statistics like average sentence length and average syllables per word to estimate a text's readability. However, the limitations of these simple traditional measures, including their lack of flexibility for new tasks and populations and their lack of robustness on non-traditional documents, have led to a new branch of natural language processing research that has developed richer, more effective data-driven computational models of reading comprehension and text complexity [1]. First, I’ll give a brief summary of recent advances in modeling content difficulty and complexity, including my own work on statistical models of readability and deep learning for predicting the informativeness of text. Then I’ll give some examples of insights that derive from applying these methods to create richer, difficulty-based document representations, using empirical methods ranging from in-lab user studies with eye-tracking to large-scale commercial search interaction data covering millions of sessions and Web pages. Finally, I’ll touch on some ongoing work and potential future directions in educational scenarios for understanding and supporting learners, toward the goal of high-quality, personalized learning experiences.
  • Bio: Kevyn Collins-Thompson is an Associate Professor of Information and Computer Science at the University of Michigan. His research explores models, algorithms, and software systems for optimally connecting people with information, especially toward educational goals. His research has been applied to real-world systems ranging from intelligent tutoring systems to commercial Web search engines. Kevyn has also pioneered techniques for using machine learning to model the reading difficulty of text, for creating robust search and recommender systems that maximize effective results while minimizing the risk of worst-case errors, and for understanding and supporting how people learn language. He received his Ph.D. from the Language Technologies Institute at Carnegie Mellon University and his B.Math from the University of Waterloo. Before joining the University of Michigan in 2013, he was a researcher in the Context, Learning, and User Experience for Search (CLUES) group at Microsoft Research. Recent highlights include serving as ACM SIGIR 2018 General Co-Chair, being named co-recipient of Coursera’s Outstanding Educator Award, and recognition as an ACM Distinguished Member for outstanding scientific contributions to computing.
  • Additional info: [1] K. Collins-Thompson. Computational assessment of text readability: a survey of current and future research. In: T. François and D. Bernhard (eds.), Recent Advances in Automatic Readability Assessment and Text Simplification, special issue of International Journal of Applied Linguistics 165(2), pp. 97-135, 2014.
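The classic Flesch-Kincaid measure mentioned in the abstract can be computed from exactly the statistics described. A minimal sketch of the grade-level formula follows; the syllable counter here is a rough vowel-group heuristic chosen for illustration, not the dictionary-based counting the original formula assumed:

```python
import re

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels (incl. 'y').
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short sentences of one-syllable words score at or below the earliest grades, while long, polysyllabic sentences push the score up, which is precisely the brittleness on non-traditional documents the abstract points out.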

  • Speaker: Heng Ji
  • Title: What’s in a Chemical Entity?
  • Abstract: Like many scientific fields, chemistry has seen its literature grow at a staggering pace, with tens of thousands of papers released every month. In our newly created U.S. NSF AI Institute on Molecular Synthesis, we are applying knowledge extraction techniques to automatically construct knowledge bases from scientific literature. The constructed knowledge bases include chemical entities and reactions between entities, and thus can be used to predict chemical reactions, products, and properties, such as yield, toxicity, and water solubility, for creating new molecules and improving the manufacture of target molecules. However, existing information extraction techniques developed for the news domain, or even for biomedical literature, are not directly effective for chemistry literature. One reason is that chemical entities often have complex, formula-like names (e.g., 5,6-dihydroxycyclohexa-1,3-diene-1-carboxylic acid). Moreover, many chemicals have simply never been coined with any nomenclature in natural language. Therefore, chemical entity mentions are essentially rare terms that cannot be learned well by a language model alone. To address these challenges, we propose a novel multimodal embedding approach for constructing a shared common semantic space among multiple data modalities: (1) 2-D images of molecules, representing the underlying molecules or reactions; (2) text-based molecule descriptors; (3) chemical graph structure; (4) natural language definitions and descriptions; and (5) structured properties in external databases. I will then present applications of this common semantic space in building an end-to-end knowledge extraction system for chemistry literature, using the constructed knowledge base for cross-modal chemical entity retrieval with natural language, and generating molecule descriptor strings from molecular diagram images.
I’ll present a new benchmark that includes 81 million molecules and 100 chemistry papers fully annotated with a new fine-grained Chemistry ontology. I’ll also talk about remaining challenges and ongoing work on representing chemical reactions.
  • Bio: Heng Ji is a professor in the Computer Science Department, and an affiliated faculty member in the Electrical and Computer Engineering Department, at the University of Illinois at Urbana-Champaign. She is an Amazon Scholar. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially Multimedia Multilingual Information Extraction, Knowledge Base Population, and Knowledge-driven Generation. She was selected as a “Young Scientist” and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. Her awards include the “AI’s 10 to Watch” Award from IEEE Intelligent Systems in 2013, an NSF CAREER award in 2009, Google Research Awards in 2009 and 2014, IBM Watson Faculty Awards in 2012 and 2014, Bosch Research Awards in 2014-2018, and the ACL 2020 Best Demo Paper Award. She has served as Program Committee Co-Chair of many conferences including NAACL-HLT 2018, and was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2021.
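A shared common semantic space across modalities, as described in the abstract, is typically learned by pulling paired examples (e.g., a molecule image and its natural-language description) together while pushing mismatched pairs apart. The following is a generic contrastive (InfoNCE-style) loss sketch of that idea, not the speaker's actual model; the encoders that would produce the two embedding matrices are assumed:

```python
import numpy as np

def info_nce(text_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    text_emb, img_emb: (batch, dim) arrays where row i of each matrix
    comes from the same underlying item (a matched pair).
    """
    # L2-normalize each modality so similarities are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature      # pairwise similarity matrix
    labels = np.arange(len(t))          # matched pairs lie on the diagonal

    def xent(l):
        # Cross-entropy of each row against its diagonal target.
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # Average the text->image and image->text directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss drives each matched pair to be each other's nearest neighbor in the shared space, which is what makes cross-modal retrieval (e.g., finding a molecule from a natural-language query) possible.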

  • Speaker: Yunyao Li
  • Title: Towards Deep Table Understanding
  • Abstract: Harvesting information from complex documents such as financial reports and scientific publications is critical to building AI applications for business and research. Such documents are often in PDF format, with critical facts and data conveyed in tables and graphs, and extracting this information is essential to deriving insights from these documents. At IBM Research, we have a rich agenda in this area that we call Deep Document Understanding. In this talk, I will focus on our research on Deep Table Understanding — extracting and understanding tables from PDF documents. I will introduce key challenges in table extraction and understanding and how we address them, from acquiring data at scale to enable deep neural network models, to building, customizing, and evaluating such models. I will also describe how our work enables real-world use cases in domains such as finance and life science. Finally, I will briefly present TableQA, an important downstream task enabled by Deep Table Understanding.
  • Bio: Yunyao Li is a Distinguished Research Staff Member and Senior Research Manager at IBM Research - Almaden, where she manages the Scalable Knowledge Intelligence department, focusing on building next-generation enterprise-scale technologies spanning the AI lifecycle of domain ingestion, knowledge representation, creation, and refinement, with both data-driven and human-in-the-loop approaches. She currently leads the AI Operation in IBM Research - Almaden and Tokyo. She is a member of the IBM Academy of Technology and a Master Inventor. Her key contributions span the areas of natural language processing (NLP), data management, information retrieval, and human-computer interaction. She is particularly known for her work in scalable NLP, enterprise search, and database usability. She has built systems, developed solutions, and delivered core technologies to over 20 IBM products under brands such as Watson, InfoSphere, and Cognos. She has published over 70 articles and filed or been granted nearly 50 patents. She co-authored the book “Natural Language Data Management and Interfaces.” Her technical contributions have been regularly recognized by prestigious awards within and outside of IBM. She is an ACM Distinguished Member. She was a member of the inaugural New Voices program of the American National Academies (1 of 18 selected nationwide) and represented US young scientists at the World Laureates Forum Young Scientists Forum in 2019 (1 of 4 selected nationwide).
    Dr. Li has served the database and NLP communities with distinction. She regularly serves as an organizer and senior committee member for top conferences such as ACL, NAACL, SIGMOD, and IJCAI. She championed and co-founded the NAACL Industry Track, the first-ever industry track at a major NLP conference. She received her PhD and master's degrees from the University of Michigan, Ann Arbor, and her undergraduate degrees from Tsinghua University, Beijing, China.
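To make the TableQA downstream task concrete: once a table has been extracted from a PDF into structured rows, simple questions over it reduce to filters and aggregations. The sketch below is purely illustrative; the row format and function are hypothetical and are not IBM's API or system:

```python
def table_qa(table, column, op, where=None):
    """Answer a simple aggregation question over an extracted table.

    table:  list of row dicts, as a (hypothetical) table extractor
            might produce from a PDF.
    column: the column the question asks about.
    op:     aggregation named in the question (sum/max/min/avg/count).
    where:  optional row filter derived from the question's conditions.
    """
    rows = [r for r in table if where is None or where(r)]
    values = [float(r[column]) for r in rows]
    ops = {"sum": sum, "max": max, "min": min,
           "avg": lambda v: sum(v) / len(v), "count": len}
    return ops[op](values)

# An illustrative extracted table (values are invented).
revenue = [
    {"segment": "Cloud",    "year": "2020", "revenue": "25.1"},
    {"segment": "Software", "year": "2020", "revenue": "18.6"},
]
```

The hard part, of course, is everything upstream of this function: detecting the table in the PDF, recovering its cell structure and headers, and mapping a natural-language question onto the column, operation, and filter, which is exactly what the talk's Deep Table Understanding work addresses.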

  • Speaker: Don Metzler
  • Title: Challenges in Enterprise Search and Intelligence
  • Abstract: Building effective enterprise search and intelligence capabilities at scale presents a number of significant challenges. The goal of this talk is to highlight research-focused challenges that are often encountered when developing such systems. The challenges covered in the talk, all of which are backed by real-world use cases, include document understanding, query understanding, and question answering.
  • Bio: Don Metzler is a Senior Staff Software Engineer at Google, where he leads a group focused on problems at the intersection of machine learning, natural language processing, and information retrieval. Previously, he was a Research Assistant Professor at the University of Southern California (USC) and a Senior Research Scientist at Yahoo!. He has served as Program Chair of the ACM Conference on Web Search and Data Mining (WSDM), the ACM International Conference on the Theory of Information Retrieval (ICTIR), and the Open Research Areas in Information Retrieval (OAIR) conference; has sat on the editorial boards of the major journals in the field; has published over 100 research papers; has been awarded 9 patents; and co-authored the textbook Search Engines: Information Retrieval in Practice.

  • Speaker: Benjamin Van Durme
  • Title: A Case for Statutory Reasoning
  • Abstract: Natural Language Processing is increasingly pursued as an applied Machine Learning problem, with researchers focused on building large numbers of examples for new tasks, designing models that require fewer examples, and understanding the errors and capabilities of pretrained representations. Legal NLP is no exception: given large collections of decided cases, there is active work on automated legal reasoning as classification. However, within Legal NLP there is a task offering exciting, real-world challenges for language understanding that go beyond pattern classification: statutory reasoning. For some legal domains, such as US Federal Tax Law, the number of publicly decided consequential cases each year may be limited (e.g., those involving a large multinational corporation that does not settle out of public view). Further, in reaction to such cases the legal code is regularly modified, closing revealed loopholes. This leads to a naturally occurring task that pairs single examples (a case) with salient prescriptive rules (statutory texts), and where those rules may change between examples. New cases similar to those previously seen may no longer result in the same judgment, requiring any automated solution to rely more explicitly on understanding the salient law. This work is joint with Andrew Blair-Stanek and Nils Holzenberger.
  • Bio: Benjamin Van Durme is an Associate Professor of Computer Science at the Johns Hopkins University, and a researcher at Microsoft Semantic Machines. His work focuses on natural language understanding.
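The structure of the task described in the abstract — a single case paired with a prescriptive rule, where the rule itself may change between examples — can be illustrated with a toy sketch. The statutes and case facts below are invented for illustration and are not from the actual work:

```python
def judge(case, statute):
    # A case is a dict of facts; a statute is a predicate over those facts.
    return "deductible" if statute(case) else "not deductible"

def statute_2019(case):
    # Toy rule: business expenses are deductible.
    return case["purpose"] == "business"

def statute_2020(case):
    # Amended toy rule closing a loophole: entertainment expenses
    # are no longer deductible even when business-related.
    return case["purpose"] == "business" and case["category"] != "entertainment"

# The same case yields different outcomes under the two statutes.
case = {"purpose": "business", "category": "entertainment"}
```

A classifier trained purely on pre-amendment decisions would keep predicting the old outcome for this case; only a system that actually reads and applies the current statutory text gets the new one right, which is the argument for statutory reasoning as a task.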

  • Speaker: Cha Zhang
  • Title: Visual Document Intelligence in the Wild
  • Abstract: Recent progress in AI has brought Optical Character Recognition (OCR) and document understanding to a whole new level. In this talk, we will first provide an overview of Microsoft’s latest OCR engine (known as OneOCR), which applies the latest deep learning techniques to recognize mixed printed and handwritten text in over 100 languages, with text lines along arbitrary orientations (even flipped), and with varying degrees of quality and distortion. OneOCR achieves industry-leading accuracy on a wide range of application scenarios such as documents, invoices, receipts, business cards, slides, menus, book covers, posters, GIFs/memes, street views, product labels, handwritten notes, and whiteboards. We then introduce another breakthrough technology developed at Microsoft for document understanding: LayoutLM. LayoutLM bridges computer vision and language, producing state-of-the-art results on a number of tasks, including document segmentation, classification, TextVQA, and others. Combining OneOCR and LayoutLM, we created the Form Recognizer API in Azure AI, which extracts text, key-value pairs, tables, and structures from documents in the wild. I will demonstrate some of the capabilities of Form Recognizer, highlight its core component technologies, and explain the roadmap ahead.
  • Bio: Cha Zhang is a Partner Engineering Manager at Microsoft Cloud & AI. He received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1998 and 2000, respectively, both in Electronic Engineering, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University in 2004. After graduation, he worked at Microsoft Research for 12 years, investigating research topics including multimedia signal processing, computer vision, and machine learning. He has published more than 150 technical papers and holds more than 50 U.S. patents. He served as Program Co-Chair for VCIP 2012 and MMSP 2018, and General Co-Chair for ICME 2016. He is a Fellow of the IEEE. Since joining Cloud & AI, he has led teams to ship industry-leading technologies in Microsoft Cognitive Services such as emotion recognition, optical character recognition, and document understanding.
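The core idea by which LayoutLM bridges vision and language is to embed each OCR token together with its position on the page: the input embedding is the sum of a text embedding and 2-D position embeddings for the token's bounding-box coordinates. The sketch below illustrates just that input construction; the table sizes and dimensions are illustrative, not the actual model's:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, GRID, DIM = 1000, 1024, 64   # illustrative sizes only

# Lookup tables: one for token ids, one shared for box coordinates
# quantized to a GRID x GRID page layout.
tok_table = rng.normal(size=(VOCAB, DIM))
pos_table = rng.normal(size=(GRID, DIM))

def layout_embedding(token_id, box):
    """Embed a token together with its bounding box (x0, y0, x1, y1).

    The result is the sum of the text embedding and 2-D position
    embeddings for each box coordinate, so tokens with identical text
    but different page positions get different representations.
    """
    x0, y0, x1, y1 = box
    return (tok_table[token_id]
            + pos_table[x0] + pos_table[y0]
            + pos_table[x1] + pos_table[y1])
```

These layout-aware embeddings are what lets a Transformer distinguish, say, a number sitting in a table cell from the same number in running text, which underpins the key-value and table extraction in the talk.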