- August 24: With the last link to the recorded videos published and the postmortem notes sent to the committee members, we’re happy to declare mission accomplished for Document Intelligence Workshop @ #KDD2021!
- We want to thank all the invited speakers, paper reviewers, and KDD workshop organizers for their great contributions, and everyone who attended the workshop in real time or virtually!
- Now on to the next workshop: we’re looking for the next set of organizers for DI-2022! We’re happy to share our experiences and help in any way we can, so please reach out to us!
- August 20: All slides, videos and recordings are now posted on the Program page!
- August 15: NEW Zoom link https://zoom.us/j/95913914492?pwd=RzBtRk9MTXpHdkYwQkptd0Q1d0FiZz09
- August 15: WORKSHOP DELAY for 10 minutes due to technical difficulties (KDD sent us WRONG ZOOM PASSWORD)! Sorry for the delay and stay tuned!
- August 12: Pre-recorded videos of all papers/posters are up on the Program page!
- August 5: Slides of all papers/posters are up on the Program page!
- July 14: The Best Paper of DI-2021 has been announced! Please join us in congratulating the authors of HYCEDIS: HYbrid Confidence Engine for Deep Document Intelligence System. The Best Paper will be presented in its own session during the workshop!
- July 11: Invited Talks updated with talk info from Yunyao Li.
- July 10:
- July 6:
- July 2: Invited Talks updated with talk info from Kevyn Collins-Thompson and Benjamin Van Durme.
- July 1: Invited Talks updated with talk info from Cha Zhang.
- June 30: Invited Talks updated with talk info from Don Metzler and Heng Ji.
- June 7: Updated workshop date & time: .
- June 3: In the Invited Talks section:
- May 17: Clarification that workshop papers will not be archived with the main KDD proceedings.
- May 14: Paper submission deadline extended to 23:59 on May 21, 2021 (anywhere on Earth).
- May 9: Clarified which template Word authors and which document class LaTeX authors should use for submission. See Submissions.
- April 22: Added link to the Conflict of Interest Policy for ACM Publications.
- April 22: Added link to the Standard ACM Conference Proceedings Template.
- April 15: Paper submission deadline extended to May 17, 2021.
Business documents are central to the operation of all organizations, and they come in all shapes and sizes: project reports, planning documents, technical specifications, financial statements, meeting minutes, legal agreements, contracts, resumes, purchase orders, invoices, and many more. The ability to read, understand and interpret these documents, referred to here as Document Intelligence (DI), is challenging due to their complex formats and structures, internal and external cross references deployed, quality of scans and OCR performed, and many domains of knowledge involved.
While a variety of research has advanced the fundamentals of document understanding, most of it has focused on documents found on the web, which fail to capture the complexity of analysis and the types of understanding needed across business documents. Realizing the vision of Document Intelligence remains a research challenge that requires a multi-disciplinary perspective spanning several research fields, all of which have been profoundly impacted and advanced by deep learning in the last few years. This workshop aims to explore and advance the current state of research and practice, including but not limited to the following topics:
- Document modeling and representations.
- Document structure and layout learning and recognition.
- Cleansing and image enhancement techniques for scanned documents.
- Information extraction from text and semi-structured documents.
- Linguistic analysis of business documents.
- Natural language reasoning and inference.
- Question answering on business documents.
- Semantic understanding of business documents.
- Document search and clustering.
- Handwriting recognition in business documents.
- Table identification and extraction from business documents.
- Chart learning and understanding.
- Domain-specific document understanding.
- Knowledge representation for business documents.
- Multilingual document understanding methods and frameworks.
- Integrated syntax and semantic approaches for document understanding.
- Transfer learning methods for business document reading and understanding.
In addition to the invited talks and the panel discussion on topics related to Document Intelligence, the workshop program will include paper sessions, which provide an opportunity to present peer-reviewed work on topics related to Document Intelligence.
We are soliciting submissions of short papers in PDF format and formatted according to the Standard ACM Conference Proceedings Template.
- Word authors: please use Interim layout.docx/interim sample pdf.
- LaTeX authors: please download LATEX (Version 1.77) and use
Submissions are limited to . Submissions that do not meet the formatting requirements will be rejected without review.
Submissions can be original research contributions, or abstracts of papers previously submitted to top-tier venues, but not currently under review in other venues and not yet published. The research contributions may discuss technical challenges of reading and interpreting business documents and present research results.
The review process is double-blind, and we follow the Conflict of Interest Policy for ACM Publications. The submitted contributions will be peer-reviewed by the Program Committee, and preference will be given to high-quality, original work relevant to Document Intelligence topics.
One author of each accepted contribution is expected to register for and attend the workshop to present the work by video in the workshop’s Paper Sessions (format to be decided). Accepted contributions will be made publicly available as non-archival reports, allowing future submissions to archival conferences or journals.
Please note as per the KDD Call for Workshop Proposals:
Note: Workshop papers will not be archived in the ACM Digital Library. However, workshop organizers may set up any archived publication mechanism that best suits their workshop.
DI-2021 accepted papers will not be archived in the main KDD 2021 proceedings. We will instead host the accepted papers on this website (https://aka.ms/di-2021) indefinitely.
Microsoft Research CMT: https://cmt3.research.microsoft.com/DI2021
Pre-recording Videos and Uploading Videos & Slides
- Invited speakers and presenters: please follow the instructions in the DI-2021 Self Recording Guidance to pre-record your talks.
- Please upload the video and slides to https://cmt3.research.microsoft.com/DI2021 (the same site used for paper submission).
- Please upload as supplementary material.
- You can upload up to 3 files.
- The file size limit is 700 MB.
- Accepted file format: pptx (PowerPoint), pdf, mp4 (video)
- Paper Submission Deadline: 23:59 on May 21, 2021 (anywhere on Earth); originally May 10, 2021.
- Paper Notification Date: June 10, 2021.
- Paper Final Version Due: July 1, 2021.
- Pre-recorded Video and Slides Upload Due: 23:59 on August 1, 2021 (anywhere on Earth).
- Virtual Workshop Date: 8am-6pm PDT on Sunday, August 15, 2021 (originally listed as August 14-18).
Workshop registration will be processed with the main KDD 2021 conference: https://kdd.org/kdd2021/
Workshop Organizing Committee
- Douglas Burdick (IBM Research)
- Dave Lewis (Reveal-Brainspace)
- Yijuan (Lucy) Lu (Microsoft Azure AI)
- Hamid Motahari (Macquarie University)
- Sandeep Tata (Google Research)
Program Committee Chair
Benjamin Han (Microsoft Azure AI)