Call for Papers

Document Intelligence Workshop

@ KDD 2022

UPDATES

  • August 6: Final versions of the papers are posted!
  • July 22: The workshop Program is up!
  • July 21: Clarified that the workshop this year will be held in person. Every accepted paper must have one author present the work in person at the workshop.
  • July 10: Paper final version due on August 1, 2022!
  • June 26: Accepted papers announced!
  • June 20: Paper notification is now extended to June 26, 2022!
  • June 17:
    • DI-2022 Keynote Talk is up: William Wang will tell us the latest on Learning to Reason with Text and Tables!
    • Paper reviews are underway! Thank you to our reviewers for all your contributions!
  • May 27:
    • Paper submission deadline is now extended to June 9, 2022!
    • We have two sponsors for the workshop! Thank you Adobe and EY!
  • April 28: DI-2022 site is up!

Abstract

Business documents are central to the operation of all organizations, and they come in all shapes and sizes: project reports, planning documents, technical specifications, financial statements, meeting minutes, legal agreements, contracts, resumes, purchase orders, invoices, and many more. The ability to read, understand and interpret these documents, referred to here as Document Intelligence (DI), is challenging due to their complex formats and structures, internal and external cross references deployed, quality of scans and OCR performed, and many domains of knowledge involved.

While a variety of research has advanced the fundamentals of document understanding, most of it has focused on documents found on the web, which fail to capture the complexity of analysis and the types of understanding needed across business documents. Realizing the vision of Document Intelligence remains a research challenge that requires a multi-disciplinary perspective spanning not only natural language processing and understanding, but also computer vision, layout understanding, knowledge representation and reasoning, data mining, knowledge discovery, information retrieval, and more – all of which have been profoundly impacted and advanced by deep learning in the last few years. This workshop aims to explore and advance the current state of research and practice, including but not limited to the following topics:

  • Document modeling and representations.
  • Document structure and layout learning and recognition.
  • Cleansing and image enhancement techniques for scanned documents.
  • Information extraction from text and semi-structured documents.
  • Linguistic analysis of business documents.
  • Natural language reasoning and inference.
  • Question answering on business documents.
  • Semantic understanding of business documents.
  • Document search and clustering.
  • Handwriting recognition in business documents.
  • Table identification and extraction from business documents.
  • Chart learning and understanding.
  • Domain-specific document understanding.
  • Knowledge representation for business documents.
  • Multilingual document understanding methods and frameworks.
  • Integrated syntax and semantic approaches for document understanding.
  • Transfer learning methods for business document reading and understanding.

In addition to the invited talks and the panel discussion on topics related to Document Intelligence, the workshop program will include paper sessions, which provide an opportunity to present peer-reviewed work on topics related to Document Intelligence.


Sponsors

Thank you to Adobe and EY for your sponsorship!


Submissions

We are soliciting submissions of short papers in PDF format, formatted according to the Standard ACM Conference Proceedings Template.

Submissions are limited to 4 pages, not including references. Submissions that do not meet the formatting requirements will be rejected without review.

Submissions can be original research contributions, or abstracts of papers previously submitted to top-tier venues, but not currently under review in other venues and not yet published. The research contributions may discuss technical challenges of reading and interpreting business documents and present research results.

The review process is double-blind, and we follow the Conflict of Interest Policy for ACM Publications. The submitted contributions will be peer-reviewed by the Program Committee, and preference will be given to high-quality, original work relevant to the Document Intelligence topics.

It is expected that one of the authors of each accepted contribution will register and attend the workshop to present the work in person in the workshop’s Paper Sessions. Accepted contributions will be made publicly available as non-archival reports, allowing future submissions to archival conferences or journals.

Workshop Proceedings

Please note as per the KDD Call for Workshop Proposals:

Note: Workshop papers will not be archived in the ACM Digital Library. However, workshop organizers may set up any archived publication mechanism that best suits their workshop.

DI-2022 accepted papers will not be archived in the main KDD 2022 proceedings. We will instead host the accepted papers on this website (https://aka.ms/di-2022) indefinitely.


Submission URL

Microsoft Research CMT: https://cmt3.research.microsoft.com/DI2022


Important Dates

  • Paper Submission Deadline: 23:59 on Thursday June 9, 2022 (anywhere on Earth).
  • Paper Notification Date: Sunday June 26, 2022.
  • Paper Final Version Due: Monday August 1, 2022.
  • Workshop Date: Sunday August 14, 2022 EDT.

Workshop Website

https://document-intelligence.github.io/DI-2022/ or https://aka.ms/di-2022


Contact Information

Email: document-intelligence@outlook.com


Workshop Registration

Workshop registration will be processed with the main KDD 2022 conference: https://kdd.org/kdd2022/


Workshop Organizing Committee


Program Committee Chair


Past Workshops