Generic Document Visual Question Answering systems

Stage of development

TRL 3-4

Business Sector

Industry 4.0, Tourism, Culture, Music, Audiovisual, Banking & Sureties, Digital industry & Telecommunications

Research Line

Intelligent Reading Systems

Principal Researcher:

Dimosthenis Karatzas

Technology description:

Our technology enables precise extraction of information from documents written in natural language. Document Visual Question Answering (DocVQA) consists of answering a natural language question asked about a document image. The most common pipeline feeds a model three inputs: the question, the words recognized in the image, and the image itself, which together guide the model's attention towards the answer. We have also extended DocVQA to multipage documents, a novel capability that reduces the time needed to find specific information in a document with many pages, for example in a device's instruction booklet.
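To make the pipeline inputs concrete, the following is a minimal Python sketch, not the actual system. It assumes the recognized words are already available from some OCR step, and every name in it (OcrWord, Page, DocVqaInput, select_page, build_input) is hypothetical. The page selection uses simple word overlap purely as a stand-in for a learned multipage model.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class OcrWord:
    """One recognized word and where it sits on the page (hypothetical type)."""
    text: str
    box: Box


@dataclass
class Page:
    """A document page: its image file plus the words recognized in it."""
    image_path: str
    words: List[OcrWord]


@dataclass
class DocVqaInput:
    """The three inputs the described pipeline feeds to a model."""
    question: str
    words: List[str]
    boxes: List[Box]
    image_path: str


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, used only for the overlap heuristic."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_page(question: str, pages: List[Page]) -> int:
    """Return the index of the page sharing the most words with the question.

    Assumption: lexical overlap is an illustrative placeholder, not the
    method used by the technology. Ties resolve to the earliest page.
    """
    if not pages:
        raise ValueError("document has no pages")
    q = _tokens(question)
    scores = [
        len(q & _tokens(" ".join(w.text for w in page.words)))
        for page in pages
    ]
    return max(range(len(pages)), key=lambda i: scores[i])


def build_input(question: str, page: Page) -> DocVqaInput:
    """Bundle question, recognized words (with boxes) and image for a model."""
    return DocVqaInput(
        question=question,
        words=[w.text for w in page.words],
        boxes=[w.box for w in page.words],
        image_path=page.image_path,
    )
```

For a multipage document, one would call select_page to choose a page and then pass the result of build_input to whichever DocVQA model is in use; the model call itself is left out because it depends on the chosen implementation.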

Applications:

IP Transfer:

Interested in this technology? Contact us!

Technology Transfer & Industry Partnerships Department:

transferencia@cvc.uab.cat