Generic Document Visual Question Answering systems
Stage of development
TRL 3-4
Business Sector
Industry 4.0, Tourism, Culture, Music, Audiovisual, Banking & Sureties, Digital industry & Telecommunications
Research Line
Intelligent Reading Systems
Principal Researcher:
Dimosthenis Karatzas
Technology description:
Our technology enables precise extraction of information from documents written in natural language. Document Visual Question Answering (DocVQA) consists of answering a natural language question about the content of a document image. The most common pipeline feeds a model with the question, the words recognized in the image, and the image itself, so that the model can focus its attention on the region that contains the answer. We have also extended DocVQA to multipage documents, a novel capability that reduces the time required to find specific information in a document with many pages, for example when looking something up in a device’s instruction booklet.
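To make the inputs of the pipeline described above concrete, here is a minimal Python sketch. It is not our model: every name in it (OcrWord, Page, DocVqaInput, select_page, build_input) is hypothetical, the page images and recognized words are made-up sample data, and the word-overlap page selection is only a simple stand-in for the multipage step, chosen so the example runs without any external library.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class OcrWord:
    text: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Page:
    image_path: str       # page image; a real model would also read the pixels
    words: List[OcrWord]  # words recognized on this page


@dataclass
class DocVqaInput:
    """The three inputs named in the text: question, recognized words, image."""
    question: str
    page: Page


def normalize(token: str) -> str:
    # Lowercase and drop surrounding punctuation so "time:" matches "time".
    return token.strip(".,?!:;").lower()


def tokens(text: str) -> Set[str]:
    return {normalize(t) for t in text.split() if normalize(t)}


def select_page(question: str, pages: List[Page]) -> int:
    """Illustrative stand-in: pick the page sharing the most words with the question."""
    q = tokens(question)
    scores = [len(q & {normalize(w.text) for w in p.words}) for p in pages]
    return scores.index(max(scores))  # the first page wins on ties


def build_input(question: str, pages: List[Page]) -> DocVqaInput:
    # Bundle the question with the selected page (its words and its image).
    return DocVqaInput(question, pages[select_page(question, pages)])


# Made-up two-page instruction booklet.
pages = [
    Page("manual_p1.png", [OcrWord("Safety", (10, 10, 80, 30)),
                           OcrWord("warnings", (90, 10, 180, 30))]),
    Page("manual_p2.png", [OcrWord("Battery", (10, 10, 90, 30)),
                           OcrWord("charging", (100, 10, 190, 30)),
                           OcrWord("time:", (200, 10, 250, 30)),
                           OcrWord("2", (260, 10, 270, 30)),
                           OcrWord("hours", (280, 10, 340, 30))]),
]
question = "What is the battery charging time?"
model_input = build_input(question, pages)
print(model_input.page.image_path)
```

In a real system, the bundled question, words (with their positions) and image would be passed to a trained model, which produces the answer; the sketch stops at assembling those inputs.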
Applications:
- Education
- Retail
- Government and Public Sector
- Travel and Hospitality
IP Transfer:
- Licensing
- SaaS
- Subcontracted Research
- Co-development
Interested in this technology? Contact us!
Technology Transfer & Industry Partnerships Department:
transferencia@cvc.uab.cat