Information Extraction from Heterogeneous Handwritten Documents

July 1, 2019 at 10:30 am by

Place: CVC Sala d’actes

Committee:

  • Dr. Veronique Églin (Institut National des Sciences Appliquées de Lyon)
  • Dr. Oriol Ramos-Terrades (CVC- Dept. Ciències de la Computació UAB)
  • Dr. Andreas Fischer (University of Applied Sciences and Arts Western Switzerland)

Thesis Supervisors:

Director: Dr. Alicia Fornés (CVC- Dept. Ciències de la Computació UAB)
Co-director: Dr. Josep Lladós (CVC- Dept. Ciències de la Computació UAB)

Abstract:

In this thesis we explore information extraction from totally or partially handwritten documents. We deal with two application scenarios. The first is modern, highly structured documents such as forms. In this kind of document, the semantic information is encoded in different fields with a pre-defined location in the document; information extraction therefore becomes roughly equivalent to transcription. The second is loosely structured, totally handwritten documents: besides transcribing them, we need to assign each handwritten word a semantic label from a set of known values.

In both scenarios, transcription is an important part of information extraction. For that reason, in this thesis we present two methods based on neural networks to transcribe handwritten text. In order to tackle the challenge of loosely structured documents, we have produced a benchmark consisting of a dataset, a defined set of tasks and a metric, which was presented to the community as an international competition. We also propose different models based on convolutional and recurrent neural networks that are able to transcribe handwritten words and assign semantic labels to them, that is, to perform information extraction.
