Word Spotting and Recognition in Images from Heterogeneous Sources

November 9, 2018 at 11:30 am

Place: CVC Sala d’actes



Prof. Dr.-Ing. Gernot A. Fink – Department of Computer Science, TU Dortmund, Germany
Dr. Alicia Fornés – Dept. Ciències de la Computació & Centre de Visió per Computador – UAB
Dr. Jon Almazán – Naver Labs Europe – Grenoble, France


Thesis Supervisor:

Dr. Ernest Valveny – Dept. Ciències de la Computació & Centre de Visió per Computador – UAB



Text has been the most common way of sharing information for ages. With the recent growth of personal image databases and collections of handwritten historical manuscripts, the demand for algorithms that make these databases accessible for browsing and indexing is on the rise. Enabling search over, or understanding of, large collections of manuscripts or image databases requires fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which work well when words are already segmented. However, there is no trivial way to extend these to non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different kinds of representation exist in the literature: one uses a fixed-length representation learned from cropped words, and the other a variable-length sequence of features. Throughout this thesis, we study both representations for their suitability for segmentation-free understanding of text. In the first part we focus on segmentation-free word spotting using a fixed-length representation, extending the successful PHOC (Pyramidal Histogram of Characters) representation to segmentation-free retrieval. In the second part, we explore sequence-based features, and finally we propose a unified solution in which the same framework can generate both kinds of representation.
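
For readers unfamiliar with the fixed-length representation mentioned above, the following is a minimal sketch of how a PHOC-style vector can be built from a word's transcription. It is not the implementation used in the thesis: the alphabet, the pyramid levels (2 to 5) and the 50% overlap rule are assumptions based on the commonly described construction, and refinements such as bigram histograms are left out. The function name `phoc` is illustrative only.

```python
import string

# Assumed alphabet and pyramid levels; real systems may choose differently.
ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (2, 3, 4, 5)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return a binary PHOC-style vector for `word` as a list of 0/1 values.

    At each pyramid level the word is split into `level` equal regions.
    A character is assigned to a region when at least half of the span it
    occupies in the word falls inside that region (assumed 50% rule).
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid rounding issues.
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Character span has length `level` in these units.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vector.extend(bits)
    return vector
```

Because every word maps to a vector of the same length regardless of how many characters it has, a query string and a candidate image region can be compared with a simple vector distance, which is what makes this kind of representation attractive for retrieval.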


Watch the video presentation