Document Image Enhancement & Recognition in Low Resource Scenarios: Application to Ciphers and Handwritten Text

CVC has a new PhD on its record!

Pioneer Award 2023

Mohamed Ali Souibgui successfully defended his dissertation in Computer Science on December 1, 2022, and is now a Doctor of Philosophy from the Universitat Autònoma de Barcelona. Dr. Souibgui was recognized with a Pioneer Award for the commercial orientation of his thesis by the CERCA Institute (Centres de Recerca de Catalunya) on December 20, 2023.


What is the thesis about?

In this thesis, we propose several contributions aimed at enhancing and recognizing historical handwritten document images, especially those with rare scripts, such as cipher documents.

In the first part, we present effective end-to-end deep learning models for Document Image Enhancement (DIE). First, we explore conditional Generative Adversarial Networks (cGANs) for different tasks (document clean-up, binarization, deblurring, and watermark removal). Next, we further improve the results by integrating a text recognizer into the cGAN model, which encourages the generated document images, recovered from degraded inputs, to be more readable. Finally, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images in an end-to-end fashion.

The second part of the thesis addresses Handwritten Text Recognition (HTR) in low resource scenarios, i.e. when only a small amount of labeled training data is available. We propose novel methods for recognizing ciphers with rare scripts. First, we propose a method based on few-shot object detection, and extend it with a progressive learning strategy that automatically assigns pseudo-labels to a set of unlabeled data, reducing the human labor of annotating pages while maintaining the model's performance. Second, we propose a data generation technique based on Bayesian Program Learning (BPL) to overcome the lack of data in such rare scripts. Third, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition and document image enhancement. The proposed model does not exhibit the limitations of previous state-of-the-art methods based on contrastive losses and, at the same time, requires substantially fewer data samples to converge.
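To make the idea of progressively assigning pseudo-labels more concrete, the following is a minimal, generic sketch of confidence-thresholded pseudo-labeling. It is not the method from the thesis: the nearest-centroid classifier, the confidence score, the threshold value, and all function names are assumptions chosen only to keep the illustration small and self-contained.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


def fit_centroids(samples):
    """Compute one mean vector per label from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in sums[y]) for y in sums}


def predict(centroids, x):
    """Return (label, confidence) for x.

    Confidence is an illustrative score in [0.5, 1]: it is 0.5 when the two
    closest centroids are equally far away and 1.0 when x sits exactly on
    the closest one.
    """
    ranked = sorted((euclidean(x, c), y) for y, c in centroids.items())
    d_best, label = ranked[0]
    if len(ranked) == 1:
        return label, 1.0
    total = d_best + ranked[1][0]
    confidence = 1.0 if total == 0 else 1.0 - d_best / total
    return label, confidence


def progressive_pseudo_label(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set round by round with confident predictions.

    Each round refits the classifier on everything labeled so far, then
    moves unlabeled samples whose confidence reaches the threshold into
    the labeled set. Stops early when no sample qualifies.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            y, confidence = predict(centroids, x)
            if confidence >= threshold:
                confident.append((x, y))
            else:
                remaining.append(x)
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool
```

In this sketch, samples that never reach the threshold stay in the unlabeled pool, so a human annotator would only need to look at the ambiguous cases.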

In the third part of the thesis, we analyze the use of HTR systems in low resource scenarios from the user's perspective. This contrasts with the usual research on HTR, which often focuses only on technical aspects and rarely devotes effort to implementing software tools for scholars in the Humanities.

Keywords: Computer Vision, Historical Document Analysis, Document Image Enhancement, Handwritten Text Recognition, Few-shot learning, Generative Adversarial Networks, Transformers.