Deep learning based architectures for cross-domain image processing

Abstract: Human vision is restricted to the visual-optical spectrum. Machine vision is not. Cameras sensitive to diverse infrared spectral bands can improve the capabilities of autonomous systems and provide a comprehensive view. Relevant scene content can be made visible, particularly in situations where sensors of other modalities, such as a visual-optical camera, require a source …

Categories Phd

Document Image Enhancement and Recognition in Low Resource Scenarios: Application to Ciphers and Handwritten Text

Abstract: In this thesis, we propose several contributions aimed at enhancing and recognizing historical handwritten document images, especially those with rare scripts, such as cipher documents. In the first part, we present effective end-to-end deep learning models for Document Image Enhancement (DIE). First, conditional Generative Adversarial Networks (cGANs) for different …


Leveraging Scene Text Information for Image Interpretation

Abstract: Until recently, most computer vision models remained illiterate, largely ignoring the semantically rich and explicit information contained in scene text. Recent progress in scene text detection and recognition has made it possible to explore its role in a diverse set of open computer vision problems, e.g. image classification, image-text retrieval, image captioning, and visual question answering …


A Bitter-Sweet Symphony on Vision and Language: Bias and World Knowledge

Abstract: Vision and language are broadly regarded as cornerstones of intelligence. Even though language and vision have different aims (language serves communication and the transmission of information, while vision constructs mental representations of our surroundings so that we can navigate and interact with objects), they cooperate and depend on one another in …


Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods

Abstract: The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and prone to errors. Consequently, automatic transcription systems for musical documents represent interesting tools. Document analysis is the …


Deep Metric Learning for re-identification, tracking and hierarchical novelty detection

Abstract: Metric learning refers to the machine learning problem of learning a distance or similarity measure with which to compare data. In particular, deep metric learning involves learning a representation, also referred to as an embedding, such that data samples can be compared in the embedding space based on their distance, which directly provides a similarity measure. This …
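
The idea of comparing samples by distance in an embedding space can be illustrated with a minimal sketch. It is not taken from the thesis: the embeddings below are hand-written, hypothetical 3-D vectors standing in for the output of a trained network, and the helper names `euclidean` and `nearest` are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, gallery):
    """Return the label whose gallery embedding is closest to the query."""
    return min(gallery, key=lambda label: euclidean(query, gallery[label]))


# Hypothetical embeddings; in practice a trained network would produce these.
gallery = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.0, 0.8, 0.6],
}
query = [0.8, 0.2, 0.1]

# The smaller the distance, the more similar the samples are taken to be.
match = nearest(query, gallery)
print(match)
```

In a re-identification setting, a lookup of this kind would be run against a gallery of known identities; the training procedure that makes the distances meaningful is outside the scope of this sketch.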


Self-supervised learning for image-to-image translation in the small data regime

Abstract: The massive adoption of deep Convolutional Neural Networks (CNNs) in computer vision since 2012 has made one image understanding paradigm dominant: an end-to-end, fully supervised learning workflow over large-scale annotated datasets. This approach has proved extremely useful for solving a myriad of classic and new computer vision tasks with …


Continual learning for hierarchical classification, few-shot recognition, and multi-modal learning

Abstract: Deep learning has drastically changed computer vision in the past decades and achieved great success in many applications, such as image classification, retrieval, detection, and segmentation, thanks to the emergence of neural networks. Typically, for most applications, these networks are presented with examples from all tasks they are expected to perform. However, for many …


Towards Efficient and Robust Convolutional Neural Networks for Single Image Super-Resolution

Abstract: Single image super-resolution (SISR) is an important task in image processing which aims to enhance the resolution of imaging systems. SISR has made great strides with the rapid development of deep learning. Recent advances are mostly devoted to designing deeper and wider networks to enhance their representation learning capacity. However, as …


Monocular Depth Estimation for Autonomous Driving

Abstract: 3D geometric information is essential for on-board perception in autonomous driving and driver assistance. Autonomous vehicles (AVs) are equipped with calibrated sensor suites. As part of these suites, we can find LiDARs, which are expensive active sensors in charge of providing the 3D geometric information. Depending on the operational conditions for the AV, calibrated stereo rigs …
