Improving generalization for classification and retrieval tasks

Abstract: In this talk we will present recent work that aims to improve the generalization of visual representations on both classification and retrieval tasks. We will start from a recent work on supervised pre-training and will present an approach that aims to improve the transferability of encoders learned in a supervised manner, while retaining …

Deep Learning for Document, Scene and Satellite Images Processing and Recognition

Abstract: Deep learning applications have been thriving over the last decade in many different domains, including image processing and recognition. The driver of this vibrant development has been the availability of abundant data. This talk reviews the main results of our research activities carried out over the last few years. During this …

From scans to information: end-to-end information extraction from documents

Abstract: With the advancement of Transformers, especially as far as computer vision is concerned, we are starting to apply end-to-end neural networks, without OCR or other pre-/postprocessing techniques, to the challenges of document understanding and information extraction. I will present developments in this area and discuss potential problems from both theoretical (handling longer sequences) and …

Calibrated Fine-Grained Recognition and Retrieval

Abstract: In the last decade, many areas of computer vision have progressed to a level supporting reliable, and sometimes impressive, applications. I will talk about two such domains, fine-grained recognition and visual retrieval. In fine-grained recognition, I’ll discuss the issue of prior probability shift, classifier calibration and the choice of loss functions driven by …

ACM Multimedia, ECCV and ICPR Internal Seminar 

On 18 October 2022, CVC held its ACM Multimedia, ECCV 2022 and ICPR 2022 internal seminar, where our researchers presented their papers to their CVC colleagues.
💠 ACM Multimedia:
🔸 SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision. Danna Xue; Fei Yang; Pei Wang; Luis Herranz; Jinqiu Sun; Yu Zhu; Yanning Zhang
💠 ECCV (workshops):
🔸 OCR-IDL: OCR Annotations for Industry Document Library Dataset (Oral). Ali Furkan Biten; Rubèn Tito; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas.
🔸 Doc2Graph: a Task Agnostic Document Understanding Framework …

Continual Learning from Pretrained Models

Abstract: Continual Learning (CL) is a paradigm where an agent learns over time from a stream of data. In this talk, we will discuss how to exploit pretrained models in CL. First, we will talk about “continual pretraining”, a scenario where a large pretrained model is updated over time. The results show that continual pretraining …

Requirements for Testing Automated Vehicles on Spanish Roads

Abstract: In 2015, the Dirección General de Tráfico (DGT), Spain's Directorate-General for Traffic, approved an Instruction on the authorization of research tests or trials carried out with automated-driving vehicles on Spanish roads. It establishes who is entitled to apply for authorization to conduct tests and under what conditions. The aim of this …

CVPR and IVS 2021 Presentations

July 9, 2021 at 12:30 pm, by CVC researchers
Online: Microsoft Teams
Overview of the CVC papers and workshops presented at the Conference on Computer Vision and Pattern Recognition (CVPR) 2021 and the 3D-DLAD workshop at the IEEE Intelligent Vehicles Symposium (IVS) 2021.
Schedule:
12:30 – 12:45 – “Slimmable compressive autoencoders for practical neural image compression” – F. Yang …

Towards better cross-modal learning by Probabilistic embedding and AdamP optimizer

Abstract: Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally …

Simple Inference and Generation Using Multimodal Information

Abstract: Can we make computers understand language from just text, or do we need further grounding, such as providing videos and sound? This question has been asked in the NLP community, with much evidence pointing to the fact that even with very large pre-trained language models, the latest technological gem of NLP, we cannot truly …