ACM Multimedia, ECCV and ICPR Internal Seminar 

On 18 October 2022 CVC held its ACM Multimedia, ECCV 2022 & ICPR 2022 internal seminar. Our researchers presented their papers to their CVC colleagues.
💠 ACM Multimedia:
🔸 SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision. Danna Xue; Fei Yang; Pei Wang; Luis Herranz; Jinqiu Sun; Yu Zhu; Yanning Zhang
💠 ECCV (workshops):
🔸 OCR-IDL: OCR Annotations for Industry Document Library Dataset (Oral). Ali Furkan Biten; Rubèn Tito; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas
🔸 Doc2Graph: a Task Agnostic Document Understanding Framework … Read more

Continual Learning from Pretrained Models

Abstract: Continual Learning (CL) is a paradigm where an agent learns over time from a stream of data. In this talk, we will discuss how to exploit pretrained models in CL. First, we will talk about “continual pretraining”, a scenario where a large pretrained model is updated over time. The results show that continual pretraining … Read more
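The abstract only outlines the idea; as a rough, hypothetical illustration of what "continual pretraining" can look like in code, the sketch below sequentially updates a small stand-in model on a stream of data chunks, never revisiting earlier ones. All names, shapes, and hyperparameters here are placeholder assumptions, not the method presented in the talk.

```python
# Minimal sketch of continual pretraining: a pretrained backbone is updated
# sequentially on a stream of datasets, without revisiting earlier ones.
# make_backbone and stream_of_datasets are placeholders, not from the talk.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_backbone(dim=32, n_classes=4):
    # Stand-in for a large pretrained model (e.g. a vision or language backbone).
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

model = make_backbone()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# The "stream": each element is a new chunk of data arriving over time.
stream_of_datasets = [
    TensorDataset(torch.randn(256, 32), torch.randint(0, 4, (256,)))
    for _ in range(3)
]

for t, dataset in enumerate(stream_of_datasets):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    for x, y in loader:  # one pass over the newly arrived data only
        optim.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optim.step()
    print(f"finished continual update on chunk {t}")
```

The open question such a naive loop raises, and which motivates the talk, is how much of the original pretrained knowledge survives these successive updates.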

Requirements for conducting tests with automated vehicles on Spanish roads

Abstract: In 2015 the Dirección General de Tráfico (DGT) approved an Instruction on the authorisation of research tests or trials with automated driving vehicles on Spanish roads. It establishes who is entitled to apply for test authorisation and under what conditions. The aim of this … Read more

CVPR and IVS 2021 Presentations

July 9, 2021 at 12:30 pm by CVC researchers. Online: Microsoft Teams. Presentation of the CVC papers and workshop contributions from the Conference on Computer Vision and Pattern Recognition (CVPR) 2021 and the 3D-DLAD workshop at the IEEE Intelligent Vehicles Symposium (IVS) 2021. Schedule: 12:30 – 12:45 – “Slimmable compressive autoencoders for practical neural image compression” – F. Yang … Read more

Towards better cross-modal learning with probabilistic embedding and the AdamP optimizer

Download the presentation slides. Abstract: Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally … Read more
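As a rough companion to the abstract, the sketch below shows a generic shared embedding space for images and captions trained with a symmetric contrastive loss. It is a deterministic baseline, not the probabilistic embedding method discussed in the talk, and the feature dimensions and module names are illustrative assumptions.

```python
# Generic sketch of a joint image-caption embedding space with a symmetric
# contrastive loss. Not the probabilistic method from the talk; dimensions
# and names (JointEmbedding, img_proj, txt_proj) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)   # projects image features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)   # projects caption features

    def forward(self, img_feats, txt_feats):
        # L2-normalise so the dot product is a cosine similarity.
        v = F.normalize(self.img_proj(img_feats), dim=-1)
        t = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return v, t

def contrastive_loss(v, t, temperature=0.07):
    # Matched image-caption pairs lie on the diagonal of the similarity matrix.
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0))
    # Symmetric: retrieve captions from images and images from captions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

model = JointEmbedding()
img_feats = torch.randn(8, 2048)   # e.g. CNN features for 8 images
txt_feats = torch.randn(8, 768)    # e.g. text-encoder features for their captions
v, t = model(img_feats, txt_feats)
print(contrastive_loss(v, t).item())
```

A loss like this assumes a single correct caption per image; the talk addresses precisely the setting where that assumption breaks down, since multiple captions can match one image and vice versa.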

Simple Inference and Generation Using Multimodal Information

Abstract: Can we make computers understand language from just text, or do we need further grounding, such as providing videos and sound? This question has been asked in the NLP community, with much evidence pointing to the fact that even with very large pre-trained language models, the latest technological gem of NLP, we cannot truly … Read more