Vision-Language Contrastive Models: Generalizing Semantic Segmentation and Studying Model Dynamics

CVC Seminar


This presentation is divided into two interconnected parts, exploring the applications and inner workings of Vision-Language contrastive models, with a focus on CLIP-like architectures.

In the first part, we examine how these models can be applied to semantic segmentation. We begin with a brief overview of our previous work on semantic segmentation and multi-branch architectures. We then explore how Vision-Language contrastive models can be leveraged to broaden the range of problems that segmentation can address, and how Domain Adaptation methods can support this process.

The second part of the presentation turns to the fundamental properties of Vision-Language contrastive models themselves. We present ongoing research that aims to unpack the inner workings of these architectures. This includes preliminary explorations of how learning proceeds in contrastive settings, and a brief look at how these models may acquire social biases present in their training data.

Short bio:

Marcos Escudero-Viñolo is a researcher and university lecturer who has co-authored more than 30 papers in high-quality international peer-reviewed journals and conferences. His research has been funded by national grants and by European and national competitive projects from both the public and private sectors, including four projects in which he has served as Principal Investigator. In terms of research topics, he has defined strategies that drive the analysis of visual signals through regional and contextual constraints, especially for semantic segmentation, scene recognition and medical image analysis. His current research focuses on developing strategies that provide interpretability, assessment and profiling gates to the knowledge encoded by deep learning visual models. These strategies are used to uncover the reasons that prevent these models from being reliable, trustworthy and fair. He is a regular reviewer for top journals (e.g., TCSVT, TIP) and conferences (e.g., CVPR, ECCV), and has recently agreed to serve as an evaluator for the Spanish State Research Agency (AEI).