Improving generalization for classification and retrieval tasks

CVC Seminar

Abstract:

In this talk we will present recent work aimed at improving the generalization of visual representations on both classification and retrieval tasks. We will start with recent work on supervised pre-training and present an approach that improves the transferability of encoders learned in a supervised manner while retaining their state-of-the-art performance on the supervised training task. We will introduce two models: t-ReX, which achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and t-ReX*, which matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks. We will then present TLDR (TMLR 2022), a dimensionality reduction method for generic input spaces that ports the recent self-supervised learning framework of Barlow Twins to learning linear encoders that outperform methods like PCA for classification and retrieval. Finally, we will present Grappa (ECCV 2022), a method to efficiently adapt a large pre-trained model to perform better on multiple retrieval tasks jointly, using only unlabelled data and with only a small decrease in zero-shot performance outside those tasks.

Short bio:

Yannis Kalantidis is a senior research scientist at NAVER LABS Europe. He received his PhD in Computer Science from the National Technical University of Athens in 2014 and was a research scientist at Yahoo Research San Francisco and Facebook AI in Menlo Park before joining NAVER LABS Europe in 2020. His research revolves around visual representation and multi-modal learning, self-supervised learning, and adaptive systems. He is also passionate about bringing the computer vision community closer to socially impactful tasks, datasets and applications, and has co-organized workshops at top-tier AI venues, including “Computer Vision for Global Challenges” (CV4GC @ CVPR 2019), “Computer Vision for Agriculture” (CV4A @ ICLR 2020) and “Wikipedia and Multi-Modal & Multi-Lingual Research” (Wiki-M3L @ ICLR 2022).

Jon Almazan is a research scientist at NAVER LABS Europe. He received his PhD from the Computer Vision Center at the Universitat Autonoma de Barcelona, Spain. Before joining NAVER LABS in 2017, he worked at Xerox Research from 2014. His research interests lie in computer vision and machine learning, with a current focus on learning representations for image retrieval in both supervised and self-supervised settings, as well as object detection and semantic segmentation.