Training strategies for efficient deep image retrieval

CVC has a new PhD on its record!

Bojana Gajić successfully defended her dissertation in Computer Science on July 13, 2021, and is now a Doctor of Philosophy from the Universitat Autònoma de Barcelona.


What is the thesis about?

In this thesis we focus on image retrieval and re-identification. Training a deep architecture with a ranking loss has become standard for retrieval and re-identification tasks. We analyze and propose answers to three main questions: 1) What are the most relevant strategies of state-of-the-art methods, and how can they be combined to obtain better performance? 2) Can hard negative sampling be performed efficiently (O(1)) while improving performance over naïve random sampling? 3) Can recognition and retrieval objectives be achieved by using a recognition-based loss?

First, in chapter 4 we analyze the importance of several state-of-the-art strategies for training deep models, such as image augmentation, the choice of backbone architecture and hard triplet mining. We then combine the best strategies to design a simple deep architecture and a training methodology for effective, high-quality person re-identification. We extensively evaluate each design choice, which leads to a list of good practices for person re-identification. By following these practices, our approach outperforms the state of the art by large margins on four benchmark datasets, including more complex methods with auxiliary components. We also provide a qualitative analysis of the trained representation, which indicates that, while compact, it captures information from localized and discriminative regions, in a manner akin to an implicit attention mechanism.
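Hard triplet mining can be illustrated with the widely used "batch-hard" variant: for each anchor in a mini-batch, take its farthest positive and its closest negative, then apply a margin hinge. This is a minimal sketch assuming Euclidean distances and plain Python lists; the function name batch_hard_triplet_loss and the default margin of 0.2 are illustrative assumptions, not necessarily the exact mining scheme evaluated in chapter 4.

```python
import math


def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Hypothetical batch-hard triplet loss (illustrative sketch only).

    For every anchor, use the hardest positive (largest distance, same label)
    and the hardest negative (smallest distance, different label) in the batch.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        pos = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor cannot form a valid triplet in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    # Mean over anchors that have at least one positive and one negative.
    return sum(losses) / len(losses) if losses else 0.0
```

Batches for this kind of loss are usually built with several samples per identity, so that every anchor has at least one positive available.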

Second, in chapter 5 we address the problem of hard negative sampling when training a model with a triplet-like loss. We present Bag of Negatives (BoN), a fast hard negative mining method that provides a set, triplet or pair of potentially relevant training samples. BoN efficiently selects a bag of hard negatives using a novel online hashing strategy. We show that BoN outperforms state-of-the-art hard negative mining methods in both accuracy and training time on three large datasets.
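The text does not spell out BoN's hashing function, so the sketch below only illustrates the general idea of hashing-based negative bags with a swapped-in technique: sign-of-random-projection locality-sensitive hashing. Embeddings that land in the same bucket are likely to be close, so a differently labelled bag-mate is a candidate hard negative, and drawing one costs time independent of the dataset size. The class HashedNegativeBags and all of its parameters (n_bits, tries, seed) are hypothetical.

```python
import random


class HashedNegativeBags:
    """Hypothetical online hashing index for hard negatives (not BoN itself).

    Samples on the same side of every random hyperplane share a bag, so
    bag-mates tend to be close in embedding space.
    """

    def __init__(self, dim, n_bits=4, seed=0):
        rng = random.Random(seed)
        # Random hyperplanes for sign-based locality-sensitive hashing.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.bags = {}  # hash code -> list of sample ids
        self.meta = {}  # sample id -> (hash code, label, position in its bag)
        self.rng = random.Random(seed + 1)

    def code(self, emb):
        # One bit per hyperplane: which side of the plane the embedding lies on.
        return tuple(int(sum(w * x for w, x in zip(plane, emb)) >= 0.0)
                     for plane in self.planes)

    def _remove(self, sample_id):
        # O(1) removal: move the bag's last element into the freed slot.
        code, _, pos = self.meta.pop(sample_id)
        bag = self.bags[code]
        last = bag.pop()
        if last != sample_id:
            bag[pos] = last
            c, lab, _ = self.meta[last]
            self.meta[last] = (c, lab, pos)

    def update(self, sample_id, emb, label):
        # Online update: call whenever a fresh embedding is computed in training.
        if sample_id in self.meta:
            self._remove(sample_id)
        c = self.code(emb)
        bag = self.bags.setdefault(c, [])
        self.meta[sample_id] = (c, label, len(bag))
        bag.append(sample_id)

    def sample_negative(self, anchor_emb, anchor_label, tries=10):
        # Cost depends on dim, n_bits and tries, not on the number of samples.
        bag = self.bags.get(self.code(anchor_emb), [])
        for _ in range(tries):
            if not bag:
                break
            cand = self.rng.choice(bag)
            if self.meta[cand][1] != anchor_label:
                return cand
        return None  # caller may fall back to a random negative
```

Capping the number of draws keeps sampling constant-time, at the price of occasionally returning None when a bag holds only same-label samples.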

Finally, in chapter 6 we hypothesize that training a metric learning model by maximizing the area under the ROC curve (a typical performance measure for recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by the fact that “a curve dominates in ROC space if and only if it dominates in PR space” [17]. To test it, we design an approximate, differentiable relaxation of the area under the ROC curve. Despite its simplicity, the AUC loss, combined with a ResNet50 backbone, achieves state-of-the-art results on two large-scale, publicly available retrieval datasets. Additionally, it achieves performance comparable to more complex, domain-specific state-of-the-art methods for vehicle re-identification.
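The AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative, i.e. the mean of step(s_p - s_n) over all positive/negative pairs (ties counted as 1/2). The step function has zero gradient almost everywhere, so one common relaxation, assumed here and not necessarily the approximation derived in chapter 6, replaces it with a sigmoid with temperature τ. The names exact_auc and soft_auc_loss, and the assumption that scores are similarities in [-1, 1], are illustrative.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exact_auc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(pos_scores) * len(neg_scores))


def soft_auc_loss(pos_scores, neg_scores, temperature=0.1):
    # Hypothetical relaxation: step(p - n) -> sigmoid((p - n) / temperature),
    # so the loss is differentiable; minimizing it pushes positives above negatives.
    total = sum(sigmoid((p - n) / temperature)
                for p in pos_scores for n in neg_scores)
    return 1.0 - total / (len(pos_scores) * len(neg_scores))
```

A smaller temperature makes the sigmoid a closer approximation of the step function but yields sharper, less informative gradients for pairs far from the decision boundary.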