Assisting the training of deep neural networks with applications to computer vision

Place: Large lecture room.

Affiliation: Departament de Matemàtica Aplicada i Anàlisi. Universitat de Barcelona, Spain.

This seminar is a summary of Adriana Romero Soriano's PhD thesis work.  

Deep learning has grown rapidly in popularity thanks to its success on challenging tasks. In particular, it has proven effective across a wide variety of computer vision tasks, such as image classification, object recognition and image parsing. Unlike earlier approaches, which relied on feature representations hand-engineered by experts, deep learning attempts to learn representation hierarchies automatically from data. More recently, the trend has been toward ever deeper representation hierarchies. Learning (very) deep representation hierarchies is challenging because it involves optimizing highly non-convex functions. The search for algorithms that ease this learning is therefore extensive and ongoing.

In this thesis, we tackle the problem of easing the learning of (very) deep representation hierarchies. We present a simple, fast, hyperparameter-free, off-the-shelf unsupervised algorithm that discovers hidden structure in the input data by enforcing a very strong form of sparsity. We study the applicability and potential of the algorithm for learning representations of varying depth in a handful of applications and domains, highlighting its ability to provide discriminative feature representations that achieve top performance.

Yet, while unsupervised learning methods are of great value when labeled data is scarce, the recent industrial success of deep learning has revolved around supervised learning. Supervised learning is the focus of many recent research advances, which have been shown to excel at many computer vision tasks. Top-performing systems often involve very large and deep models, which are not well suited to applications with time or memory limitations. More in line with current trends, we work on making top-performing models more efficient by designing very deep and thin models. Since training such very deep models remains challenging, we introduce a novel algorithm that guides the training of very thin and deep models by providing hints for their intermediate representations. Very deep and thin models trained with the proposed algorithm end up extracting feature representations that perform comparably to, or even better than, those extracted by large state-of-the-art models, while substantially reducing the model's time and memory consumption.
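
To make the idea of hinting intermediate representations more concrete, the sketch below shows one plausible form such a hint could take. It is an illustration under assumptions, not the thesis algorithm itself: the abstract does not specify the loss, so we assume a squared-error hint between a wide teacher layer and a linearly projected thin student layer. The names `hint_loss`, `hint_grad`, the projection `W`, and the layer sizes are all hypothetical, and the features are random stand-ins for real network activations.

```python
import numpy as np

def hint_loss(teacher_feat, student_feat, W):
    """Half the mean squared distance between teacher features and
    linearly projected student features (assumed form of the hint)."""
    diff = student_feat @ W - teacher_feat
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))

def hint_grad(teacher_feat, student_feat, W):
    """Gradient of hint_loss with respect to the projection W."""
    diff = student_feat @ W - teacher_feat
    return student_feat.T @ diff / student_feat.shape[0]

rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(32, 64))  # hypothetical wide teacher layer
student_feat = rng.normal(size=(32, 16))  # hypothetical thin student layer
W = np.zeros((16, 64))                    # projection from student to teacher width

initial = hint_loss(teacher_feat, student_feat, W)
for _ in range(100):
    # For brevity only the projection is updated; in an actual training
    # setup the student's own weights would receive gradients as well.
    W -= 0.1 * hint_grad(teacher_feat, student_feat, W)
final = hint_loss(teacher_feat, student_feat, W)

print(f"hint loss before: {initial:.3f}, after: {final:.3f}")
```

The projection is needed in this sketch because the thin student layer has fewer units than the teacher layer it is matched against. With real networks, a hint term of this kind would typically be combined with the usual supervised objective.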