Place: Large lecture room.
Affiliation: Computer Vision group at Xerox XRCE. Grenoble, France.
In this talk I will present some recent work carried out at the Computer Vision group at Xerox XRCE.
In the first part of the talk I will present some interesting connections between Fisher kernels and Convolutional Neural Networks (CNN).
Based on these connections, we derive a method to extract Fisher-vector-like gradient features from CNNs,
which show consistent improvements over the more standard features extracted only from the forward activations of the CNN.
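
To make the idea of a gradient feature concrete, here is a minimal sketch. It is not the method presented in the talk. It assumes the simplest possible setting: a single linear softmax layer with weights `W` on top of a forward activation vector `x`, with the feature defined as the gradient of log p(y | x) with respect to `W`. The function name `gradient_feature` and the fallback to the predicted class when no label is given are hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_feature(W, x, y=None):
    """Flattened gradient of log p(y | x) w.r.t. W for a linear softmax layer.

    W is a K x D list of lists, x is a length-D forward activation vector.
    If y is None, the most probable class is used (an assumption of this sketch).
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    p = softmax(scores)
    if y is None:
        y = max(range(len(p)), key=p.__getitem__)
    # d log p(y|x) / d scores_k = 1[k == y] - p_k
    delta = [(1.0 if k == y else 0.0) - p[k] for k in range(len(p))]
    # The gradient w.r.t. W is the outer product delta * x^T, flattened row by row.
    return [d * v for d in delta for v in x]


W = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
print(gradient_feature(W, x, y=0))  # [0.5, 1.0, -0.5, -1.0]
```

The resulting vector has K x D entries, against D for the forward activation alone, which is the same growth in dimensionality seen when moving from a descriptor to its Fisher vector.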
In the second part of the talk I will describe some work on learning deep representations for word images.
Contrary to traditional approaches, we are not interested in encoding the transcription of the word.
Instead, we are interested in learning representations that directly capture the semantic meaning of the words.
We find that, by leveraging information from lexical resources such as WordNet at training time,
one can learn a CNN that goes directly from image pixels to a latent representation that encodes word semantics, with no need to explicitly transcribe the word image at any time.