Image categorization using Fisher kernels of non-iid image models

June 4, 2012 at 5:00 pm by

Place: Large Lecture Room
Affiliation: Researcher in Computer Vision and Machine Learning LEAR Team, INRIA, Grenoble, France


Bag-of-visual-words models treat an image as an orderless set of local regions and represent it by a histogram of visual word frequencies. Implicitly, regions are assumed to be independent and identically distributed (iid), which is a very poor assumption from a modelling perspective.
In this talk I’ll introduce non-iid models by treating the parameters of bag-of-words models as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel, we encode an image by the gradient of the data log-likelihood with respect to the hyper-parameters that control the priors on the model parameters. In fact, our models naturally generate transformations similar to taking square roots, providing an explanation of why such non-linear transformations have proven successful in practice. Using variational inference, we extend the basic model to include Gaussian mixtures over local descriptors, and latent topic models to capture the co-occurrence structure of visual words, both improving performance. Our models yield state-of-the-art image categorization performance using linear classifiers, without using non-linear kernels or (approximate) explicit embeddings thereof (such as taking the square root of the features).
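As a rough illustration of the square-root-like effect, consider the simplest case, assuming (this is not stated in the abstract) a Dirichlet prior with hyper-parameters alpha over the multinomial of a bag-of-words model. Integrating the multinomial out gives a Polya (Dirichlet compound multinomial) distribution over the visual word counts, and the gradient of its log-likelihood with respect to alpha_k contains the term psi(alpha_k + n_k) - psi(alpha_k), which grows concavely in the count n_k. The sketch below computes only this raw gradient for integer counts; the function names are hypothetical, and any normalization by the Fisher information is left out.

```python
def digamma_shift(a, n):
    """Return psi(a + n) - psi(a) for an integer n >= 0.

    Uses the digamma recurrence psi(x + 1) = psi(x) + 1 / x, so no
    special-function library is needed when counts are integers.
    """
    return sum(1.0 / (a + i) for i in range(n))


def polya_gradient(counts, alpha):
    """Gradient of the Polya log-likelihood w.r.t. each alpha_k.

    counts: visual word counts n_k for one image (non-negative integers).
    alpha:  Dirichlet hyper-parameters alpha_k (positive floats); the
            Dirichlet prior is an assumption made for this sketch.
    """
    total_alpha = sum(alpha)
    total_count = sum(counts)
    # Term shared by all words: psi(A + N) - psi(A).
    shared = digamma_shift(total_alpha, total_count)
    # Per-word term psi(alpha_k + n_k) - psi(alpha_k), minus the shared term.
    return [digamma_shift(a, n) - shared for n, a in zip(counts, alpha)]


if __name__ == "__main__":
    # The per-word term flattens out as the count grows, much like a
    # square root or logarithm would.
    for n in (0, 1, 4, 16, 64):
        print(n, round(digamma_shift(0.5, n), 3))
```

Each additional occurrence of a visual word adds 1 / (alpha_k + n_k) to its component, so frequent words are discounted without any explicit non-linear transformation of the histogram.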
This talk is based on our upcoming CVPR’12 paper, which can be found here:


Watch the Video Presentation