Place: Large Lecture Room
Affiliation: Computer Vision Centre and Dep. of Computer Science, UAB.
The ability to automatically detect and recognize objects in unconstrained images is becoming increasingly important. From security systems and robots to smartphones and augmented reality, every intelligent device needs to know the semantic meaning of an image. This thesis tackles the problem of fast object detection based on template models. Searching for an object in an image means evaluating the similarity between the template model and every possible image location and scale. Here we argue that a template model representation based on a multiple resolution hierarchy is an optimal choice that can lead to excellent detection accuracy and fast computation. Because the search is already performed at multiple image resolutions in order to detect objects at multiple scales, a template model with multiple resolutions gives an improved model representation at almost no additional computational cost. Moreover, the hierarchy of multiple resolutions naturally lends itself to a search over image resolutions, from coarse to fine. This leads to a double speed-up, due to (i) an initially reduced set of coarse locations at which to search for the object and (ii) a lower cost of evaluating the template model. The search over resolutions can be performed with a cascade of multiresolution classifiers, which saves computation by stopping the search early at the coarse level when easy negative examples are found. An alternative approach is to select, locally but uniformly, the most promising detection locations at the coarse level and then iteratively propagate only these to the finer resolutions, saving computation. This procedure, which we call coarse-to-fine search, achieves a speed-up similar to that of the multiresolution cascade, but with a computational time independent of the image content. The coarse-to-fine search is then extended to deformable parts models.
In this approach, as the model resolution increases, the hierarchy of models is recursively split into deformable subparts. In this way, each part can be aligned to the object in the image, producing a better representation and therefore improved detection accuracy, still at a reduced computational cost. We validate the different multiresolution models on several commonly used datasets, showing state-of-the-art results with a reduced computational cost. Finally, we specialize the multiresolution deformable model to the challenging task of pedestrian detection from moving vehicles, which requires both high accuracy and real-time performance. We show that the overall quality of our model is superior to that of previous works and that it can lead to the first reliable pedestrian detection based only on images.
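
As an illustration of the coarse-to-fine search described in the abstract, the following is a minimal sketch and not the implementation from the thesis. It assumes that the template scores at each resolution are already available as 2-D arrays, that each level has exactly twice the resolution of the previous one, and that the score of a hypothesis is the sum of its scores across levels. The function name coarse_to_fine_search and the block parameter are hypothetical. In a real detector the finer template would be evaluated only at the locations visited here, which is where the saving would come from.

```python
import numpy as np

def coarse_to_fine_search(score_maps, block=2):
    """Sketch of a coarse-to-fine search over precomputed score maps.

    score_maps: list of 2-D arrays ordered from coarse to fine; each level is
                assumed to have exactly twice the height and width of the
                previous one (an assumption made for this illustration).
    block:      side of the coarse neighbourhood in which a single
                hypothesis is kept (hypothetical parameter).
    Returns a list of (y, x, score) tuples at the finest resolution.
    """
    coarse = score_maps[0]
    h, w = coarse.shape
    hypotheses = []

    # Local but uniform selection: keep the best location in every
    # block x block neighbourhood of the coarse score map.
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            patch = coarse[y0:y0 + block, x0:x0 + block]
            dy, dx = np.unravel_index(np.argmax(patch), patch.shape)
            hypotheses.append((int(y0 + dy), int(x0 + dx), float(patch[dy, dx])))

    # Propagate only the selected hypotheses to the finer levels: each one
    # is refined among the 2 x 2 finer locations it covers.
    for fine in score_maps[1:]:
        refined = []
        for y, x, score in hypotheses:
            children = fine[2 * y:2 * y + 2, 2 * x:2 * x + 2]
            dy, dx = np.unravel_index(np.argmax(children), children.shape)
            refined.append((int(2 * y + dy), int(2 * x + dx),
                            score + float(children[dy, dx])))
        hypotheses = refined

    return hypotheses

# Usage on a toy two-level hierarchy.
coarse = np.zeros((4, 4))
coarse[1, 1] = 5.0
fine = np.zeros((8, 8))
fine[3, 2] = 7.0
detections = coarse_to_fine_search([coarse, fine], block=2)
```

The number of hypotheses is fixed by the size of the coarse map and by block, so the amount of work in this sketch does not depend on the score values, which mirrors the content-independent running time mentioned above. A cascade, by contrast, would discard hypotheses with a threshold, making its cost depend on the image.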