Modeling, estimation and evaluation of intrinsic images considering color information

Thesis Directors: Dr. Robert Benavente and Dr. Olivier Penacchio.

Abstract:

Image values result from a combination of visual information coming from multiple sources. Recovering information about the multiple factors that produced an image is a hard and ill-posed problem. However, humans develop the ability to interpret images and to recognize and isolate specific physical properties of the scene.

Images describing a single physical characteristic of a scene are called intrinsic images. These images would benefit most computer vision tasks, which are often affected by the multiple complex effects usually found in natural images (e.g. cast shadows, specularities, interreflections). In this thesis we analyze the problem of intrinsic image estimation from different perspectives, including the theoretical formulation of the problem, the visual cues that can be used to estimate the intrinsic components, and the evaluation mechanisms of the problem.

We first give a brief introduction to the background and nature of the problem of intrinsic image estimation and to some closely related topics. Then we present an exhaustive review of the literature on intrinsic images in the field of computer vision, giving a comprehensive and organized description of the existing techniques for intrinsic image estimation. In our review we analyze how common simplifying assumptions about the world have modified the formulation of the intrinsic image decomposition problem, and how different information cues based on regularities of scenes and images have been used to estimate intrinsic images. We also examine the evaluation mechanisms that have been used so far for this problem: we analyze the existing databases and metrics, discuss the evolution of the problem, and identify recent trends in the field.

Color information has frequently been ignored in the field of computer vision. However, as can be seen in our review, color has proved to be extremely useful in the estimation of intrinsic images. In this work we present a method for intrinsic image decomposition which estimates the intrinsic reflectance and shading components of a single input image using observations from two different color attributes combined in a probabilistic framework. One of them, based on the semantic description of color used by humans, provides a sparse description of the reflectances in an image. The other, based on an analysis of color distributions in the histogram space which connects local maxima, gives a consistent description of surfaces sharing the same reflectance, which makes color names stable in shadowed regions and near highlights.

Moreover, most methods for intrinsic image decomposition have assumed "white light" in the scenes and have completely ignored the effect of camera sensors on images. However, both factors strongly influence the resulting image values during the acquisition process. In this work we analyze the theoretical formulation underlying the decomposition problem and propose a generalized framework in which we model the effects of both the camera sensors and the color of the illuminant. In this novel formulation we introduce a new reflectance component, called absolute reflectance, which is invariant to both effects. Furthermore, we demonstrate that any knowledge of the color of the illuminant or of the camera sensors can be used to improve the reflectance estimates of different existing methods for intrinsic image decomposition. We also show that existing methods, which usually ignore the color of the illuminant and the camera sensors, incur large errors in their reflectance estimates.

Finally, we analyze the evaluation mechanisms of intrinsic images, which have continuously evolved during the last decade. Although multiple datasets have been presented, building these datasets has proved to be a challenging problem in itself, and current ground truth collections present multiple drawbacks, such as the small number and diversity of scenes or the lack of ground truth information for specific intrinsic components (the depth or surface orientation of the objects in the image, the color and direction of the illuminant, etc.). In this thesis we present two datasets for intrinsic image evaluation. The first is a calibrated dataset which includes ground truth information about the illuminant of the scene and the camera sensors. This dataset is used in this work to experimentally validate the theoretical framework for intrinsic image decomposition proposed in this thesis. The second dataset uses synthetic data and contains both simple objects and complex scenes under different illumination conditions. With it we demonstrate that it is possible to build large and realistic datasets for intrinsic image evaluation using computer graphics software and rendering engines.