Theory and Algorithms on the Median Graph. Application to Graph-basedPlace: Large Lecture Room, Computer Vision Center Affiliation: Computer Vision Center, UAB
Given a set of objects, the generic concept of median is defined as the object with the smallest sum of distances to all the objects in the set. It has been often used as a good alternative to obtain a representative of the set. In structural pattern recognition, graphs are normally used to represent structured objects. In the graph domain, the concept analogous to the median is known as the median graph. By extension, it has the same potential applications as the generic median in order to be used as the representative of a set of graphs. Despite its simple definition and potential applications, its computation has been shown as an extremely complex task. All the existing algorithms can only deal with small sets of graphs, and its application has been constrained in most cases to the use of synthetic data with no real meaning. Thus, it has mainly remained in the box of the theoretical concepts. The main objective of this work is to further investigate both the theory and the algorithmic underlying the concept of the median graph with the final objective to extend its applicability and bring all its potential to the world of real applications. To this end, new theory and new algorithms for its computation are reported. From a theoretical point of view, this thesis makes two main contributions. On one hand, the new concept of spectral median graph. On the other hand, we show that some of the existing theoretical properties of the median graph can be improved under some specific conditions. In addition to these theoretical contributions, we propose five new ways to compute the median graph. One of them is a direct consequence of the spectral median graph concept. In addition, we provide two new algorithms based on the new theoretical properties. Finally, we present a novel technique for the median graph computation based on graph embedding into vector spaces. With this technique two more new algorithms are presented. The experimental evaluation of the proposed methods on one semi-artificial and two real-world datasets, representing graphical symbols, molecules and webpages, shows that these methods are much more efficient than the existing ones. In addition, we have been able to proof for the first time that the median graph can be a good representative of a class in large datasets. We have performed some classification and clustering experiments that validate this hypothesis and permit to foresee a successful application of the median graph to a variety of machine learning algorithms.