IAPR/ICDAR Award Lecture: Graph-based Representations in Document Analysis
Speaker: Horst Bunke
Institute of Computer Science and Applied Mathematics, University of Bern, Switzerland
Graph representation is a powerful formalism that has found widespread application in science and engineering. In document analysis in particular, graphs have been used very successfully. Examples include layout analysis, graphics recognition, machine-printed character recognition, and handwriting recognition. In this talk we first review some of these applications. Then we report novel approaches to classification, clustering, and related tasks based on graph representations. We will discuss in particular graph kernels and graph embedding in vector spaces and show how they can be used in document analysis.

July 27, Monday - 9.20h


Keynote 1: Mass Digitisation in Digital Libraries: The Experience of the British Library
Speaker: Aly Conteh
Head of Digitisation, British Library, London, UK
The British Library is one of the great research libraries of the world, holding over 150 million items in all known languages and formats. The advent of the Internet and the ability to digitise large quantities of text and images and make them available over the Web has transformed ways of working. For the past two decades, the British Library has undertaken a number of focused digitisation initiatives. More recently, we have entered the world of mass digitisation of newspapers and books. Drawing on the experience gained from digitising 25 million pages of books and 4 million pages of newspapers dating from the early 17th century, the talk will present the approach, challenges and lessons learnt. For institutions such as the British Library, partnering with the document analysis and recognition community is a key part of creating valuable and enduring resources for scholars and the public alike. The partnerships that the Library is currently involved in, which seek to advance the state of the art in mass digitisation of historic text, will also be presented.

July 27, Monday - 16.20h


Keynote 2: Ontology-Based Document Understanding on the Semantic Desktop
Speaker: Andreas Dengel
German Research Centre for Artificial Intelligence (DFKI). Kaiserslautern, Germany
A document is the principal mode of preserving and transporting knowledge through time and space. Documents are both sources of information and a means of communication. The information they contain can be, for example, the basis for satisfying potential customers, for defining a common understanding, for assessing facts and interrelationships, for establishing valuable contacts, for planning new products, or for education in an as yet unknown area. The decreasing half-life of knowledge, along with the constant growth of highly diverse and specialized information in almost every field, calls for intelligent, individualized, human-centered technologies that assist the “knowledge worker” in keeping track of and coping with documents and information, and in efficiently managing vital business processes. (...)

July 28, Tuesday - 9.00h


Keynote 3: Enterprise Approach to OCR Technology Development
Speaker: Andrey Isaev
Director of Technology Products Department, ABBYY. Moscow, Russia
The talk presents an overview of the approaches necessary to build the best commercial OCR technology. Commercial OCR technology vendors face significant challenges that academic researchers do not have to deal with. The more popular a technology is, the more important it is to keep recognition results consistent across different versions of the technology SDK in every usage scenario in which it is employed, regardless of any improvements in architecture and algorithms. OCR technology has many more quality parameters than just recognition accuracy, and different parameters matter in different usage scenarios. Thus the best technology has to be the best in all aspects. Moreover, over the past decades industry has redefined the term OCR, which now means much more than just character recognition. This talk discusses these challenges and their influence on the research and development process.

July 28, Tuesday - 16.20h


Keynote 4: Ten editions of ICDAR: Overview and Outlook
Speaker: Guy Lorette
MR IRISA, Université de Rennes 1. Université Européenne de Bretagne, France
After ten editions of ICDAR, it now seems both timely and necessary to look back and give an overview and retrospective of what has happened in our research domain during this period. First, the talk will present the general scientific environment and the evolution of ICDAR features over time. Then, after a brief historical view and a summary of the main topics of each of the ten ICDAR editions, the major facts, milestones and results will be put into perspective. Past evolutions, recent achievements and future trends will be analyzed in order to extract some useful know-how and skills. The talk will point out the major advances accomplished in the domain of digital document processing, analysis, recognition and understanding over a period of almost two decades (1991-2009). These advances will be analyzed in terms of research avenues, models, methods, standards, languages, systems, hardware, software, experiments, application domains, industrial products, successes and failures, etc. Future prospects, trends and challenges will be highlighted. Finally, some guidelines will be suggested.

July 29, Wednesday - 9.00h