IAPR/ICDAR Award Lecture: Graph-based Representations in Document Analysis
Speaker: Horst Bunke
Institute of Computer Science and Applied Mathematics, University of Bern, Switzerland
Graph representation is a powerful formalism that has found widespread application in science and engineering. In the field of document analysis in particular, graphs have been used very successfully. Examples include layout analysis, graphics recognition, machine-printed character recognition, and handwriting recognition. In this talk we first review some of these applications. Then we report on novel approaches to classification, clustering, and related tasks based on graph representations. In particular, we will discuss graph kernels and graph embedding in vector spaces and show how they can be used in document analysis.
July 27, Monday - 9.20h

Keynote 1: Mass Digitisation in Digital Libraries: The Experience of the British Library
Speaker: Aly Conteh
Head of Digitisation, British Library, London, UK
The British Library is one of the great research libraries of the world, holding over 150 million items in all known languages and formats. The advent of the Internet and the ability to digitise large quantities of text and images and make them available over the Web have transformed the way we work. For the past two decades, the British Library has undertaken a number of focused digitisation initiatives. More recently, we have entered the world of mass digitisation of newspapers and books. Drawing on the experience gained from digitising 25 million pages of books and 4 million pages of newspapers dating from the early 17th century, this talk will present the approach taken, the challenges faced and the lessons learnt. For institutions such as the British Library, partnering with the document analysis and recognition community is a key part of creating valuable and enduring resources for scholars and the public alike. The talk will also present the partnerships the Library is currently involved in, which seek to advance the state of the art in the mass digitisation of historic text.
July 27, Monday - 16.20h

Keynote 2: Ontology-Based Document Understanding on the Semantic Desktop
Speaker: Andreas Dengel
German Research Centre for Artificial Intelligence (DFKI), Kaiserslautern, Germany
A document is the principal mode of preserving and transporting
knowledge through time and space. Documents are both sources of
information and a means of communication. The information they
contain can serve, for example, as the basis for satisfying potential
customers, defining a common understanding, assessing facts and
interrelationships, establishing valuable contacts, planning new
products, or learning about an as yet unknown area. The decreasing
half-life of knowledge, along with the constant growth of highly diverse
and specialized information in almost every field, calls for intelligent,
individualized, human-centered technologies that assist the
“knowledge worker” in keeping track of and coping with documents and
information, and in efficiently managing vital business processes. (...)
July 28, Tuesday - 9.00h

Keynote 3: Enterprise Approach to OCR Technology Development
Speaker: Andrey Isaev
Director of the Technology Products Department, ABBYY, Moscow, Russia
The talk presents an overview of the approaches necessary to build
the best commercial OCR technology. Commercial OCR technology vendors
face significant challenges that academic researchers do not have to deal
with. The more popular a technology is, the more important it is to keep
recognition results consistent across different versions of the technology's
SDK in all the usage scenarios in which it is employed, regardless of any
improvements in architecture and algorithms. OCR technology has many more
quality parameters than recognition accuracy alone, and different parameters
matter in different usage scenarios. Thus, the best technology has to be the
best in all respects. Moreover, over the past decades the industry has
redefined the term OCR, which now means much more than just character
recognition. This talk discusses these challenges and their influence on
the research and development process.
July 28, Tuesday - 16.20h

Keynote 4: Ten editions of ICDAR: Overview and Outlook
Speaker: Guy Lorette
MR IRISA, Université de Rennes 1, Université Européenne de Bretagne, France
After ten editions of ICDAR, it now seems both timely and necessary to look
back and give an overview and retrospective of what has happened in our
research domain during this period.
First, the talk will present the general scientific environment and the
evolution of ICDAR's features over time. Then, after a brief historical
overview and a summary of the main topics of each of the ten ICDARs, it will
put the major facts, milestones and results into perspective. Past
developments, recent achievements and future trends will be analyzed in
order to extract useful know-how and skills.
This talk will point out the major advances accomplished in the domain of
digital document processing, analysis, recognition and understanding over a
period of almost two decades (1991-2009). These advances will be analyzed in
terms of research avenues, models, methods, standards, languages, systems,
hardware, software, experiments, application domains, industrial products,
successes and failures, etc. Future prospects, trends and challenges will be
highlighted. Finally, some guidelines will be suggested.
July 29, Wednesday - 9.00h