Mass digitisation is here now: What it means for Document Analysis Researchers
Place: Sala d’Actes - CVC
Affiliation: PRImA Research Lab, University of Salford, Manchester, UK
Mass digitisation in libraries and archives is gathering pace, with several projects under way and many more planned. This explosion of activity presents many significant (possibly once-in-a-lifetime) opportunities for researchers in Document Image Analysis (DIA), but it also requires a deeper understanding of the cultural, economic and technical issues involved in business decision-making and in the determination of success factors. With this knowledge, researchers can prioritise their efforts and focus on the most important problems, of which there are many. This talk will give an overview of several issues surrounding mass digitisation, drawing on experience from a number of different projects with major content-holders and users. It will discuss the points of view of the different stakeholders and how these translate into technical terms at each stage of DIA, from scanning through layout analysis and recognition to post-correction and crowdsourcing (correction and/or enrichment). The talk will conclude by exploring the very significant role of standardisation, datasets and objective evaluation as indispensable tools for DIA research as well as for the business side of mass digitisation.