Why do I need to register?

Registering is important for us: it gives us an indication of the likely participation in the competition, and a way to contact potential participants if we need to communicate useful information about the competition (and only about the competition). You need to be registered to get access to the "Downloads" and "Submit Results" sections.


Am I obliged to participate if I register?

No. Registration is only meant to be an expression of interest, and it gives you access to the "Downloads" and "Submit Results" sections.


Do I have to participate in all of the tasks of the Challenge?

No. You can participate in any and as many of the tasks as you wish to.


I noticed there is another Challenge under the “ICDAR 2011 Robust Reading Competition”. Do I have to participate in this other challenge as well?

No, you do not have to. But we would really appreciate it if you did!

Participation in both the “real-scene” and the “born-digital” challenges would give us a way to compare the state of the art between the two domains. At the same time, since both challenges are structured in the same way, a system that can be trained and produce results for one of them can also produce results for the other, so the additional effort is minimal. Not to mention that you get a second chance to win :) Note, however, that the “Real Scene” challenge does not feature a segmentation task; this particular task is only available in the “Born-digital” challenge.


Why two challenges? The Real Scene and the Born-Digital images seem to be very similar!

They do look similar, don't they? But there are crucial differences between the two application domains. Real scene images are captured by high-resolution cameras, and might suffer from illumination problems, obtrusions and shadows. Born-digital images, on the other hand, are designed directly on the computer: the text is designed in situ and might suffer from compression or anti-aliasing artefacts, the fonts used are very small, and the resolution is 72 dpi, as these images are designed to be transferred online. There are more differences to list, but the main point is that algorithms that work well in one domain will not necessarily work well in the other. The idea of hosting two challenges and addressing both domains in parallel is to try to qualify and quantify the similarities and the differences, and to establish the state of the art in both domains.


Are there other similar challenges?

As far as we know, this is the first time a Challenge on text extraction from "Born-Digital" (Web and email) images has been organised. Robust Reading competitions focusing on text extraction from Real Scenes, however, were organised in the context of ICDAR in 2003 and 2005, led by Simon Lucas; and of course this year our colleagues at DFKI are organising a Challenge on Real Scene images in parallel to ours.


I found a mistake in the ground truth! What can I do?

Please let us know by sending us a note at robust_reading@cvc.uab.es. After the end of the competition the datasets will be archived at the TC10 and TC11 Web sites, and we will correct any mistakes found in the ground truth at that point. We will refrain from publishing updates to the training set during the training period in order not to interfere with the competition process. We really appreciate your help!


Your "Text Localisation" ground truth seems to be at the level of words, but my algorithm is made to locate whole text lines! Are you going to penalise my algorithm during evaluation?

We will do our best not to penalise such behaviour. This was actually one of the few issues reported by authors after past Robust Reading competitions. For the evaluation of this task we have implemented the methodology described in C. Wolf and J.M. Jolion, "Object Count / Area Graphs for the Evaluation of Object Detection and Segmentation Algorithms", International Journal on Document Analysis and Recognition, vol. 8, no. 4, pp. 280-296, 2006. This methodology addresses the problem of one-to-many and many-to-one correspondences of detected areas in a satisfactory way, so algorithms that are not designed to work at the word level should not be penalised.
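
To give a rough idea of how a single detected text line can be credited for several ground-truth words, here is a minimal Python sketch of an area-based matching test in the spirit of that methodology. It is an illustration only, not the evaluation code used in the competition: the function names, the box format, the threshold values (0.8 for area recall, 0.4 for area precision) and the assumption that ground-truth word boxes do not overlap each other are all our own choices for the example.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (0 if degenerate)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def words_matched_by_line(line: Box, words: List[Box],
                          t_r: float = 0.8, t_p: float = 0.4) -> List[int]:
    """Indices of ground-truth words matched by one detected line.

    t_r and t_p are assumed thresholds, chosen for illustration.
    Assumes ground-truth word boxes do not overlap each other.
    """
    # Area recall: each word must be sufficiently covered by the detection.
    covered = [i for i, w in enumerate(words)
               if area(w) > 0 and overlap(line, w) / area(w) >= t_r]
    # Area precision: the detection must be sufficiently filled by those words.
    filled = sum(overlap(line, words[i]) for i in covered)
    if covered and area(line) > 0 and filled / area(line) >= t_p:
        return covered
    return []


# Two word boxes on one line, and a detection spanning the whole line.
words = [(0, 0, 40, 10), (45, 0, 100, 10)]
print(words_matched_by_line((0, 0, 100, 10), words))
```

In this sketch, a detection that spans a whole line of words is accepted as long as it covers each word well and does not contain too much non-text area, whereas a detection that swallows most of the image fails the precision test and matches nothing.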


I see that not every piece of text in the images is ground truthed. Is this an error?

We aim to ground truth every bit of text in the images. There are, however, cases where we consciously do not include certain text in the ground truth description. These are the following:

  • Characters that are partially cut (see for example the cut line at the bottom of Figure 1a - this is not included in the ground truth). Cut text usually appears when a large image is split into a collage of many smaller ones; traditionally this practice was used to speed up the download of Web pages, but it is not encountered much nowadays.
  • Text that was not meant to be read but appears in the image accidentally as part of photographic content (see for example the names of the actors on the "The Ugly Truth" DVD in Figure 1b). The text there can only be inferred from the context; it was never meant to be read. By contrast, we do include text which is part of photographic content when its presence in the image is not accidental (for example, the names of the movies in Figure 1b are indeed included in the ground truth).
  • Text that we cannot read in general. This can be because of very low resolution, for example, but there are other cases as well. In the image of Figure 1c, for instance, the word "twitter" seems to be used as the background, behind "follow". This is treated as background and is not included in the ground truth.

In any other case, we probably have made a mistake, so please let us know!

Important Dates
  • 25 February: Webpage is now online! Registrations page is open.
  • 20 March: Training dataset will be available
  • 1 June: Test dataset available
  • 1-3 June: Test period
  • 3 June: Submission of results