Dr. Jorge Bernal gave a presentation yesterday on pattern recognition and how computers learn to read handwritten documents at this year’s Science Party at the Parc de la Ciutadella in Barcelona. The session took place Sunday morning from 12:00 to 13:00 and helped children understand the importance of pattern recognition within Computer Science and, especially, within Computer Vision.
CVC was visited today by primary-school students within UAB’s CROMA programme. This initiative, launched by the university’s Solidarity Foundation (Fundació Autònoma Solidària), gathers primary students from different schools and brings them to explore the university and the research centres based within it. The idea is to bring children from different backgrounds closer to the university and give them the opportunity to learn what happens behind its walls.
At the Computer Vision Center, the students learnt some theory about Computer Vision and Artificial Intelligence, saw several demos and had the opportunity to create their own pattern sequence. The aim of the visit was for them to understand how image detection works, and how pattern recognition helps computers predict future movements.
More information on the global event here (in Catalan).
This Friday we held Open Days at the CVC for UAB Engineering students. Researchers not only explained what Computer Vision is, but also presented the different projects the centre works on, and the students had a chance to try some of our demos. All the pictures here.
The CVC project ‘Beyond Word Spotting: visual context in support of open vocabulary scene text recognition’, led by Dr Dimosthenis Karatzas and Dr Andrew Bagdanov, has won a Google Research Award 2016. The project aims to give computers improved reading abilities, teaching them to read text in images by taking into account the visual information contained within the same image.
Google Research Awards are highly prestigious awards that fund a year of work on projects related to artificial intelligence and machine perception. According to data provided by Google, this year’s edition received more than 876 applications from over 44 countries and 300 universities, of which 143 projects were granted, mainly focused on machine learning, machine perception, networks and systems.
The project proposed by Dr. Karatzas and Dr. Bagdanov has the goal of giving computers a way to comprehend text and visual context jointly within the same image. To date, computers can recognise text on the one hand and visual information on the other, but only separately, and neither modality alone always performs optimally; that is why researchers are combining modalities (text and visual information in this case). Dr. Karatzas and Dr. Bagdanov want to map both domains into a common representation, giving machines the ability to analyse both elements jointly and thus helping computers recognise the images presented in a more exact and efficient way. Textual information then acts as context for interpreting visual information, and vice versa.
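The “common language” idea described above can be sketched as a joint embedding: each modality is projected into a shared vector space where their similarity can be compared directly. The matrices, feature dimensions and random features below are purely illustrative assumptions, not the project’s actual model (in a real system the projections would be learned from data):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors in the shared space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Hypothetical projection matrices mapping each modality into a 64-d shared space.
W_text = rng.standard_normal((64, 300))    # e.g. a 300-d word embedding -> 64-d
W_image = rng.standard_normal((64, 2048))  # e.g. a 2048-d CNN descriptor -> 64-d

def embed_text(t):
    return W_text @ t

def embed_image(v):
    return W_image @ v

# Toy features standing in for real text and image descriptors.
text_feat = rng.standard_normal(300)
image_feat = rng.standard_normal(2048)

# Once both live in the same space, text-image compatibility is a single score.
score = cosine_similarity(embed_text(text_feat), embed_image(image_feat))
```

With such a shared space, a high score would indicate that a word and an image region are mutually consistent, which is what lets each modality act as context for the other.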
“Let’s take an example”, explains Dr Karatzas. “If you see the image of a yellow post box, you can easily guess that what is written on top is “POST” or “MAIL” – in this case, the visual content provides context for recognising the text in the image. Similarly, if you see a shop front, the textual content of the shop sign above can provide useful context for visual understanding.” In fact, Google researchers recently discovered that a classifier trained to distinguish different businesses in images ends up learning how to read, as this is a key way to perform the task.
More specifically, Dr. Karatzas and Dr. Bagdanov set out two challenges in their Google Award proposal. The first is to generate contextualised dictionaries based only on visual scene information. What does this mean? We are talking here of actual dictionaries (with words and meanings). When faced with an image (such as figure 1), the computer, by analysing the visual information, can choose the contextualised dictionary that will help it match the word featured (‘trattoria’).
Imagine you have the Oxford English Dictionary, with more than 220,000 words: it will be harder for the computer to find a match, as it will have to sift through thousands of similarly patterned words. But imagine instead that the computer, by analysing the visual information present in the image (and ignoring the word ‘trattoria’), knows that what it is looking at is a restaurant. How? Because there are tables with typically patterned tablecloths, families seated with what seems to be food and drinks, a waiter serving, and so on. It can then go to a sub-dictionary titled ‘Restaurants’ (containing all the words related to this topic) and will have far less trouble actually finding ‘trattoria’.
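The sub-dictionary idea above can be illustrated with a minimal sketch: once the scene class restricts the lexicon, even a noisy OCR reading can be snapped to the right word with simple fuzzy matching. The scene classes, word lists and helper below are invented for illustration (a real system would learn scene-conditioned lexicons from large corpora), and standard-library fuzzy matching stands in for the project’s own matching method:

```python
import difflib

# Hypothetical scene-conditioned sub-dictionaries (illustrative only).
LEXICONS = {
    "restaurant": ["trattoria", "pizzeria", "menu", "bistro", "cafeteria"],
    "street": ["stop", "parking", "avenue", "boulevard"],
}

def recognise_word(noisy_word, scene_class):
    """Match a noisy OCR reading against the lexicon chosen by the scene class.

    Returns the closest word above a similarity cutoff, or None if nothing
    in the sub-dictionary is close enough.
    """
    lexicon = LEXICONS[scene_class]
    matches = difflib.get_close_matches(noisy_word, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else None

# An OCR module might misread a character ('o' as '0'); the restricted
# lexicon still recovers the intended word.
print(recognise_word("tratt0ria", "restaurant"))  # prints "trattoria"
```

The benefit is exactly the one described in the text: the smaller the candidate lexicon, the fewer similarly patterned words can cause a wrong match.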
Projects such as these will help, in the near future, to make computers more effective at comprehending the scenes they are presented with, both in video and in photography, in real time. Adequate analysis of images will improve not only popular applications such as Street View or Google Maps, but can also be highly useful for surveillance, localisation and monitoring in outdoor settings. It will most certainly make our daily lives easier: helping blind people who cannot read, tourists who want to translate words in the street, or drivers who can rely on cars that understand street signs.
The technology of the Library Living Lab was present at the festival of Sant Cugat’s Giant Book last Sunday.
This festival unites illustrators and children in an event in which the younger ones invent a short story and the illustrators bring the story to life with an image. The compilation of all the illustrated stories is then bound into a book, which is made available for public consultation. This year the festival was a little different: with CVC and the Library Living Lab’s technology, the illustrations were scanned and uploaded to open accounts on Instagram and Twitter, giving the collection extra value.