Dr. Josep Lladós and the future of mobility at Revolució 4.0 (TV3)

CVC Director Dr. Josep Lladós was featured in Revolució 4.0, a magazine show broadcast on TV3 and hosted by journalist Xantal Llavina.

Last Tuesday’s edition was dedicated to Smart Cities and autonomous driving. In it, Dr. Josep Lladós, CVC director, explained how the open source simulator CARLA is fostering research in the field of autonomous mobility. When asked whether autonomous cars will be on the streets in the near future, Dr. Lladós noted that, although these cars already exist, cities are not yet prepared to host them. Revolució 4.0 is a TV magazine focused on digital transformation and innovation. Broadcast once a week, it gathers 14 experts and entrepreneurs to discuss different trending topics in technology.

Watch the full show here (in Catalan): https://www.ccma.cat/tv3/alacarta/revolucio-4-0/viurem-en-megaciutats-al-2050-i-els-cotxes-privats-desapareixeran/video/5961566/


Related articles:

The car in the matrix: CARLA

The future of autonomous cars: understanding the city with the use of videogames

Compitiendo contra la Inteligencia Artificial del coche autónomo

Towards a no driver scenario: autonomous and connected cars at the Computer Vision Center


Kornia Hackathon 2019

The Computer Vision Center is hosting the Kornia Hackathon 2019 on December 14th. The event is a hands-on session on differentiable computer vision with PyTorch, led by two of the Kornia project leaders, Edgar Riba and Dmytro Mishkin.

Edgar Riba is one of the main promoters of Kornia, a member of the Open Source Vision Foundation and a member of the OpenCV technical committee. He is currently finishing his PhD at the UPC on ‘Geometric Computer Vision and Local Features Detection’. Dmytro Mishkin, also a Kornia promoter, is currently a PhD student at the Centre for Machine Perception of the Czech Technical University in Prague. Furthermore, Dmytro is the former CTO of Clear Research, co-founder of the Ukrainian Research Group “Szkocka”, co-founder of the Eastern European Computer Vision Conference (EECVC) and a Kaggle Master.

The Hackathon will start at 10.00 and finish at 18.00. It is sponsored by OpenCV, the OSVF (Open Source Vision Foundation), the Computer Vision Center, PyTorch and AWS. All Twitter promotion will be available under the hashtag #HackathonKornia19.

Technical requirements

A solid knowledge of PyTorch, Computer Vision and Deep Learning would be ideal.


Registration is open until December 12th here: https://forms.gle/xRGw1k6njDzthRkB6

About Kornia

Kornia is an open source Computer Vision library for PyTorch. More information on the project here: https://kornia.github.io/

Any queries should be directed to edgar.riba@osvf.org

Good news everyone! Training AI in the comprehension of Newspaper image archives

The CVPR 2019 paper presented by PhD student Ali Biten at this year’s annual Computer Vision gathering in Long Beach, USA, delves into the possibility of giving AI systems the ability to generate interpretations of images, using captioned newspaper images. Results show that the field of image captioning, although full of new application possibilities, is still a hard nut to crack.


CVC researchers have been pondering how to improve on current image captioning systems, which automatically generate a fully descriptive text of the content of pictures and photographs. As stated by Ali Biten, first author of the paper, “current systems are highly ineffective, performing merely at a descriptive level, essentially enumerating the objects in the scene and their relations”.

The paper was presented at this year’s Computer Vision and Pattern Recognition conference within the framework of two projects led by senior CVC researchers: the aBSINTHE project, led by Dr. Marçal Rusiñol, and the READS project, led by Dr. Dimosthenis Karatzas. It sets a new milestone towards effective context-driven, entity-aware captioning using news images, proposing a novel captioning method that leverages contextual information to produce image captions that effectively describe and interpret the scene.

“We have proposed an end-to-end architecture in two phases that allows us to dynamically extend the output dictionary to out-of-vocabulary named entities which keep popping up in news articles”, states Biten. “That is, proper names, locations, dates or even prices; words that would not be compiled in your everyday pocket dictionary”. Furthermore, they have produced the GoodNews dataset, the largest news image captioning database yet, with more than 466,000 image–caption pairs, along with the corresponding metadata.
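To make the idea concrete, here is a minimal, hypothetical sketch of the placeholder-filling step Biten describes (the function, template format and entity lists below are invented for illustration and are not the paper’s actual implementation): a template caption contains entity-type placeholders that are filled at generation time with named entities taken from the accompanying article, so the final caption can contain words that never appeared in the training vocabulary.

```python
# Illustrative sketch (not the paper's code): fill entity-type placeholders
# in a generated template caption with named entities extracted from the
# accompanying news article.

def fill_template(template, article_entities):
    """Replace each placeholder token (e.g. 'PERSON_') with the next
    unused entity of that type found in the article."""
    used = {}        # how many entities of each type we have consumed
    caption = []
    for token in template.split():
        if token.endswith("_"):                  # placeholder, e.g. PERSON_
            etype = token.rstrip("_")
            candidates = article_entities.get(etype, [])
            idx = used.get(etype, 0)
            if idx < len(candidates):
                caption.append(candidates[idx])  # out-of-vocabulary word
                used[etype] = idx + 1
                continue
        caption.append(token)                    # ordinary vocabulary word
    return " ".join(caption)

# Entities an off-the-shelf NER system might extract from the article text.
entities = {"PERSON": ["Angela Merkel"], "GPE": ["Berlin"], "DATE": ["Tuesday"]}
template = "PERSON_ speaks to reporters in GPE_ on DATE_"
print(fill_template(template, entities))
# → Angela Merkel speaks to reporters in Berlin on Tuesday
```

The point of the two-phase design is visible even in this toy: the caption model only needs a small closed vocabulary plus a handful of placeholder types, while the article supplies the open-ended proper names at test time.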

Information systems trained to see

Now, how do you teach a computer to understand an image? Dr. Dimosthenis Karatzas, co-author of the paper and Associate Director of the Computer Vision Center, shows a broad smile when asked. “That is a good question”, which, of course, means “this is going to be a very long explanation”. Let us say the computer can “see” with cameras. Drawing parallels with humans, who capture images with their eyes and process them with their brains, cameras take the pictures and computers then analyse them. For a computer, an image is just a set of pixels: the visual data that our laptops are supposed to understand.
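As a concrete (if drastically simplified) illustration of “an image is just a set of pixels”, a tiny grayscale image can be written out as a grid of intensity values; the numbers below are invented for the example:

```python
# A tiny grayscale "image" as the computer sees it: a grid of pixel
# intensities (0 = black, 255 = white). Values invented for illustration.
image = [
    [  0,   0, 255,   0,   0],
    [  0, 255, 255, 255,   0],
    [255, 255, 255, 255, 255],
    [  0, 255, 255, 255,   0],
    [  0,   0, 255,   0,   0],
]

height = len(image)
width = len(image[0])
# The only things the computer can compute directly are statistics of
# these raw numbers, e.g. the mean brightness:
brightness = sum(sum(row) for row in image) / (width * height)
print(f"{width}x{height} image, mean intensity {brightness:.1f}")
# → 5x5 image, mean intensity 132.6
```

Everything beyond such raw statistics, from “there is a cross shape here” to a full caption, has to be learned from annotated examples, which is exactly why the datasets discussed below matter so much.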

“You have to tell the computer what it is supposed to see in each image”, explains Dr. Karatzas, “the technical term for that is to annotate”. “Image datasets are crucial; depending on whether you have a robust one or a poor one, the neural network will learn accurately or will not learn very well at all”. When a neural network fails to learn properly, we end up with systems with clear biases. The BBC’s article on ‘Racist AI’ compiles a set of features on the topic.

Furthermore, deep learning has revolutionized the way in which computer engineers teach information systems. Computer vision is the discipline that has both boosted and benefited the most from this revolutionary technique. With the use of neural networks, computers are now deciding what to extract from images in order to understand them.

However, deep learning has a downside; these refined neural networks are incredibly data hungry, needing a huge quantity of images in order to learn effectively. That’s why these new methods are performing really well in facial biometric applications, for example, and so poorly in medical imaging, where obtaining images is a cumbersome process.

“In our case, the important problem is not the lack of data, but the lack of an evaluation method: how do you know if a generated caption is ‘correct’ or not? By using newspaper images, the only way we actually have is to compare it to what the journalist wrote”, states Ali.

This method, as pointed out by Dr. Karatzas, is “highly restrictive”, as “different humans would also give very different captions for the same image”. In the paper, they also evaluated performance by asking human evaluators to judge whether the captions were plausible. The result: in 53% of the cases, humans could not tell which caption was artificially generated and which was written by a human.

Therefore, the delivery of a dataset such as the one proposed in the current paper (GoodNews) is a huge step forward. What’s more, the paper also contributes a model that produces contextualised captions, able to distribute its attention between the image and the context.

Neural networks, algorithms that think (or, to be more accurate, process)

A neural network is a set of algorithms that perform a proposed task, in this case image captioning. Algorithms need an input (the image), a set of instructions (plenty of maths) and an output: a concept, an answer to what we have asked. In Ali Biten’s paper, the goal was for neural networks to describe a vast dataset of newspaper images by interpreting their semantic content. This means the system can not only describe what it sees but, in the future, will be able to relate it to other images in other articles containing similar, but different, concepts.
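That input → instructions → output pipeline can be sketched at toy scale with a single artificial neuron; the features, weights and bias below are invented for illustration and bear no relation to the paper’s actual model:

```python
import math

# Toy "neural network": a single neuron mapping image features to a score.
# Input -> weighted sum (the "instructions") -> output (a concept score).

def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))     # sigmoid squashes the sum into (0, 1)

features = [0.8, 0.2, 0.5]            # e.g. crude brightness/edge statistics
weights = [1.5, -2.0, 0.7]            # values learned in training (made up here)
score = neuron(features, weights, bias=-0.1)
print(f"confidence the image matches the concept: {score:.2f}")
```

A real captioning network chains millions of such units and, during training, nudges the weights so that its outputs move closer to the human-written annotations; the principle, however, is the same weighted-sum-and-squash shown above.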

“Let me give you an example”, says Dr. Rusiñol, PI of the aBSINTHE project, funded by the BBVA Foundation. “I might be looking for images of the employment crisis that hit Spain back in 2008. A normal system will retrieve images that have been classified under the concept ‘crisis’ and will give us pictures of people in the streets queueing to get into the state’s job centre, or student strikes asking for better pay rates”. “But”, he adds, “it won’t give you other images such as evictions, and they were, sadly, very common during 2008. Any person thinking of the 2008 crisis in Spain will most definitely remember images of people being evicted from their homes. Well, we need the neural network to make that association too: train it to be able to relate these sets of pictures”.

“We understand scenes by building models and employing them to compose stories that explain their perceptual observations”, states Ali Biten. “This capacity of humans is associated with intelligent behaviour”. However, he continues, “computers can at best perform at the description level and fail to integrate any prior world knowledge in the captions that they produce”. The efforts of this CVPR study are challenging, but have brought scientists a step closer to producing image captions that offer plausible interpretations of scenes by integrating contextual information.

Up to now, available image captioning datasets have not been fit for developing captioning models with the characteristics mentioned above. “Current systems provide generic, dry, repetitive and non-contextualized captions”, states Biten. For the task at hand, Ali and colleagues decided to use images illustrating newspaper articles. The reason: the descriptions of the pictures provided by journalists and the contextual information (the texts and accompanying articles) are easily accessible and can be collected with reasonable effort.

“Newspapers are an excellent domain for moving towards human-like captions, as they provide readily available contextual information that can be modelled and exploited”, explains Dr. Rusiñol. To this end, CVC researchers decided to put together GoodNews.

“Remember when we talked about the importance of annotating?”, asks Dr. Karatzas whilst talking about the article. “Well, news image pictures already give us that annotation, without an extra cost or effort on our part”.

“We haven’t solved the issue. That is not what we have proposed here. We have presented a new captioning method that aims to take us a step closer to producing captions that offer a plausible interpretation of the scene, and applied it to the particular case of news image captioning”, summarizes Ali Biten. As CVC researchers see it, they’ve advanced the field of image captioning within computer vision a little further, whilst releasing the largest news image captioning dataset to date. GoodNews will help foster the science of computer vision by providing researchers worldwide with a useful, highly reliable and smart tool. It’s open source too! If that isn’t good news, what is?

Video of the project (in English):


A. Biten, L. Gómez, M. Rusiñol, D. Karatzas (2019): ‘Good News, Everyone! Context Driven Entity-Aware Captioning for News Images’, CVPR 2019

Image source: sandid on Pixabay

Project funded by Fundación BBVA

Related articles: 

Digitus II: Releasing The Content Locked In Manuscripts

Defined By The Looks: When Text Meets Visual Information

Dr. Alicia Fornés and Dr. Dimosthenis Karatzas, invited speakers at this year’s Global Forum on AI for Humanity

A gathering of more than 150 researchers in Paris, the Global Forum on AI for Humanity aims to set the foundations for a global think tank in AI. This year’s meeting took place at the end of October, with professionals from varied disciplines analysing the future challenges and opportunities of this technology.

Organised by the French national AI research programme (‘Programme national de recherche en IA’) and coordinated by INRIA, the Forum had the patronage of the President of the French Republic, Mr. Emmanuel Macron, and was an opportunity for professionals from different sectors, such as industry, the humanities and the public administration, to gain insight into the current advances of Artificial Intelligence.

Dr. Dimosthenis Karatzas and Dr. Alicia Fornés were two of the more than 150 invited researchers who shared their expertise on the subject. Dr. Alicia Fornés was part of the expert panel dedicated to Digital History, in which she talked about combining computer vision and gamification to transcribe historical manuscripts. Dr. Dimosthenis Karatzas, on the other hand, took part in the session on the future of machines and social interaction & intervention, with a talk on “Surviving in man-made environments: the case for language and vision”.

The event was held within the framework of the French government’s national strategy for Artificial Intelligence, and was an opportunity to identify a set of guidelines for three main themes: firstly, the development of an ecosystem of talent; secondly, the dissemination of AI and its transfer to the economy and public administration; and lastly, the implementation of an ethical model which fosters innovation whilst maintaining the protection of fundamental rights.

As stated by the main organisers of the event, “the Global Forum on AI for Humanity is essential to establish a common comprehension of the new perspectives offered by AI, of the problems that emerge and the methods used that will allow us to solve these new challenges”. Furthermore, it aims to “create a set of recommendations for national and international initiatives”.

Dr. Alicia Fornés and Dr. Dimosthenis Karatzas were not the only representatives of Spain at this gathering. Dr. Ramon López de Màntaras, Director of the Institute of Artificial Intelligence, was also invited, along with Dr. Carme Torras, Director of the Institute of Research in Robotics, Dr. Ricardo Baeza Yates, Professor at the UPF (the three of them also based in Barcelona), and Dr. Enrique Vidal, from the Polytechnic University of Valencia.

Related articles: 

Defined By The Looks: When Text Meets Visual Information

XARXES: Connecting The Lives Of Our Ancestors

CVC at IoT Solutions World Congress 2019

The Computer Vision Center was present at this year’s IoT Solutions World Congress, presenting its Computer Vision technology. A large number of companies visited our stand and technological demonstration at the Catalan Pavilion.

The IoT Solutions World Congress is the largest international event dedicated to IoT (Internet of Things) solutions for industry. This year the event took place from the 29th to the 31st of October at Fira de Barcelona – in the framework of the Barcelona Industry Week – and was attended by more than 16,000 visitors and 350 exhibitors from a wide range of countries, with the aim of establishing new partnerships.

The Computer Vision Center highlighted its latest Computer Vision technologies for IoT with a Smart Market demonstration: a closed circuit that uses new approaches in Deep Learning to detect different packages of grocery items. Without reading a barcode or price tag, the system uses computer vision to identify, on site, the item shown from any angle or position, automatically adding its price to the customer’s receipt.

Furthermore, the CVC was also present at the Catalonia AI demo zone with a demonstration of apparent personality detection. This technology can give the viewer an apparent personality profile after just 15 seconds of video. Both demonstrations were highly successful, giving the CVC broad recognition within the IoT industry world.


Have a look at our IoT Solutions World Congress Moment on Twitter: https://twitter.com/i/moments/1191294755721875456