Federico Malato & Reinforcement learning research at CVC

Federico Malato has spent the last three months at CVC working on a very promising topic: Reinforcement Learning (RL). He has been supervised by our Associate Director Dimosthenis Karatzas and, although his future looks bright (he is starting a PhD in Finland in September), his adventure at CVC is ending for now.

At the request of his research partners, we conducted a short interview and collected some questions.

> Who is Federico Malato?

I’m a 25-year-old guy from Prato, Italy. I studied Computer Engineering at the University of Florence, Italy, for both my BSc and MSc degrees. During my Master’s I specialized in Machine Learning and became particularly interested in Reinforcement Learning. While developing my MSc thesis in collaboration with UAB, Barcelona, Prof. Dimosthenis Karatzas proposed that I spend June and July 2021 at UAB for an internship that could help me grow both as a person and as a scientist. From September 2021 I’ll be starting my PhD at the University of Eastern Finland, Joensuu, focusing on Inverse Reinforcement Learning and the reward-function shaping problem for complex tasks.

> What is your research about?

I’m currently applying Reinforcement Learning to images and their related “saliency maps”, that is, versions of an image in which “potentially useful information” is highlighted. The long-term goal of our project is to show an image to an agent and have it answer questions about that image correctly and exhaustively. For example, suppose there are some cars in an image. We might ask the agent “how many yellow cars are in the image?” or “is there a red car?”, and the agent should find a way to answer by looking for information all around the image. This problem is called Visual Question Answering (VQA), and it is very hard to solve. More specifically, we want to enable multi-scale exploration of a saliency map in order to address the VQA problem for images.
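The interview doesn’t define a concrete saliency model, but as a minimal illustration of the idea, a crude saliency map can be computed as normalised gradient magnitude. The function below is purely illustrative, not the project’s actual method:

```python
import numpy as np

def simple_saliency(gray):
    """A crude saliency map: gradient magnitude, normalised to [0, 1].
    Real saliency models are far more sophisticated; this only
    illustrates the idea of highlighting potentially informative pixels."""
    gy, gx = np.gradient(np.asarray(gray, dtype=np.float32))
    mag = np.hypot(gx, gy)                  # edge strength per pixel
    return mag / mag.max() if mag.max() > 0 else mag
```

Flat regions score zero while edges and textured areas light up, which is the “potential good information” an exploring agent would seek out.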

> Why is this important or a novelty?

Reinforcement Learning is a very different approach to Machine Learning compared with better-known paradigms such as Supervised or Unsupervised Learning. It focuses on “control problems”, that is, problems in which you must take decisions based on your surroundings. As human beings, we face hundreds of these problems each day, from “should I have pizza or a hamburger for dinner?” to far more serious ones like “the car in front of me has suddenly started to brake, what should I do to avoid a collision?”. Control problems are usually very hard to define and very hard to solve optimally, which makes them interesting to study and address. They are probably the closest we can get to modelling a human mind, and some techniques (such as Behavioural Cloning or Inverse Reinforcement Learning) try to mimic the human reasoning process and transfer it to a machine. These claims are quite strong and we are still far from actually mimicking it, but the idea of having a way to do so amazes me to the core.

> What are the different applications of reinforcement learning in computer vision?

The best-known application of RL in computer vision is self-driving cars: if we aim for a car capable of taking an optimal decision at every moment, approaches designed precisely for decision problems come in very handy. With RL techniques you can also design “intelligent” robots that help humans in a wide variety of tasks, or even replace them when things get potentially harmful for people (a simple example: working in a mine). For all these tasks, however, vision is the fundamental requirement, as it allows an agent to perceive its surroundings and take decisions accordingly.

> What are the advantages of using this approach?

Reinforcement Learning has a lot of advantages and, alas, a lot of disadvantages as well. I think the biggest advantage is that you can model complex behaviors in completely simulated environments, hence avoiding risky real-world situations. Just think of CARLA: in order to teach agents how to drive, researchers built a video-game-like simulator that lets them make all the mistakes they want without physically hurting anyone or damaging anything. In this field, video games are very popular for training policies that might then be transferred to agents that interact with the real world. On the other hand, RL is very task-specific and terribly sample-inefficient. This means that agents learn very slowly. Plus, convergence of the method is never guaranteed, meaning we don’t even know whether the agent will ever learn something useful to us. To tackle these issues, a good amount of preliminary study and a precise definition of the problem are the keys to getting results.

> What is the motivation behind using reinforcement learning?

There are a lot of possible “ideal motivations” and you could argue about them for a year. For me, the motivation behind Reinforcement Learning is that it’s our best shot at designing intelligent agents capable of adapting to the real world and cooperating with humans to improve our quality of life: for example, in a complicated surgery a machine can be much more precise than our best surgeon, and could therefore assist them. Plus, a machine doesn’t get tired, so its performance doesn’t degrade over a long procedure. Speaking in a more technical way, I am extremely fascinated by the “decision” aspect, which is very subtle yet extremely powerful for addressing most problems in AI.

> How was your time at CVC?

It has surpassed all of my most optimistic expectations. I found a wonderful working environment with amazing people. I really couldn’t imagine that this experience would be so positive, both work-wise and life-wise.

Now, some specific questions from your colleagues…

> Why do you use reinforcement learning for detection or segmentation?

The idea comes from my MSc thesis project and from my supervisor, Prof. Andrew Bagdanov. If you think about how a human recognizes things in an image, the idea follows naturally: first, you look at the whole image and build some context. Then, you implicitly look around for “hints” and decide what the image represents. In ML this problem is usually solved using CNNs, yet we believe this approach has two flaws. First, constant-time inference might not be the best choice for detection, since exploring details can be a necessary part of giving an answer. Second, provided that we can explore local portions of the image, how can we explore the data efficiently? To do that, we need something that can actually decide what is good and what is bad for detection/segmentation, and CNN-based approaches can’t do that (or, if they can, they require a lot of “magic trickery” and fine-tuning).

> Which strategy of reinforcement learning are you using?

For this project, I’m using a standard RL algorithm with a custom environment built on top of the OpenAI Gym library. I would like to extend it with Inverse Reinforcement Learning or Behavioural Cloning in the future, but that’s going to take a while, as we are still trying to come up with the correct formulation of our problem which, as I said, is perhaps the hardest and longest part of the process.
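The interview doesn’t detail the custom environment, but a Gym environment is essentially a class exposing `reset()` and `step()`. The sketch below mimics that interface in plain NumPy (so it runs without Gym installed) on a toy task loosely inspired by the project: an agent slides a window over a saliency map and is rewarded for landing on salient regions. The class name, action set, and reward scheme are all hypothetical, not the project’s actual code (which would subclass `gym.Env`):

```python
import numpy as np

class SaliencyExploreEnv:
    """Toy environment following the Gym reset()/step() convention:
    the agent moves a fixed-size window over a saliency map and is
    rewarded in proportion to the saliency under the window."""

    N_ACTIONS = 4  # up, down, left, right

    def __init__(self, saliency_map, window=16, max_steps=50):
        self.map = np.asarray(saliency_map, dtype=np.float32)
        self.window = window
        self.max_steps = max_steps
        self.reset()

    def _obs(self):
        # Observation: the saliency values inside the current window
        y, x = self.pos
        return self.map[y:y + self.window, x:x + self.window]

    def reset(self):
        self.pos = [0, 0]   # top-left corner of the window
        self.steps = 0
        return self._obs()

    def step(self, action):
        dy, dx = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        h, w = self.map.shape
        self.pos[0] = int(np.clip(self.pos[0] + dy * self.window, 0, h - self.window))
        self.pos[1] = int(np.clip(self.pos[1] + dx * self.window, 0, w - self.window))
        self.steps += 1
        obs = self._obs()
        reward = float(obs.mean())          # more saliency under the window -> more reward
        done = self.steps >= self.max_steps
        return obs, reward, done, {}        # Gym's (obs, reward, done, info) convention
```

Wrapping a task this way is what lets any off-the-shelf RL algorithm train on it: the algorithm only ever sees observations, rewards, and a done flag.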

> Are you working with Q learning or policy gradient?

At this stage of the project I’m relying on Proximal Policy Optimization, which belongs to the “policy gradient” family of algorithms, but I don’t exclude that this will change in the future, as we might find Q-learning better suited than this one.
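PPO itself is too long to sketch here, but the core idea of the “policy gradient” family it belongs to can be shown with the much simpler REINFORCE algorithm on a toy two-armed bandit. Everything below (function names, learning rate, baseline scheme) is illustrative, not the project’s code:

```python
import numpy as np

def softmax(z):
    """Turn logits into action probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

def train_bandit(steps=500, lr=0.1, seed=0):
    """REINFORCE on a two-armed bandit where arm 1 always pays 1
    and arm 0 pays 0. Returns the final action probabilities."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)      # one logit per arm (the "policy parameters")
    baseline = 0.0           # running-average reward, reduces gradient variance
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(2, p=probs)          # sample an action from the policy
        r = 1.0 if a == 1 else 0.0          # observe the reward
        # gradient of log pi(a | theta) for a softmax policy: one_hot(a) - probs
        grad = -probs
        grad[a] += 1.0
        theta += lr * (r - baseline) * grad  # ascend the expected-reward gradient
        baseline += 0.05 * (r - baseline)
    return softmax(theta)
```

The policy ends up strongly preferring the paying arm. Q-learning would instead estimate a value for each action and act greedily on those estimates; PPO keeps this gradient-ascent structure but clips each update so the policy never changes too abruptly.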

We hope you enjoyed your stay at CVC and we wish you all the luck!!