Advancing Vision-based End-to-End Autonomous Driving

CVC has a new PhD on its record!

Yi Xiao successfully defended her dissertation in Computer Science on July 10, 2023, and she is now a Doctor of Philosophy from the Universitat Autònoma de Barcelona.

What is the thesis about?

In autonomous driving, artificial intelligence (AI) processes the traffic environment in order to drive the vehicle to a desired destination. Currently, different paradigms address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, manoeuvre planning, and control. On the other hand, we find end-to-end driving approaches, which attempt to learn a direct mapping from raw sensor data to vehicle control signals. The latter are relatively less studied but are gaining popularity, as they are less demanding in terms of data labelling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
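To make the notion of a direct sensor-to-control mapping concrete, here is a minimal, hedged sketch in PyTorch. The architecture, layer sizes, and input resolution are illustrative assumptions, not the thesis models: a small CNN encodes a front-camera image, and an MLP head regresses steering, throttle, and brake.

```python
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    """Toy end-to-end driver: a raw camera image in, control signals out.

    Illustrative only: real end-to-end models also condition on speed
    and high-level navigation commands, and are trained on large
    datasets of driving demonstrations.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # image -> feature vector
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(      # feature -> [steer, throttle, brake]
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 3),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))

policy = EndToEndPolicy()
controls = policy(torch.randn(1, 3, 88, 200))  # -> tensor of shape (1, 3)
```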

We propose and evaluate three approaches to tackle the challenge of end-to-end autonomous driving. First, we focus on the input, adding depth information as a complement to RGB data in order to mimic the human ability to estimate the distance to obstacles. Note that, in the real world, these depth maps can be obtained either from a LiDAR sensor or from a trained monocular depth estimation module, neither of which requires human labelling. Then, based on the intuition that the latent space of end-to-end driving models encodes information relevant for driving, we use it as prior knowledge for training an affordance-based driving model. In this case, the trained affordance-based model achieves good performance while requiring less human-labelled data, and it provides interpretability regarding driving actions. Finally, we present a new purely vision-based end-to-end driving model termed CIL++, trained by imitation learning. CIL++ leverages modern best practices, such as a large horizontal field of view and a self-attention mechanism, which contribute to the agent's understanding of the driving scene and lead to a better imitation of human drivers. Using training data without any human labelling, our model achieves near-expert performance on the CARLA NoCrash benchmark and can rival SOTA models that require large amounts of human-labelled data.
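As a hedged illustration of the CIL++ ingredients mentioned above, the sketch below applies self-attention across per-view feature tokens from horizontally arranged cameras. All names, dimensions, and the number of views are assumptions for illustration; the published CIL++ code may differ.

```python
import torch
import torch.nn as nn

class MultiViewAttentionHead(nn.Module):
    """Sketch: mix per-view tokens with self-attention, then predict controls.

    Assumes each of the camera views covering a wide horizontal field
    of view has already been encoded into a d_model-dimensional token.
    """
    def __init__(self, d_model: int = 256, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True
        )
        self.attn = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 3)  # [steer, throttle, brake]

    def forward(self, view_tokens: torch.Tensor) -> torch.Tensor:
        # view_tokens: (batch, num_views, d_model); attention lets the
        # views exchange information about the whole scene.
        mixed = self.attn(view_tokens)
        return self.head(mixed.mean(dim=1))  # pool views, regress controls

model = MultiViewAttentionHead()
actions = model(torch.randn(2, 3, 256))      # three views -> (2, 3)
```

In imitation learning, a model like this would typically be trained by minimising a regression loss (e.g. L1) between its predicted controls and those recorded from an expert driver, which is what allows training without human labelling.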

Keywords: deep learning, autonomous driving, end-to-end, imitation learning, multimodality, representation learning.