Place: Large Lecture Room
TALK 1: Overcoming catastrophic forgetting with hard attention to the task
BIO: Joan Serrà is a research scientist with Telefónica R&D in Barcelona. He works on machine learning and artificial intelligence, typically dealing with sequential and/or sparse data. He did his MSc (2007) and PhD (2011) in computer science at the Music Technology Group of Universitat Pompeu Fabra, Barcelona. During that time, he was also an adjunct professor with the Dept. of Information and Communication Technologies of the same university (2006-2011). He did a postdoc in artificial intelligence at IIIA-CSIC, the Artificial Intelligence Research Institute of the Spanish National Research Council in Bellaterra, Barcelona (2011-2015). He has had research stays at the Max Planck Institute for the Physics of Complex Systems in Dresden, Germany (2010), the Max Planck Institute for Computer Science in Saarbrücken, Germany (2011), and Goldsmiths, University of London, United Kingdom (2012). Joan has been involved in more than 10 research projects funded by Spanish and European institutions, and has co-authored over 90 publications in diverse scientific areas, many of them highly cited and in top-tier journals and conferences. He also regularly acts as a peer reviewer for some of those and other publications.
Abstract: Catastrophic forgetting occurs when a neural network loses the information learned on a first task after being trained on a second task. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks’ information without affecting the current task’s learning. A hard attention mask is learned concurrently with every task through stochastic gradient descent, and previous masks are exploited to constrain such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach makes it possible to control both the stability and the compactness of the learned knowledge, which we believe also makes it attractive for online learning or network compression applications.
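For attendees unfamiliar with the idea, the following is a minimal sketch of how a per-task attention mask could protect earlier tasks. It is not the speaker's implementation. It assumes that each task gates units with a sigmoid of a learned embedding, that masks from earlier tasks are accumulated with an element-wise maximum, and that gradients are scaled by one minus the accumulated mask. The abstract does not spell out these details, and all function names and the scaling value `s` are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def attention_mask(embedding, s):
    # Assumption: gate = sigmoid(s * e); a large s pushes gates toward 0 or 1.
    return [sigmoid(s * e) for e in embedding]

def accumulate(previous_mask, new_mask):
    # Assumption: a unit stays reserved once any earlier task has used it.
    return [max(p, n) for p, n in zip(previous_mask, new_mask)]

def constrain_gradient(grad, cumulative_mask):
    # Assumption: scale each unit's gradient by (1 - mask), so units that
    # earlier tasks rely on are left almost untouched.
    return [g * (1.0 - m) for g, m in zip(grad, cumulative_mask)]

# Task 1 uses units 0 and 2; no earlier tasks exist yet.
task1_mask = attention_mask([3.0, -3.0, 3.0], s=50.0)
cumulative = accumulate([0.0, 0.0, 0.0], task1_mask)

# While training task 2, only the free unit (index 1) keeps its gradient.
constrained = constrain_gradient([0.5, 0.5, 0.5], cumulative)
```

In this toy setting, the gradient for unit 1 passes through essentially unchanged, while the gradients for units 0 and 2 are suppressed.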
TALK 2: Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting
by Xialei Liu / Marc Masana
Abstract: In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and Stanford-40 datasets demonstrate that we significantly improve on the results of standard Elastic Weight Consolidation, and that we obtain competitive results when compared to other state-of-the-art methods for lifelong learning without forgetting.
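As a toy illustration of why a rotation can help, the sketch below compares a quadratic penalty computed with the full Fisher matrix against the diagonal-only approximation, before and after rotating the parameters into a basis where the matrix is diagonal. It is not the method from the talk, which uses a factorized rotation of the network's parameter space. Here a single exact rotation of a made-up 2x2 matrix is assumed, the usual scaling constant of the penalty is omitted, and all names and numbers are hypothetical.

```python
import math

def full_penalty(F, d):
    # Quadratic form d^T F d using the full 2x2 Fisher matrix.
    return sum(d[i] * F[i][j] * d[j] for i in range(2) for j in range(2))

def diagonal_penalty(F, d):
    # Diagonal-only approximation: sum_i F_ii * d_i^2.
    return sum(F[i][i] * d[i] ** 2 for i in range(2))

def diagonalizing_rotation(F):
    # Rotation angle that zeroes the off-diagonal of a symmetric 2x2 matrix.
    t = 0.5 * math.atan2(2.0 * F[0][1], F[0][0] - F[1][1])
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def rotate(R, F, d):
    # Express the parameter shift and the Fisher matrix in the rotated basis:
    # d' = R^T d and F' = R^T F R.
    d_rot = [sum(R[i][k] * d[i] for i in range(2)) for k in range(2)]
    F_rot = [[sum(R[i][k] * F[i][j] * R[j][l]
                  for i in range(2) for j in range(2))
              for l in range(2)] for k in range(2)]
    return F_rot, d_rot

F = [[2.0, 1.0], [1.0, 2.0]]   # made-up Fisher matrix with correlated parameters
d = [1.0, -1.0]                # made-up shift away from the old task's optimum

exact = full_penalty(F, d)
naive = diagonal_penalty(F, d)
F_rot, d_rot = rotate(diagonalizing_rotation(F), F, d)
rotated = diagonal_penalty(F_rot, d_rot)
```

In this example the diagonal approximation in the original coordinates overestimates the penalty, while the same approximation in the rotated coordinates matches the full quadratic form.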