Understanding the Embedding Space in Continual and Federated Learning
Dipam Goswami defended his PhD thesis on April 22, 2026.
What is the thesis about?
Neural networks are used extensively throughout computer vision for a wide range of tasks, and are typically trained on a dataset in a single training session. However, enabling these networks to learn continually is challenging, as new data arrives over time and models need to adapt accordingly. A critical problem in continual learning is the forgetting of previously learned knowledge after training on new data. Another challenging paradigm for training neural networks is federated learning, where the data is distributed across multiple clients instead of being held at a single source, leading to class imbalance and data heterogeneity across clients.
This thesis investigates continual learning in an exemplar-free setting, which prohibits storing data from previous tasks. We explore the use of class prototypes for classification and examine the use of Euclidean distance, which assumes isotropic feature representations. We demonstrate that the feature distributions of classes are not isotropic in a continual setting, owing to the stability-plasticity dilemma. We discuss two major continual training practices: first, training on a bigger dataset in the first task and then freezing the network so that only the classifier is learned continually; second, training the entire network at every new task. In the former setting, we propose to exploit the feature covariances to account for the anisotropic distributions of the new classes, on which the model has not been trained. In the latter setting, we discuss the issue of semantic drift in the embedding space and propose to generate adversarial samples from the new task data that behave similarly to the old task data. These generated samples act as pseudo-exemplars, which are then used to track the movement of the class prototypes in the evolving embedding space.
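To make the contrast between the two distance choices concrete, the following is a minimal sketch of prototype-based classification with Euclidean and with covariance-aware (Mahalanobis) distances. It is an illustration only, not the thesis implementation: the function names, the `shrinkage` parameter, and the use of one covariance matrix per class are assumptions made for this example.

```python
import numpy as np


def class_statistics(features, labels):
    """Compute one prototype (mean) and one covariance matrix per class.

    features: array of shape (n_samples, dim) from a fixed feature extractor.
    labels:   integer array of shape (n_samples,).
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    covs = np.stack(
        [np.cov(features[labels == c], rowvar=False) for c in classes]
    )
    return classes, means, covs


def predict_euclidean(x, classes, means):
    """Nearest-prototype prediction; implicitly treats features as isotropic."""
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[sq_dist.argmin(axis=1)]


def predict_mahalanobis(x, classes, means, covs, shrinkage=1e-3):
    """Nearest-prototype prediction using each class's feature covariance.

    `shrinkage` (a hypothetical hyperparameter) adds a small multiple of the
    identity so that the covariance matrix is always invertible.
    """
    dim = means.shape[1]
    sq_dists = []
    for mu, cov in zip(means, covs):
        precision = np.linalg.inv(cov + shrinkage * np.eye(dim))
        diff = x - mu
        # Squared Mahalanobis distance: diff^T * precision * diff, per sample.
        sq_dists.append(np.einsum("nd,de,ne->n", diff, precision, diff))
    return classes[np.stack(sq_dists, axis=1).argmin(axis=1)]
```

When the class distributions are strongly elongated along a direction that is not aligned with the difference between prototypes, the Euclidean rule can misclassify many samples that the covariance-aware rule separates cleanly.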
Beyond image classification, we also study continual learning for information retrieval, where the goal is to retrieve the most relevant document from a large corpus given a query document. We discuss the incompatibility issue that stems from storing the corpus embeddings of old tasks beforehand while using the latest model to encode the queries during retrieval. We propose to move the query embeddings into the old embedding space during retrieval to compensate for the drift in the embedding space, thereby avoiding the need to re-index the entire corpus, or recompute its embeddings, every time the model is updated.
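The idea of mapping queries back into the old embedding space can be sketched as follows. This summary does not specify how the mapping is obtained, so the example assumes, purely for illustration, a linear map fitted by least squares on a set of anchor documents embedded by both the old and the new model. The names `fit_backward_map` and `retrieve` are hypothetical.

```python
import numpy as np


def fit_backward_map(anchors_new, anchors_old):
    """Fit W such that anchors_new @ W approximates anchors_old.

    anchors_new: (n, d_new) embeddings of anchor documents from the new model.
    anchors_old: (n, d_old) embeddings of the same documents from the old model.
    Assumption: the drift is approximated by a linear map (illustrative only).
    """
    w, *_ = np.linalg.lstsq(anchors_new, anchors_old, rcond=None)
    return w


def retrieve(queries_new, w, corpus_old, top_k=1):
    """Rank the stored (old-model) corpus embeddings for new-model queries."""
    # Move the queries into the old embedding space; the corpus is untouched.
    q = queries_new @ w
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = corpus_old / np.linalg.norm(corpus_old, axis=1, keepdims=True)
    scores = q @ c.T  # cosine similarity
    return np.argsort(-scores, axis=1)[:, :top_k]
```

In this sketch only the queries are transformed at retrieval time, so the stored corpus index never has to be rebuilt when the model changes.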
For federated learning, we explore the use of pre-trained feature extractors, which achieve superior performance by exploiting only the feature distributions from clients in a training-free setting. We discuss how sharing second-order client statistics increases the communication cost and propose to estimate the second-order statistics at the server using only first-order statistics from clients, with a provably unbiased covariance estimator. The proposed training-free approach dramatically reduces the communication cost while achieving a stable and more effective classifier initialization.
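As an illustration of how a covariance can be recovered from first-order statistics alone, the sketch below estimates a class covariance at the server from per-client class means and sample counts. It relies on the assumption that each client's samples of a class are drawn i.i.d. from the same distribution, in which case the mean of n samples has covariance Sigma / n and the weighted scatter of client means divided by (K - 1) is an unbiased estimate of Sigma. The exact estimator used in the thesis is not given in this summary, and the function name here is hypothetical.

```python
import numpy as np


def covariance_from_client_means(client_means, client_counts):
    """Estimate a class mean and covariance from per-client means and counts.

    client_means:  (K, dim) array, the mean feature of the class on each client.
    client_counts: (K,) array, number of samples of the class on each client.
    Assumes i.i.d. samples across clients (see the lead-in above).
    """
    means = np.asarray(client_means, dtype=float)
    counts = np.asarray(client_counts, dtype=float)
    num_clients = means.shape[0]
    if num_clients < 2:
        raise ValueError("at least two clients are needed")

    # Global class mean: count-weighted average of the client means.
    global_mean = (counts[:, None] * means).sum(axis=0) / counts.sum()

    # Count-weighted scatter of client means around the global mean.
    centered = means - global_mean
    cov = (counts[:, None] * centered).T @ centered / (num_clients - 1)
    return global_mean, cov
```

Each client sends only a vector of size dim per class instead of a dim-by-dim matrix, which is where the communication saving comes from. The estimate becomes more reliable as the number of clients holding the class grows.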