Advancing Transfer Learning and Control of Generative Image Models

Hector Laria Mantecon will defend his dissertation on July 8, 2025, in CVC's Conference Room.

What is the thesis about?

Deep generative models have revolutionized image synthesis, enabling unprecedented capabilities in content creation across diverse domains. Despite significant advances, these models face several fundamental challenges that limit their practical applications, including efficient knowledge transfer, consistent 3D-aware generation, robustness to customization, and precise attribute control. This thesis investigates and addresses these challenges, aiming to advance state-of-the-art generative models for image synthesis.

First, we explore efficient knowledge transfer from unconditional to conditional GANs, a largely overlooked direction given the wide availability of high-quality pretrained unconditional models. We introduce hyper-modulation, a technique that leverages hypernetworks to generate weight modulation parameters on the fly, enabling class-specific outputs while preserving generation quality and exploiting inter-class similarities. Our approach significantly outperforms existing methods across multiple datasets.
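To make the mechanism concrete, here is a minimal PyTorch sketch of hypernetwork-driven weight modulation. It assumes a frozen pretrained convolution and a class embedding; the module name `HyperModulatedConv`, the hidden width, and the residual scaling scheme are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperModulatedConv(nn.Module):
    """Frozen pretrained conv layer whose filters are rescaled per class.

    A small hypernetwork maps a class embedding to one multiplicative
    factor per output filter, generated on the fly at each forward pass.
    Only the hypernetwork is trained during transfer.
    """

    def __init__(self, conv: nn.Conv2d, class_emb_dim: int):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():  # keep pretrained weights intact
            p.requires_grad_(False)
        # Hypernetwork: class embedding -> per-filter modulation (hidden
        # width 128 is an arbitrary illustrative choice).
        self.hyper = nn.Sequential(
            nn.Linear(class_emb_dim, 128),
            nn.ReLU(),
            nn.Linear(128, conv.out_channels),
        )

    def forward(self, x: torch.Tensor, class_emb: torch.Tensor) -> torch.Tensor:
        # 1 + delta keeps the pretrained behavior as the default.
        scale = 1.0 + self.hyper(class_emb)               # (B, out_ch)
        w = self.conv.weight.unsqueeze(0)                 # (1, out_ch, in_ch, k, k)
        w = w * scale.view(scale.size(0), -1, 1, 1, 1)    # per-sample kernels
        b, c, h, wd = x.shape
        # Grouped-conv trick: fold the batch into groups so each sample
        # is convolved with its own modulated kernel.
        out = F.conv2d(
            x.reshape(1, b * c, h, wd),
            w.reshape(-1, c, *self.conv.kernel_size),
            stride=self.conv.stride,
            padding=self.conv.padding,
            groups=b,
        )
        return out.reshape(b, -1, out.shape[-2], out.shape[-1])
```

Only `self.hyper` receives gradients, so transfer touches a small fraction of the parameters, and classes with similar statistics can share structure through the single hypernetwork.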

Second, we tackle the integration of 3D awareness with text-guided editing capabilities in generative models. We present NeRF-Diffusion, a framework that combines a Neural Radiance Field for shape priors with a diffusion model for content generation, bridged by a shared consistency token. This approach maintains identity coherence across viewpoints while enabling text-based editing, effectively balancing geometric consistency with creative flexibility.
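The shared-token pattern can be sketched in a few lines of PyTorch. The stand-in modules below (`TinyNeRF`, `TinyDenoiser`) are toy stubs introduced purely for illustration; only the wiring matters: one learnable embedding conditions both the renderer and the denoiser, so gradients from both objectives shape the same identity representation.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    # Stand-in for a NeRF shape prior: maps (ray dirs, token) to RGB.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 + dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, dirs, token):
        t = token.expand(dirs.size(0), -1)
        return self.net(torch.cat([dirs, t], dim=-1))

class TinyDenoiser(nn.Module):
    # Stand-in for a token-conditioned diffusion denoiser.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 + dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, noisy, cond):
        return self.net(torch.cat([noisy, cond], dim=-1))

token = nn.Parameter(torch.randn(1, 16) * 0.02)   # shared consistency token
nerf, denoiser = TinyNeRF(16), TinyDenoiser(16)
opt = torch.optim.Adam([token, *nerf.parameters(), *denoiser.parameters()], lr=1e-3)

opt.zero_grad()
dirs = torch.randn(128, 3)                        # sampled ray directions
rgb = nerf(dirs, token)                           # view rendered under the token
eps = torch.randn_like(rgb)
noisy = rgb.detach() + eps                        # crude one-step corruption
pred = denoiser(noisy, token.expand(noisy.size(0), -1))
loss = (pred - eps).pow(2).mean()                 # eps-prediction objective
loss.backward()                                   # token gets gradients from both paths
opt.step()
```

Because the same token feeds the 3D prior and the 2D generator, identity stays coherent across viewpoints while text conditioning (omitted in this toy) remains free to drive edits.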

Third, we investigate forgetting in diffusion model customization, where even minimal adaptations to the original model can cause widespread knowledge degradation. Through comprehensive analysis, we characterize both semantic and appearance drift, and introduce a functional regularization approach that preserves original capabilities while accommodating new concepts. Our method significantly reduces knowledge degradation without compromising generation quality or personalization effectiveness.
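A hedged sketch of functional regularization in this setting, assuming an epsilon-prediction diffusion objective: alongside the customization loss, the adapted model is penalized for drifting from a frozen copy of the original on inputs unrelated to the new concept. The stub denoiser and the weight `lam` are illustrative assumptions, not the thesis code.

```python
import copy
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    # Toy epsilon-prediction denoiser standing in for a full diffusion UNet.
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 + dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

model = TinyDenoiser()
frozen = copy.deepcopy(model).eval()      # snapshot of the original model
for p in frozen.parameters():
    p.requires_grad_(False)

def customization_step(x0_new, cond_new, x0_prior, cond_prior, lam=1.0):
    """Loss on the new concept plus a functional penalty that keeps
    predictions on prior-knowledge inputs close to the frozen model."""
    eps = torch.randn_like(x0_new)
    loss_new = (model(x0_new + eps, cond_new) - eps).pow(2).mean()

    eps_p = torch.randn_like(x0_prior)
    noisy_prior = x0_prior + eps_p
    with torch.no_grad():
        ref = frozen(noisy_prior, cond_prior)   # original behavior to preserve
    drift = (model(noisy_prior, cond_prior) - ref).pow(2).mean()
    return loss_new + lam * drift
```

The key design choice is that the penalty acts in output space rather than weight space: the model may move its parameters freely as long as its function on prior prompts stays put, which is what counters both semantic and appearance drift.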

Finally, we address the challenge of precise color control in diffusion models. By discovering and leveraging semantic attribute binding within IP-Adapter frameworks, we develop ColorWave, a training-free approach that enables exact RGB-level color specification during inference. This method eliminates the need for computationally expensive optimization processes while maintaining generation quality and respecting other aspects of the input prompt.

Our contributions advance the state-of-the-art in generative image synthesis by enhancing model transferability, consistency, robustness, and controllability, ultimately making these powerful technologies more accessible and reliable for practical applications.
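As a concrete, heavily hedged illustration of the inference-time color-control idea from the fourth contribution: the snippet below feeds a solid RGB swatch through a public IP-Adapter image-conditioning path so that the color can bind to the prompted object. The checkpoint names are the public diffusers and IP-Adapter releases, chosen for illustration; ColorWave's actual exploitation of attribute binding is more targeted than this generic adapter call.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

# Public SD 1.5 + IP-Adapter weights, used here only as an illustrative stack.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # strength of the image conditioning

target_rgb = (0, 87, 183)       # exact color requested at inference time
swatch = Image.new("RGB", (512, 512), target_rgb)  # solid-color reference

image = pipe(
    prompt="a vintage car parked on a city street",
    ip_adapter_image=swatch,    # color enters through the adapter's image path
    num_inference_steps=30,
).images[0]
image.save("car_rgb_0_87_183.png")
```

No fine-tuning or per-image optimization is involved: the color is specified purely at inference, which is what makes the approach training-free.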

Keywords

Generative adversarial networks, diffusion models, neural radiance fields, transfer learning, model customization, 3D-consistent generation, catastrophic forgetting, color control, text-to-image models