Dynamics Over Landscape: The Emergence of Linear Separability via Spectral Alignment in Contrastive Learning
Abstract
Contrastive learning effectively clusters data despite a loss landscape filled with poor solutions, a success that is heavily dependent on the choice of data augmentations. How optimization consistently finds meaningful patterns remains an open question. We show this success stems from training dynamics rather than the loss function alone. Crucially, under a highly specific structural assumption governing the connectivity and variance of the data augmentations, we prove that once a critical spectral alignment threshold is reached, data features inevitably and rapidly separate into distinct clusters. We establish this mechanism for both discrete datasets and the macroscopic continuum limit, modeling latent dynamics as a Wasserstein gradient flow to demonstrate that this separation persists as the number of data points approaches infinity. We hypothesize that natural training dynamics inherently drive the system toward this critical state. We extensively validate this empirically across four diverse domains (synthetic shapes, images, text, and PDEs). In every setting, a sharp increase in this spectral quantity consistently precedes clean data separation, revealing that contrastive learning's success is governed by a dynamically emerging trigger tightly coupled to the underlying augmentation structure.