Statistical Consistency and Generalization of Contrastive Representation Learning

Yiming Ying

Abstract

Contrastive representation learning (CRL) underpins many modern foundation models. Despite recent theoretical progress, existing analyses suffer from several key limitations: (i) the statistical consistency of CRL remains poorly understood; (ii) available generalization bounds deteriorate as the number of negative samples increases, contradicting the empirical benefits of large negative sets; and (iii) the retrieval performance of CRL has received limited theoretical attention. In this paper, we develop a unified statistical learning theory for CRL. For downstream tasks, we evaluate retrieval quality using an AUC-type population criterion and show that the contrastive loss is statistically consistent with optimal ranking. We further establish a calibration-style inequality that quantitatively relates excess contrastive risk to excess retrieval suboptimality. For upstream training, we study both supervised and self-supervised contrastive objectives and derive generalization bounds of order O(1/m + 1/n) and O(1/m + 1/n), respectively, where m denotes the number of negative samples and n the number of anchor points. These bounds not only explain the empirical advantages of large negative sets but also reveal an explicit trade-off between m and n. Extensive experiments on large-scale vision-language models corroborate our theoretical predictions.
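
To make the two quantities in the abstract concrete, the following is a minimal sketch of an empirical contrastive loss over n anchors with m negatives each, together with an empirical AUC-type retrieval score. The abstract does not state the exact form of either objective, so the InfoNCE-style loss, the pairwise AUC estimate, the function names `contrastive_loss` and `retrieval_auc`, and the synthetic Gaussian similarity scores are all assumptions made for illustration, not the paper's definitions.

```python
import math
import random

def contrastive_loss(pos_scores, neg_scores):
    """Average InfoNCE-style loss over n anchors (assumed form, not the paper's).

    pos_scores[i]    : similarity of anchor i to its positive
    neg_scores[i][j] : similarity of anchor i to its j-th negative
    """
    total = 0.0
    for s_pos, negs in zip(pos_scores, neg_scores):
        # log(1 + mean_j exp(s_neg_j - s_pos)): small when the positive
        # outscores the negatives, large otherwise.
        mean_exp = sum(math.exp(s - s_pos) for s in negs) / len(negs)
        total += math.log1p(mean_exp)
    return total / len(pos_scores)

def retrieval_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    correct = 0.0
    count = 0
    for s_pos, negs in zip(pos_scores, neg_scores):
        for s in negs:
            if s_pos > s:
                correct += 1.0
            elif s_pos == s:
                correct += 0.5
            count += 1
    return correct / count

# Synthetic similarity scores (illustrative only): positives score higher on average.
random.seed(0)
n, m = 200, 50  # n anchors, m negatives per anchor
pos = [random.gauss(1.0, 1.0) for _ in range(n)]
neg = [[random.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]

loss = contrastive_loss(pos, neg)
auc = retrieval_auc(pos, neg)
print(f"contrastive loss = {loss:.3f}, retrieval AUC = {auc:.3f}")
```

Varying `n` and `m` in this sketch changes how many anchors and how many negatives per anchor enter each empirical average, which are the two sample sizes the stated bounds depend on.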
