A Spectral Phase Diagram for Binary Few-Shot Classification: Intrinsic Dimensionality, Geometric Saturation, and Representational Diagnosis
Abstract
Deciding when to stop collecting labeled examples is a fundamental but undertheorized problem in applied machine learning. The saturation index S(K) = erank(Σ_W(K)) / K measures the ratio of the effective rank of the pooled within-class sample covariance to the shot count; we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. The index is computable in O(d³) time from support features alone, requiring no test labels or trained classifier. Evaluated across N = 246 doubling-pair observations from seventeen binary tasks and six datasets, sixteen of seventeen tasks have a positive within-task Spearman correlation between S(K) and marginal accuracy gain (median ρ = 0.811). The pooled Spearman correlation is ρ = 0.548 (p = 1.1 × 10⁻²⁰, N = 246). A three-phase diagram (exploration, transition, saturation) with mean marginal gains of 3.48%, 2.40%, and 0.82% is supported by all pairwise significance tests (p ≤ 0.008). As a binary stopping rule, the index achieves AUC = 0.752, providing meaningful probabilistic guidance for annotation decisions. Asymptotic effective rank and peak accuracy show no significant monotone relationship across tasks (Spearman r_s = 0.380, p = 0.133, N = 17). A small saturation index paired with low accuracy diagnoses representational inadequacy. All results are for binary classification with a fixed linear classifier; extensions to N-way settings and pretrained backbone representations are discussed as future work.
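To make the definition S(K) = erank(Σ_W(K)) / K concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on assumptions the abstract does not state: effective rank is taken to be the exponential of the Shannon entropy of the normalized eigenvalue spectrum, the pooled covariance uses the usual n minus number-of-classes normalization, and K is passed in by the caller as the shot count. The function names (effective_rank, pooled_within_class_cov, saturation_index) are hypothetical.

```python
import numpy as np


def effective_rank(cov, eps=1e-12):
    """Effective rank as exp(Shannon entropy of the normalized eigenvalues).

    ASSUMPTION: the abstract does not define erank; this is one common definition.
    """
    eigvals = np.linalg.eigvalsh(cov)        # symmetric eigensolver, O(d^3)
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative round-off
    total = eigvals.sum()
    if total <= eps:
        return 0.0
    p = eigvals / total
    p = p[p > eps]                           # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def pooled_within_class_cov(X, y):
    """Pooled within-class sample covariance of support features X (n x d).

    ASSUMPTION: normalized by n minus the number of classes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    scatter = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        Xc = Xc - Xc.mean(axis=0)            # center each class on its own mean
        scatter += Xc.T @ Xc
    dof = max(len(X) - len(classes), 1)
    return scatter / dof


def saturation_index(X, y, K):
    """S(K) = erank(pooled within-class covariance) / K, from support data only."""
    return effective_rank(pooled_within_class_cov(X, y)) / K
```

Because the effective rank is bounded above by the feature dimension d, the index computed this way shrinks roughly like d / K once K is large, which matches the intuition that a small S(K) signals a well-estimated covariance. No threshold value is implied by this sketch.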