Stability of In-Context Learning: A Spectral Coverage Perspective
Abstract
In-context learning (ICL) is a pivotal capability for the practical deployment of large-scale language models, yet its reliability can vary substantially with the number of demonstrations provided in the prompt. A central obstacle is that the target notion, distributional stability under demonstration resampling, is expensive to measure directly at scale, making prompt-length selection largely heuristic. We therefore study a computable sufficient condition based on a spectral-coverage proxy: the lower tail of the spectrum of a regularized empirical second-moment matrix formed from demonstration representations. Under sub-Gaussian representation assumptions, we derive a non-asymptotic sample-size requirement (a lower bound on the number of demonstrations K) that guarantees this proxy event with a prescribed failure probability, yielding a conservative prompt-length recommendation produced by an observable two-stage estimator. In large-scale experiments, the resulting estimates consistently upper-bound empirical accuracy knee-points, which we treat only as a practical surrogate for the prompt-length transition rather than a definition of stability. On a smaller held-out subset, direct resampling-based distributional stability measurements further validate the intended stability interpretation. Finally, a validation-only calibration step reduces the conservatism (typically to about 1.03–1.20×) while preserving conservative ordering, providing practical and verifiable guidance for ICL prompt design.
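To make the spectral-coverage proxy concrete, the following is a minimal sketch rather than the paper's estimator. It assumes the proxy event can be read as "the smallest eigenvalue of the regularized empirical second-moment matrix is at least some threshold". The names `second_moment`, `min_eigenvalue`, `coverage_holds`, `ridge`, and `tau` are hypothetical, and the shifted power iteration is simply a dependency-free way to obtain the smallest eigenvalue; it is not taken from the abstract.

```python
def second_moment(reps, ridge):
    """Return (1/K) * sum_k h_k h_k^T + ridge * I for K representation vectors."""
    k, d = len(reps), len(reps[0])
    m = [[sum(h[i] * h[j] for h in reps) / k for j in range(d)] for i in range(d)]
    for i in range(d):
        m[i][i] += ridge
    return m


def min_eigenvalue(m, iters=2000):
    """Smallest eigenvalue of a symmetric PSD matrix via shifted power iteration."""
    d = len(m)
    # For a PSD matrix the trace is an upper bound on the largest eigenvalue.
    shift = sum(m[i][i] for i in range(d))
    # The dominant eigenvector of (shift * I - m) belongs to m's smallest eigenvalue.
    b = [[(shift if i == j else 0.0) - m[i][j] for j in range(d)] for i in range(d)]
    # Fixed start vector; if it were orthogonal to the target eigenvector,
    # the result would overestimate the smallest eigenvalue.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # shifted matrix is zero, e.g. when d == 1
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of m at the converged vector.
    mv = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d)) / sum(x * x for x in v)


def coverage_holds(reps, ridge, tau):
    """Assumed proxy event: lambda_min of the regularized second moment >= tau."""
    return min_eigenvalue(second_moment(reps, ridge)) >= tau


# Two demonstrations spanning both directions versus two identical ones.
spread = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(coverage_holds(spread, ridge=0.1, tau=0.3))
print(coverage_holds(collapsed, ridge=0.1, tau=0.3))
```

In this toy setting the collapsed set leaves one direction covered only by the ridge term, so its smallest eigenvalue equals `ridge`. This is why any such threshold would need to sit above the regularization level to be informative.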