Rethinking the Rank Threshold for LoRA Fine-Tuning
Abstract
A recent landscape analysis of LoRA fine-tuning in the neural tangent kernel regime establishes a sufficient condition r(r+1)/2 > KN on the LoRA rank r for the absence of spurious local minima under squared-error loss, prescribing r ≥ 12 on canonical few-shot RoBERTa setups. The condition is stated for general output dimension K, so its sharpness in any particular regime, and its practical implication for the cross-entropy loss actually used in fine-tuning, are open. We give three results that together reduce the prescribed rank to r = 1 for binary classification in this regime. First, replacing the symmetric Sard-form count with the non-symmetric LoRA manifold dimension yields a strictly weaker capacity requirement, r(m+n) - r^2 > C* · KN with C* ≈ 1.35 under Gaussian-iid features, satisfied at r = 1 on canonical setups. Second, in the cross-entropy setting the Polyak-Łojasiewicz inequality removes the rank threshold entirely. Third, a Rademacher-complexity bound predicts rank-one variance optimality precisely when the bias term is saturated, which is the case for binary classification but not for K > 2. Empirically, across four GLUE-style binary tasks, three encoder architectures, and at scale on RoBERTa-large, rank one is competitive with the existing prescription r = 12; on multi-class MNLI the optimal rank shifts above one, also as predicted. The binary-regime guarantees are conditional on standard NTK assumptions; the multi-class extension is left to future work.
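The two rank conditions quoted above are simple enough to check by hand. The following is a minimal sketch of that arithmetic, not the authors' code. The abstract does not state K, N, m, or n, so the values used here (KN = 72, m = n = 768) are illustrative assumptions, and the helper names are hypothetical. Any KN from 66 to 77 makes r = 12 the smallest rank satisfying r(r+1)/2 > KN, which is consistent with the quoted prescription.

```python
from typing import Optional


def min_rank_symmetric(kn: int) -> int:
    """Smallest r with r(r+1)/2 > KN (the symmetric count)."""
    r = 1
    # r(r+1) is always even, so integer division is exact here.
    while r * (r + 1) // 2 <= kn:
        r += 1
    return r


def min_rank_lora_manifold(kn: int, m: int, n: int,
                           c_star: float = 1.35) -> Optional[int]:
    """Smallest r with r(m+n) - r^2 > C* * KN, or None if no r <= min(m, n) works.

    r(m+n) - r^2 is the dimension of the set of rank-r m-by-n matrices;
    c_star = 1.35 is the approximate constant quoted in the abstract.
    """
    for r in range(1, min(m, n) + 1):
        if r * (m + n) - r * r > c_star * kn:
            return r
    return None


if __name__ == "__main__":
    kn = 72        # assumed value of K * N, for illustration only
    m = n = 768    # assumed weight-matrix shape, for illustration only
    print("symmetric condition, min rank:", min_rank_symmetric(kn))
    print("LoRA-manifold condition, min rank:", min_rank_lora_manifold(kn, m, n))
```

Under these assumed values the left side of the second condition at r = 1 is m + n - 1 = 1535, far above 1.35 · 72 = 97.2, which illustrates why the weaker requirement can already hold at rank one when m and n are large relative to KN.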