Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation
Abstract
Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient approach for adapting large pre-trained models, yet its behavior under continual learning remains poorly understood. We present a geometric theory characterizing catastrophic forgetting in LoRA through the lens of gradient subspace interactions. Our central finding is that forgetting is governed by a simple geometric law: F = α(1 - 2θ) + β, where θ is the minimum principal angle between task gradient subspaces. This formulation reveals an approximate rank-invariance property, at high subspace angles, forgetting becomes largely independent of the adapter rank (coefficient of variation ≈ 0.8\% in controlled synthetic settings; CV ≈ 10-19\% on real benchmarks, suggesting this is regime-dependent rather than absolute). We validate our theory on synthetic tasks (r=0.994 correlation), Split-CIFAR100 with ViT-LoRA, and sequential GLUE with RoBERTa-LoRA. Our analysis reconciles seemingly contradictory findings in the literature: we show that rank affects forgetting only when task subspaces are similar (low angle), while orthogonal methods like O-LoRA provide minimal benefit when natural orthogonality is already high. These insights provide principled guidance for continual learning with parameter-efficient fine-tuning.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.