AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Abstract

Low-Rank Adaptation (LoRA) reparameterizes a weight update as a product of two low-rank factors, but the Jacobian J_G of the generator G that maps the factors to the weight matrix W is rank-deficient. The factor-space preconditioner J_G^* F_t J_G induced by any W-space preconditioner F_t is therefore singular, so the standard chain rule cannot be uniquely inverted to map a preconditioned W-space direction back to a factor-space update. We cast existing LoRA optimizers in a unified framework parameterized by two choices: (i) which invertible surrogate for J_G^* F_t J_G to use, and (ii) which F_t on W to use. Existing methods fall into four families along these axes: factor-space adaptive updates, block-diagonal surrogates for J_G^* J_G, Frobenius-residual pseudoinverse methods, and Riemannian manifold constraints. Within this design space, a gradient-statistics-aware F_t paired with a closed-form factor-space solve at O((m+n)r) memory remains underexplored. We propose AdaPreLoRA, which fills this gap by adopting the Adafactor diagonal Kronecker preconditioner H_t on W and selecting, from the resulting family of factor-space solutions, the element that minimizes an H_t-weighted imbalance between the two factor contributions. By construction, the resulting factor update is the closest LoRA approximation to the preconditioned W-space direction under the H_t-weighted norm. Across GPT-2 (E2E), Mistral-7B and Qwen2-7B (GLUE, ARC, GSM8K), and diffusion-model personalization, AdaPreLoRA is competitive with or improves over a representative set of LoRA optimizers while keeping peak GPU memory at the level of a standard LoRA optimizer.
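The rank deficiency described in the abstract can be checked numerically on a toy problem. The sketch below is an illustration only and is not the AdaPreLoRA algorithm: it assumes the generator is G(B, A) = B A with B of size m x r and A of size r x n, uses column-major vectorization, and substitutes a random positive diagonal with Kronecker (row times column) structure for the W-space preconditioner. The names B, A, row, col, and F are hypothetical and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2                      # toy sizes (assumed for illustration)
B = rng.standard_normal((m, r))        # left LoRA factor
A = rng.standard_normal((r, n))        # right LoRA factor

# Jacobian of G(B, A) = B @ A with respect to [vec(B), vec(A)], column-major:
#   vec(dB @ A) = (A^T kron I_m) vec(dB)
#   vec(B @ dA) = (I_n kron B)   vec(dA)
J = np.hstack([np.kron(A.T, np.eye(m)), np.kron(np.eye(n), B)])

# Sanity check: J reproduces the first-order change dB @ A + B @ dA.
dB = rng.standard_normal((m, r))
dA = rng.standard_normal((r, n))
lhs = J @ np.concatenate([dB.flatten(order="F"), dA.flatten(order="F")])
rhs = (dB @ A + B @ dA).flatten(order="F")
assert np.allclose(lhs, rhs)

# Stand-in W-space preconditioner: a positive diagonal whose entries are an
# outer product of per-row and per-column scales (an assumption, not H_t).
row = rng.uniform(0.5, 1.5, size=m)
col = rng.uniform(0.5, 1.5, size=n)
F = np.diag(np.outer(row, col).flatten(order="F"))

M = J.T @ F @ J                        # induced factor-space preconditioner
rank_J = np.linalg.matrix_rank(J)
rank_M = np.linalg.matrix_rank(M)
print("factor-space dimension:", r * (m + n))
print("rank of J:", rank_J, "| rank of J^T F J:", rank_M)
```

For full-rank factors, the rank of J is r(m+n) - r^2, because replacing (B, A) with (B S, S^{-1} A) for any invertible r x r matrix S leaves the product unchanged. The induced matrix J^T F J inherits that r^2-dimensional null space, which is why an invertible surrogate or an additional selection rule is needed to pick one factor-space update.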
