GEM-Style Constraints for PEFT with Dual Gradient Projection in LoRA
Abstract
Full fine-tuning of Large Language Models (LLMs) is computationally costly, motivating Continual Learning (CL) approaches that utilize parameter-efficient adapters. We revisit Gradient Episodic Memory (GEM) within the Low-Rank Adapter (LoRA) subspace and introduce I-GEM: a fixed-budget, GPU-resident dual projected-gradient approximation to GEM's quadratic projection. By constraining non-interference solely within the adapter parameters, I-GEM preserves GEM-like stability with orders-of-magnitude lower mean projection overhead. On a 3-task AG News split with induced domain drift, using GPT-2 (355M) and LoRA (r=8), I-GEM matches GEM's average accuracy (within \!0.04 pts) and outperforms A-GEM by \!1.4 pts. Crucially, it reduces projection time vs.\ GEM by a factor of \!103. These results suggest that applying GEM constraints in the LoRA subspace is a practical pathway for continual learning at the LLM scale.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.