FIT to Forget: Robust Continual Unlearning for Large Language Models
Abstract
While large language models (LLMs) exhibit remarkable capabilities, they increasingly face demands to unlearn memorized privacy-sensitive, copyrighted, or harmful content. Existing unlearning methods primarily focus on single-shot scenarios, whereas real-world deletion requests arrive continually. Naively applying these methods to sequential requests leads to severe utility degradation and catastrophic forgetting. To address this, we propose FIT, a robust continual unlearning framework that processes high-volume sequential deletion streams while resisting both catastrophic forgetting and post-unlearning recovery. FIT stabilizes sequential updates through three synergistic mechanisms: redundancy Filtering, Importance-aware adaptive algorithm selection, and Targeted layer attribution. Furthermore, to facilitate rigorous evaluation, we introduce PCH, a unified benchmark encompassing Personal, Copyrighted, and Harmful content, alongside two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), to systematically quantify forgetting-utility trade-offs. Extensive experiments across five LLMs (up to 14B parameters) demonstrate that FIT consistently achieves state-of-the-art unlearning efficacy and utility preservation. Notably, even after hundreds of sequential requests, FIT preserves strong downstream performance (e.g., GSM8K, MMLU) and exhibits superior resilience against relearning and quantization recovery attacks.