Can Fine-Tuning Erase Your Edits? On the Fragile Coexistence of Knowledge Editing and Adaptation
Abstract
Knowledge editing (KE) offers a lightweight alternative to retraining for updating large language models (LLMs), while fine-tuning remains the default way to adapt LLMs to new domains and tasks. Despite their widespread adoption, these two post-training interventions have been studied in isolation, leaving open a crucial question: if we fine-tune an edited model, do the edits survive? The question has two practical motivations: removing covert or malicious edits, and preserving beneficial ones. If fine-tuning impairs edits (Fig. 1), current KE methods become less efficient, since every newly fine-tuned model must be re-edited; if edits persist, fine-tuned models risk propagating hidden malicious edits, raising serious safety concerns. To answer this question, we systematically quantify edit decay after fine-tuning across 254 experimental configurations. Our results show that edits generally decay substantially after subsequent fine-tuning. AlphaEdit applied to GPT-J exhibits the greatest decay on the zsRE benchmark, where 25.27% of previously successful edits become unsuccessful after fine-tuning. We further find that fine-tuning only the edited layers is sufficient to remove edits effectively, at the cost of only modest degradation in downstream performance. Surprisingly, fine-tuning only the non-edited layers causes greater edit decay than fine-tuning all layers. In addition, our activation-space analysis reveals that fine-tuning produces a larger and more coherent representational shift than KE, in both magnitude and direction. Overall, our study underscores the necessity of evaluating KE within the broader LLM application pipeline.
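The headline figure reads naturally as a conditional rate: among edits that succeeded before fine-tuning, the share that fail afterwards. The snippet below is a minimal sketch of that arithmetic, assuming each edit's success has already been reduced to one boolean before and one after fine-tuning; the function name `edit_decay_rate` and the toy data are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Sequence


def edit_decay_rate(success_before: Sequence[bool], success_after: Sequence[bool]) -> float:
    """Hypothetical metric: fraction of previously successful edits that fail after fine-tuning."""
    if len(success_before) != len(success_after):
        raise ValueError("both sequences must cover the same edits")
    # Keep only edits that were successful before fine-tuning; failed edits cannot "decay".
    after_for_successful = [after for before, after in zip(success_before, success_after) if before]
    if not after_for_successful:
        return 0.0
    # Count edits that flipped from success to failure.
    decayed = sum(1 for after in after_for_successful if not after)
    return decayed / len(after_for_successful)


# Toy example: 4 edits succeed before fine-tuning, 1 of them fails afterwards.
before = [True, True, True, True, False]
after = [True, False, True, True, False]
print(f"{edit_decay_rate(before, after):.2%}")  # 25.00%
```

Conditioning on prior success keeps edits that never took hold from inflating the decay figure.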