Can Fine-Tuning Erase Your Edits? On the Fragile Coexistence of Knowledge Editing and Adaptation

Zhixue Zhao

doi:10.1145/3770855.3817879

Abstract

Knowledge editing (KE) offers a lightweight alternative to retraining for updating large language models (LLMs). Meanwhile, fine-tuning remains the default operation for adapting LLMs to new domains and tasks. Despite their widespread adoption, these two post-training interventions have been studied in isolation, leaving open a crucial question: if we fine-tune an edited model, do the edits survive? This question is motivated by two practical objectives: removing covert or malicious edits, and preserving beneficial edits. If fine-tuning impairs edits (Fig. 1), current KE methods become less efficient, because a newly fine-tuned model requires re-editing; if edits persist, fine-tuned models risk propagating hidden malicious edits, raising serious safety concerns. To this end, we systematically quantify edit decay after fine-tuning across 254 experimental configurations. Our results show that, in general, edits decay substantially after subsequent fine-tuning. AlphaEdit exhibits the greatest decay on the zsRE benchmark when applied to GPT-J, where 25.27% of previously successful edits become unsuccessful after fine-tuning. We further find that fine-tuning only the edited layers is sufficient to effectively remove edits, while incurring only modest degradation in downstream performance. Surprisingly, fine-tuning non-edited layers leads to greater edit decay than all-layer fine-tuning. In addition, our activation-space analysis reveals that fine-tuning produces a larger and more coherent representational shift, in both magnitude and direction, than KE. Overall, our study underscores the necessity of evaluating KE within the broader LLM application pipeline.
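The decay figure quoted above (the share of previously successful edits that become unsuccessful after fine-tuning) can be illustrated with a small calculation. The sketch below is one plausible reading of that quantity, not the paper's confirmed metric: the function name `edit_decay_rate`, the boolean per-edit success flags, and the toy data are all hypothetical and introduced here for illustration only.

```python
from typing import Sequence


def edit_decay_rate(success_before: Sequence[bool],
                    success_after: Sequence[bool]) -> float:
    """Fraction of edits that succeeded before fine-tuning but fail afterwards.

    Assumption: each position refers to the same edit, evaluated once on the
    edited model and once on the edited-then-fine-tuned model.
    """
    if len(success_before) != len(success_after):
        raise ValueError("both sequences must cover the same edits")
    # Only edits that were successful before fine-tuning can decay.
    outcomes = [after for before, after in zip(success_before, success_after) if before]
    if not outcomes:
        raise ValueError("no successful edits before fine-tuning")
    decayed = sum(1 for after in outcomes if not after)
    return decayed / len(outcomes)


# Toy data (hypothetical): 4 edits succeeded initially, 1 of them is lost.
before = [True, True, True, True, False]
after = [True, False, True, True, False]
rate = edit_decay_rate(before, after)
print(f"{rate:.2%}")
```

Under this reading, edits that already failed before fine-tuning are excluded from the denominator, so the rate reflects only what fine-tuning removed.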
