Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning

Abstract

In continual learning (CL), model growth enhances adaptability to new data. However, when model growth is applied improperly, especially in task-agnostic CL, where the entire grown model is used for inference, it can lead to severe degradation of learned knowledge, a problem we term growth-induced forgetting. Most existing methods that adopt model growth to improve adaptability often overlook the forgetting issue, resulting in compromised knowledge retention, making them unsuitable for task-agnostic settings. To promote both adaptability and knowledge retention with model growth, we identify the key: gradient and parameter sparsity. Introducing SparseGrow, which increases gradient sparsity through layer expansion and gradient gating to enable focused updates on parameters while preserving critical parameters, thus inhibiting forgetting. Moreover, it promotes parameter sparsity with sparse initialization and training, aiming at better control of model plasticity, improving adaptability over new data. Extensive experiments across diverse datasets, task-agnostic settings, and a large number of tasks demonstrate the necessity of controlled layer expansion and validate the effectiveness of SparseGrow in achieving high adaptability while minimizing forgetting in continual learning. By enabling model growth with sparsified gradients and parameters, SparseGrow paves the way for building scalable lifelong learning systems capable of continual adaptation with better knowledge retention.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…