Low-redundancy Distillation for Continual Learning

Abstract

Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical application. Drawing inspiration from the brain's contextual gating mechanism, which selectively filters neural information and continuously updates past memories, we propose Low-redundancy Distillation (LoRD), a novel CL method that enhances model performance while maintaining training efficiency. This is achieved by eliminating redundancy in three aspects of CL: student model redundancy, teacher model redundancy, and rehearsal sample redundancy. By compressing the learnable parameters of the student model and pruning the teacher model, LoRD facilitates the retention and optimization of prior knowledge, effectively decoupling task-specific knowledge without manually assigning isolated parameters for each task. Furthermore, we optimize the selection of rehearsal samples and refine rehearsal frequency to improve training efficiency. Through a meticulous design of distillation and rehearsal strategies, LoRD effectively balances training efficiency and model precision. Extensive experimentation across various benchmark datasets and environments demonstrates LoRD's superiority, achieving the highest accuracy with the lowest training FLOPs.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…