Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless lp Norm Solution for Fast Adversarial Training
Abstract
Adversarial training is a cornerstone of robust deep learning, but fast methods such as the Fast Gradient Sign Method (FGSM) often suffer from Catastrophic Overfitting (CO), in which models become robust to single-step attacks but fail against multi-step variants. While existing solutions rely on noise injection, regularization, or gradient clipping, we propose a novel solution that mitigates CO purely by controlling the lp training norm. Our study is motivated by the empirical observation that CO is more prevalent under the l∞ norm than under the l2 norm. Leveraging this insight, we formulate generalized lp attacks as a fixed-point problem and craft lp-FGSM attacks to understand the mechanics of the transition from l2 to l∞. This leads to our core insight: CO emerges when highly concentrated gradients, in which information localizes in a few dimensions, interact with aggressive norm constraints. By quantifying gradient concentration through the Participation Ratio and entropy measures, we develop an adaptive lp-FGSM that automatically tunes the training norm based on gradient information. Extensive experiments demonstrate that this approach achieves strong robustness without requiring additional regularization or noise injection, providing a novel and theoretically principled pathway to mitigate the CO problem.
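To make the quantities named in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the common definition of the Participation Ratio as (||g||_1 / ||g||_2)^2, takes the entropy to be the Shannon entropy of the normalized gradient magnitudes, and uses the standard closed-form maximizer of a linear objective over an lp ball as the single-step perturbation. The function names are hypothetical, and the paper's rule for choosing p from these measures is not stated in the abstract, so p is left as a plain argument here.

```python
import numpy as np

def participation_ratio(g):
    """Assumed definition: (||g||_1 / ||g||_2)^2.
    Equals 1 when g has one nonzero entry and d when all d magnitudes are equal."""
    a = np.abs(np.ravel(g)).astype(np.float64)
    return float(a.sum() ** 2 / np.sum(a ** 2))

def gradient_entropy(g):
    """Shannon entropy (in nats) of the distribution |g_i| / ||g||_1.
    Low entropy means the gradient is concentrated in few dimensions."""
    a = np.abs(np.ravel(g)).astype(np.float64)
    prob = a / a.sum()
    nz = prob[prob > 0]  # skip zeros so that 0 * log(0) is treated as 0
    return float(-(nz * np.log(nz)).sum())

def lp_fgsm_step(g, eps, p):
    """Maximizer of <g, delta> subject to ||delta||_p <= eps, for 1 < p <= inf.
    Assumes g is not identically zero. With q the dual exponent (1/p + 1/q = 1):
        delta = eps * sign(g) * |g|^(q-1) / ||g||_q^(q-1)
    p = inf recovers eps * sign(g); p = 2 recovers eps * g / ||g||_2."""
    g = np.asarray(g, dtype=np.float64)
    if np.isinf(p):
        return eps * np.sign(g)
    q = p / (p - 1.0)
    a = np.abs(g)
    scale = np.sum(a ** q) ** ((q - 1.0) / q)  # ||g||_q^(q-1)
    return eps * np.sign(g) * a ** (q - 1.0) / scale
```

For a gradient with a single dominant coordinate, both the Participation Ratio and the entropy are small, and the l∞ step still moves every coordinate by the full eps, whereas a step with smaller p weights coordinates by gradient magnitude. This is one way to read the interaction between concentrated gradients and aggressive norm constraints that the abstract describes.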