Multi-attacks: Many images + the same adversarial attack → many target labels
Abstract
We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1, X_2, \ldots, X_n$ from their original, unperturbed classes $c_1, c_2, \ldots, c_n$ to desired (not necessarily all the same) classes $c^*_1, c^*_2, \ldots, c^*_n$ for up to hundreds of images and target classes at once. We call these multi-attacks. Characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for its two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
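To make the multi-attack objective concrete, here is a minimal sketch of optimizing one shared perturbation for several inputs with different target classes. It is not the paper's implementation: the abstract does not specify the optimizer or the model, so the fixed random two-layer tanh network, the sizes `n`, `d`, `h`, `k`, the step size, and the plain gradient-descent loop are all assumptions chosen for illustration. The only idea taken from the abstract is that the same perturbation is added to every input, so its gradient is the sum of the per-input gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, k = 4, 64, 128, 5  # inputs, "pixels", hidden units, classes (hypothetical sizes)

# Toy stand-in for a trained classifier: a fixed random two-layer tanh network.
W1 = rng.normal(size=(d, h)) / np.sqrt(d)
W2 = rng.normal(size=(h, k)) / np.sqrt(h)

X = rng.normal(size=(n, d))   # stand-ins for the images X_1..X_n
targets = np.arange(n) % k    # stand-ins for the desired classes c*_1..c*_n


def loss_and_grad(P):
    """Mean cross-entropy of X + P against the targets, and its gradient w.r.t. P."""
    A = np.tanh((X + P) @ W1)                        # hidden activations, (n, h)
    logits = A @ W2                                  # (n, k)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), targets]).mean()

    # Backpropagate by hand: softmax cross-entropy, then tanh, then the input.
    dlogits = probs.copy()
    dlogits[np.arange(n), targets] -= 1.0
    dlogits /= n
    dZ = (dlogits @ W2.T) * (1.0 - A**2)
    dX = dZ @ W1.T                                   # per-input gradients, (n, d)
    return loss, dX.sum(axis=0)                      # P is shared, so gradients add up


def predict(P):
    """Predicted class of every input after adding the shared perturbation P."""
    return np.argmax(np.tanh((X + P) @ W1) @ W2, axis=1)


P = np.zeros(d)
initial_loss, _ = loss_and_grad(P)
best_loss, best_P = initial_loss, P.copy()
for _ in range(500):
    loss, grad = loss_and_grad(P)
    if loss < best_loss:              # keep the best perturbation seen so far
        best_loss, best_P = loss, P.copy()
    P -= 0.5 * grad                   # plain gradient descent (assumed step size)

print("initial loss:", initial_loss, "best loss:", best_loss)
print("targets:    ", targets)
print("predictions:", predict(best_P))
```

Whether every prediction matches its target depends on the toy network and the settings above, so the printed comparison should be read as a diagnostic rather than a guaranteed outcome. A nonlinear model is used deliberately: for a purely linear classifier, a shared perturbation shifts the logits of every input by the same amount and therefore cannot steer different inputs toward independently chosen targets.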