Considering Layerwise Importance in the Lottery Ticket Hypothesis
Abstract
The Lottery Ticket Hypothesis (LTH) showed that by iteratively training a model, removing connections with the lowest global weight magnitude and rewinding the remaining connections, sparse networks can be extracted. This global comparison removes context information between connections within a layer. Here we study means for recovering some of this layer distributional context and generalise the LTH to consider weight importance values rather than global weight magnitudes. We find that given a repeatable training procedure, applying different importance metrics leads to distinct performant lottery tickets with little overlapping connections. This strongly suggests that lottery tickets are not unique
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.