An Approximation Algorithm for Optimal Subarchitecture Extraction

Abstract

We consider the problem of finding the set of architectural parameters for a chosen deep neural network which is optimal under three metrics: parameter size, inference speed, and error rate. In this paper we state the problem formally and present an approximation algorithm that, for a large subset of instances, behaves like an FPTAS with an approximation error of $\rho \leq |1 - \epsilon|$, and that runs in $O(|\Xi| + |W^*_T|(1 + |\Theta||B||\Xi|/(\epsilon\, s^{3/2})))$ steps, where $\epsilon$ and $s$ are input parameters; $|B|$ is the batch size; $|W^*_T|$ denotes the cardinality of the largest weight set assignment; and $|\Xi|$ and $|\Theta|$ are the cardinalities of the candidate architecture and hyperparameter spaces, respectively.
