One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization
Abstract
We study the use of linear regression for multiclass classification in the over-parametrized regime where some of the training data is mislabeled. In such scenarios it is necessary to add an explicit regularization term, λ f(w), for some convex function f(·), to avoid overfitting the mislabeled data. In our analysis, we assume that the data is sampled from a Gaussian Mixture Model with equal class sizes, and that a proportion c of the training labels is corrupted for each class. Under these assumptions, we prove that the best classification performance is achieved when f(·) = \|·\|_2^2 and λ → ∞. We then proceed to analyze the classification errors for f(·) = \|·\|_1 and f(·) = \|·\|_∞ in the large λ regime and notice that it is often possible to find sparse and one-bit solutions, respectively, that perform almost as well as the one corresponding to f(·) = \|·\|_2^2.
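The setting described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's experiments: it assumes identity-covariance Gaussian classes, one-hot regression targets, and argmax decoding, and all names and parameter values (`make_data`, `ridge_fit`, `k`, `d`, `n_per_class`, the mean norm of 4) are hypothetical choices made for the example. It fits squared-ℓ2-regularized (ridge) regression for several values of λ on data where a fraction c of each class is mislabeled, and checks that for very large λ the rescaled solution λW approaches XᵀY, so the classifier's decisions stop depending on λ.

```python
import numpy as np

def make_data(rng, means, n_per_class, c=0.0):
    """Sample a balanced Gaussian mixture; flip a fraction c of labels per class."""
    k, d = means.shape
    y_true = np.repeat(np.arange(k), n_per_class)
    X = means[y_true] + rng.normal(size=(k * n_per_class, d))
    y = y_true.copy()
    n_flip = int(round(c * n_per_class))
    if n_flip > 0:
        for j in range(k):
            idx = rng.choice(np.flatnonzero(y_true == j), size=n_flip, replace=False)
            # Shift by 1..k-1 modulo k so the corrupted label always differs.
            y[idx] = (j + rng.integers(1, k, size=n_flip)) % k
    return X, y, y_true

def ridge_fit(X, y, k, lam):
    """Ridge regression onto one-hot labels, in dual form (cheap when n < d)."""
    Y = np.eye(k)[y]
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)

def accuracy(W, X, y):
    """Classify by the largest regression output and compare with y."""
    return float(np.mean(np.argmax(X @ W, axis=1) == y))

rng = np.random.default_rng(0)
k, d, n_per_class, c = 3, 200, 30, 0.2   # n = 90 < d = 200: over-parametrized

# Assumed class means: random directions rescaled to norm 4.
means = rng.normal(size=(k, d))
means *= 4.0 / np.linalg.norm(means, axis=1, keepdims=True)

X_tr, y_tr, y_tr_clean = make_data(rng, means, n_per_class, c)
X_te, y_te, _ = make_data(rng, means, 2000, 0.0)   # clean test labels

results = {}
for lam in (1e-3, 1.0, 1e2, 1e6):
    W = ridge_fit(X_tr, y_tr, k, lam)
    results[lam] = accuracy(W, X_te, y_te)
    print(f"lambda = {lam:g}: test accuracy = {results[lam]:.3f}")

# Large-lambda limit: lam * W tends to X^T Y, and argmax ignores the scale.
lam_big = 1e8
W_big = ridge_fit(X_tr, y_tr, k, lam_big)
W_limit = X_tr.T @ np.eye(k)[y_tr]
```

Because argmax decoding is invariant to positive rescaling of W, the λ → ∞ classifier in this sketch is simply the one built from XᵀY. The ℓ1 and ℓ∞ regularizers discussed in the abstract are not implemented here.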