Active Linear Regression for $\ell_p$ Norms and Beyond
Abstract
We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b \in \mathbb{R}^n$ and output a near minimizer to $\min_{x \in \mathbb{R}^d} \|Ax - b\|$, for a design matrix $A \in \mathbb{R}^{n \times d}$ and loss $\|\cdot\|$. For $\ell_p$ norm regression for any $0 < p < \infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde{O}(d/\epsilon^2)$ queries to $b$ for $p \in (0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1 < p < 2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2 < p < \infty$. For $0 < p < 2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2 < p < \infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most a factor of $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $\ell_1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1 < p < 2$, and gave no bounds for $2 < p < \infty$ or $0 < p < 1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}} \log^2 n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde{O}(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde{O}(d^{4-2\sqrt{2}}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $\ell_p$ norm.
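To make the active regression setting concrete, here is a minimal sketch of the special case $p = 2$, where the $\ell_2$ Lewis weights coincide with the row leverage scores of $A$. It is an illustration under stated assumptions, not the algorithm of the paper: the function names `leverage_scores` and `active_l2_regression`, the callback `query_b`, and the sample budget `m` are all hypothetical, and the general-$p$ procedure described in the abstract involves more than this one-shot sampling step.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A (the l2 Lewis weights), assuming full column rank."""
    Q, _ = np.linalg.qr(A)           # reduced QR: Q has orthonormal columns spanning col(A)
    return np.sum(Q**2, axis=1)      # squared row norms of Q

def active_l2_regression(A, query_b, m, rng):
    """Sample m rows by leverage score, query only those entries of b,
    and solve the reweighted least-squares problem on the sample.

    query_b(i) is a hypothetical oracle returning the i-th entry of b.
    Returns the estimate and the number of distinct entries of b queried.
    """
    n, d = A.shape
    tau = leverage_scores(A)
    probs = tau / tau.sum()                        # sampling distribution over rows
    idx = rng.choice(n, size=m, replace=True, p=probs)
    w = 1.0 / np.sqrt(m * probs[idx])              # rescaling keeps the sampled loss unbiased
    b_s = np.array([query_b(i) for i in idx])      # the only access to b
    x, *_ = np.linalg.lstsq(A[idx] * w[:, None], b_s * w, rcond=None)
    return x, np.unique(idx).size
```

A typical call would pass a closure over a label vector, for example `active_l2_regression(A, lambda i: b[i], 400, np.random.default_rng(0))`, and compare the resulting residual norm against that of the full least-squares solution.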