$\ell_p$-Regression in the Arbitrary Partition Model of Communication
Abstract
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p \in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i \in \{-M, -M+1, \ldots, M\}^{n \times d}$ and $b^i \in \{-M, -M+1, \ldots, M\}^n$, and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x \in \mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \le \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term depends only linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p \in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p \in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al., COLT, 2013) and (Vempala et al., SODA, 2020).
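To make the problem statement concrete, the following is a minimal sketch of the arbitrary partition setup for $p = 2$ and $d = 1$. It only illustrates the objective and the $(1+\epsilon)$-approximation condition; it is not the protocol from the paper and ignores communication cost entirely. The helper names (`add_matrices`, `add_vectors`, `lp_cost`, `is_approx`) and the toy data are assumptions made for illustration.

```python
def add_matrices(shares):
    """Entrywise sum of equally shaped matrices (each a list of rows)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*shares)]

def add_vectors(shares):
    """Entrywise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*shares)]

def lp_cost(A, b, x, p):
    """Compute ||Ax - b||_p as (sum_j |r_j|^p)^(1/p) for p > 0."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi
                for row, bi in zip(A, b)]
    return sum(abs(r) ** p for r in residual) ** (1.0 / p)

# Toy instance (hypothetical data): s = 2 servers, n = 3 rows, d = 1 column.
# No single server holds the true A or b; only the sums define the problem.
A_shares = [[[1], [2], [-1]], [[1], [-1], [3]]]
b_shares = [[3, 0, 1], [1, 1, 2]]

A = add_matrices(A_shares)  # the implicit matrix  sum_i A^i
b = add_vectors(b_shares)   # the implicit vector  sum_i b^i

# For p = 2 and d = 1 the minimizer has the closed form <a, b> / <a, a>.
col = [row[0] for row in A]
x_opt = [sum(a * bi for a, bi in zip(col, b)) / sum(a * a for a in col)]
opt = lp_cost(A, b, x_opt, 2)

def is_approx(x, eps):
    """True if x is a (1 + eps)-approximate solution for p = 2."""
    return lp_cost(A, b, x, 2) <= (1 + eps) * opt
```

In this sketch the coordinator sees the summed data directly; the question studied in the abstract is how few bits the servers must exchange with the coordinator so that it can output some `x` for which a check like `is_approx(x, eps)` would succeed.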