Minimax Bounds for Distributed Logistic Regression
Abstract
We consider a distributed logistic regression problem in which labeled data pairs (X_i, Y_i) ∈ ℝ^d × {−1, 1}, for i = 1, …, n, are distributed across multiple machines in a network and must be communicated to a centralized estimator using at most k bits per labeled pair. We assume that the data X_i are drawn independently from some distribution P_X, and that the distribution of Y_i conditioned on X_i follows a logistic model with some parameter θ ∈ ℝ^d. Using a Fisher information argument, we give minimax lower bounds for estimating θ under different assumptions on the tail of the distribution P_X. We consider both ℓ₂ and logistic losses, and show that for the logistic loss our sub-Gaussian lower bound is order-optimal and cannot be improved.
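To make the setting and the role of Fisher information concrete, here is a minimal sketch. It assumes P_X is the standard Gaussian N(0, I_d), which is one sub-Gaussian choice, and it ignores the k-bit constraint entirely. For this case it samples labeled pairs from the logistic model P(Y = y | X = x) = σ(y θᵀx), where σ is the logistic function. It then estimates by Monte Carlo the Fisher information I(θ) = E[σ(θᵀX)(1 − σ(θᵀX)) X Xᵀ] that a single unquantized pair carries about θ. The function names (`sample_pair`, `fisher_information`) and their parameters are hypothetical illustrations and are not taken from the paper.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function sigma(t) = 1 / (1 + exp(-t)).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sample_pair(theta, rng):
    # Assumption for illustration: P_X = N(0, I_d), one sub-Gaussian choice.
    x = [rng.gauss(0.0, 1.0) for _ in theta]
    # Logistic model: P(Y = +1 | X = x) = sigma(<theta, x>), with Y in {-1, +1}.
    y = 1 if rng.random() < sigmoid(dot(theta, x)) else -1
    return x, y


def fisher_information(theta, n_samples=20000, seed=0):
    # Monte Carlo estimate of I(theta) = E[p (1 - p) X X^T], p = sigma(<theta, X>):
    # the Fisher information about theta in one unquantized pair (X, Y).
    # P_X does not depend on theta, so only the law of Y given X contributes.
    rng = random.Random(seed)
    d = len(theta)
    info = [[0.0] * d for _ in range(d)]
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        p = sigmoid(dot(theta, x))
        w = p * (1.0 - p)  # conditional variance of the label; at most 1/4
        for i in range(d):
            for j in range(d):
                info[i][j] += w * x[i] * x[j] / n_samples
    return info


if __name__ == "__main__":
    rng = random.Random(1)
    theta = [1.0, -0.5, 0.0]
    print(sample_pair(theta, rng))
    info = fisher_information(theta)
    # Since p (1 - p) <= 1/4, trace(I) <= E||X||^2 / 4 = d / 4 under N(0, I_d).
    print(sum(info[i][i] for i in range(len(theta))), len(theta) / 4)
```

Any k-bit message computed from a pair (X_i, Y_i) carries no more Fisher information about θ than the pair itself (in the positive semidefinite order). This unquantized matrix is therefore only an upper benchmark. The lower bounds described above account for the k-bit constraint, which this sketch does not model.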