Finite sample rates for logistic regression with small noise or few samples

Abstract

The logistic regression estimator is known to inflate the magnitude of its coefficients if the sample size n is small, the dimension p is (moderately) large, or the signal-to-noise ratio 1/σ is large (the probabilities of observing a label are close to 0 or 1). With this in mind, we study the logistic regression estimator with p ≪ n/log n, assuming Gaussian covariates and labels generated by the Gaussian link function, with a mild optimization constraint on the estimator's length to ensure existence. We provide finite sample guarantees for its direction, which serves as a classifier, and its Euclidean norm, which is an estimator for the signal-to-noise ratio. We distinguish between two regimes. In the low-noise/small-sample regime (σ ≲ (p log n)/n), we show that the estimator's direction (and consequently the classification error) achieve the rate (p log n)/n, which is, up to the log term, the rate one would obtain if the problem were noiseless. In this case, the norm of the estimator is at least of order n/(p log n). If instead (p log n)/n ≲ σ ≲ 1, the estimator's direction achieves the rate √(σ p log n/n), whereas its norm converges to the true norm at the rate √(p log n/(n σ³)). As a corollary, the data are not linearly separable with high probability in this regime. In either regime, logistic regression provides a competitive classifier.
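
The setting described above can be explored numerically. The following is a minimal sketch, not the authors' code. It makes several assumptions: labels are drawn as y = sign(xᵀβ* + σε) with standard Gaussian ε, which is one standard way to realize a Gaussian link; the true direction β* is the first unit vector; and the length constraint is a Euclidean ball of a hypothetical radius, enforced by projected gradient descent. The function names, the radius, the step size, and the problem sizes are illustrative choices only.

```python
import numpy as np


def simulate(n, p, sigma, rng):
    """Gaussian covariates; labels y = sign(x.beta* + sigma*eps) (assumed Gaussian-link model)."""
    beta_star = np.zeros(p)
    beta_star[0] = 1.0  # unit-norm true direction (illustrative choice)
    X = rng.standard_normal((n, p))
    y = np.sign(X @ beta_star + sigma * rng.standard_normal(n))
    return X, y, beta_star


def fit_constrained_logistic(X, y, radius, steps=2000, lr=0.5):
    """Projected gradient descent on the mean logistic loss over the ball ||gamma|| <= radius.

    The radius is a hypothetical stand-in for the length constraint mentioned in the abstract.
    """
    n, p = X.shape
    gamma = np.zeros(p)
    for _ in range(steps):
        margins = y * (X @ gamma)
        # sigmoid(-margins), written with tanh for numerical stability
        weights = 0.5 * (1.0 - np.tanh(0.5 * margins))
        # gradient of mean log(1 + exp(-margin_i)) with respect to gamma
        grad = -(X * (y * weights)[:, None]).mean(axis=0)
        gamma = gamma - lr * grad
        norm = np.linalg.norm(gamma)
        if norm > radius:
            gamma *= radius / norm  # project back onto the ball
    return gamma


rng = np.random.default_rng(0)
X, y, beta_star = simulate(n=2000, p=10, sigma=0.5, rng=rng)
gamma_hat = fit_constrained_logistic(X, y, radius=50.0)

# Distance between the normalized estimator and the true unit direction
direction_error = np.linalg.norm(gamma_hat / np.linalg.norm(gamma_hat) - beta_star)
print("direction error:", direction_error)
print("estimator norm:", np.linalg.norm(gamma_hat))
```

Rerunning with a smaller σ or a smaller n gives a rough feel for the coefficient inflation described in the abstract: as the data approach linear separability, the fitted norm tends to grow toward the chosen radius. A plain gradient method may converge slowly in that case, so the printed norm should be read as indicative only.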
