Gradient Descent Converges Linearly for Logistic Regression on Separable Data

Maxim Sviridenko

Abstract

We show that running gradient descent with a variable learning rate guarantees a loss f(x) ≤ 1.1 · f(x*) + ε for the logistic regression objective, where the error ε decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution x*. This contrasts with the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and it highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement in the sparsity-error tradeoff.
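
To make the role of the learning rate concrete, here is a minimal sketch comparing gradient descent with a constant step against gradient descent with a step that grows as the loss shrinks. The abstract does not state the paper's step-size schedule, so the rule used here (step size inversely proportional to the current loss) is an assumption chosen for illustration, not the authors' method. The toy dataset, the constant 0.1, and all function names are likewise hypothetical.

```python
import math

# Toy linearly separable data: (feature vector, label in {-1, +1}).
# These four points are made up for illustration only.
DATA = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -1.5), -1), ((-2.0, -0.5), -1)]


def margins(x, data):
    """Return y * <a, x> for every example (a, y)."""
    return [y * sum(ai * xi for ai, xi in zip(a, x)) for a, y in data]


def logistic_loss(x, data):
    """f(x) = sum_i log(1 + exp(-margin_i)), computed in a numerically stable way."""
    return sum(max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in margins(x, data))


def logistic_grad(x, data):
    """Gradient of f: -sum_i y_i * sigmoid(-margin_i) * a_i."""
    grad = [0.0] * len(x)
    for (a, y), m in zip(data, margins(x, data)):
        # Stable evaluation of sigmoid(-m).
        if m >= 0:
            e = math.exp(-m)
            s = e / (1.0 + e)
        else:
            s = 1.0 / (1.0 + math.exp(m))
        for j, aj in enumerate(a):
            grad[j] -= y * s * aj
    return grad


def gradient_descent(data, dim, steps, step_size):
    """Plain gradient descent from the origin; step_size maps current loss to eta."""
    x = [0.0] * dim
    for _ in range(steps):
        eta = step_size(logistic_loss(x, data))
        g = logistic_grad(x, data)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Constant learning rate.
x_const = gradient_descent(DATA, 2, 100, lambda loss: 0.1)
# ASSUMPTION: variable learning rate proportional to 1 / loss (not taken from the paper).
x_var = gradient_descent(DATA, 2, 100, lambda loss: 0.1 / max(loss, 1e-300))

loss_const = logistic_loss(x_const, DATA)
loss_var = logistic_loss(x_var, DATA)
print("constant step loss:", loss_const)
print("variable step loss:", loss_var)
```

On separable data the infimum of the logistic loss is zero, and the gradient shrinks along with the loss. A constant step therefore makes ever smaller progress, while a step that scales up as the loss falls keeps the relative progress per iteration roughly steady. That is the intuition behind the linear rate the abstract describes, under the assumptions stated above.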
