Efficient Stochastic BFGS methods Inspired by Bayesian Principles

Abstract

Quasi-Newton methods are ubiquitous in deterministic local search due to their efficiency and low computational cost. This class of methods uses the history of gradient evaluations to approximate second-order derivatives. However, only noisy gradient observations are accessible in stochastic optimization; thus, deriving quasi-Newton methods in this setting is challenging. Although most existing quasi-Newton methods for stochastic optimization rely on deterministic equations that are modified to circumvent noise, we propose a new approach inspired by Bayesian inference to assimilate noisy gradient information and derive the stochastic counterparts to standard quasi-Newton methods. We focus on the derivations of stochastic BFGS and L-BFGS, but our methodology can also be employed to derive stochastic analogs of other quasi-Newton methods. The resulting stochastic BFGS (S-BFGS) and stochastic L-BFGS (L-S-BFGS) can effectively learn an inverse Hessian approximation even with small batch sizes. For a problem of dimension d, the iteration cost of S-BFGS is O(d2), and the cost of L-S-BFGS is O(d). Numerical experiments with a dimensionality of up to 30,720 demonstrate the efficiency and robustness of the proposed method.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…