On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods

Pedro Vilanova


Abstract

In this study, we demonstrate that the norm test and the inner product/orthogonality test presented in [Bol18] are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if ε² = θ² + ν² with specific choices of θ and ν. Here, ε controls the relative statistical error of the norm of the gradient, while θ and ν control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that, when θ and ν are optimally selected, the inner product/orthogonality test is at best as inexpensive as the norm test, and that it is never computationally cheaper than the norm test if ε² = θ² + ν². Finally, we present two stochastic optimization problems to illustrate our results.
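The cost comparison in the abstract can be illustrated with a minimal sketch. The abstract does not state the test conditions, so the code assumes the formulations commonly used in the adaptive sampling literature: the norm test bounds the total gradient variance divided by the batch size by ε²‖g‖², the inner product test bounds the variance along the gradient by θ²‖g‖², and the orthogonality test bounds the variance orthogonal to the gradient by ν²‖g‖², where ν denotes the orthogonality parameter. The function name `batch_size_bounds` and the use of the sample mean in place of the true gradient are hypothetical choices made for illustration only.

```python
def batch_size_bounds(sample_grads, eps, theta, nu):
    """Lower bounds on the batch size implied by each test (illustrative only).

    Assumptions (not taken from the abstract): the mean of `sample_grads`
    stands in for the true gradient g, and variances are population variances.
    """
    n = len(sample_grads)
    dim = len(sample_grads[0])

    # Mean gradient g and its squared norm.
    g = [sum(s[k] for s in sample_grads) / n for k in range(dim)]
    g_sq = sum(c * c for c in g)

    var_par = 0.0   # variance of the component along g
    var_orth = 0.0  # variance of the component orthogonal to g
    for s in sample_grads:
        dot = sum(s[k] * g[k] for k in range(dim))
        # Squared component of (s - g) along g: ((s - g) . g)^2 / ||g||^2.
        var_par += (dot - g_sq) ** 2 / g_sq
        # Squared norm of s minus its projection onto g.
        coef = dot / g_sq
        var_orth += sum((s[k] - coef * g[k]) ** 2 for k in range(dim))
    var_par /= n
    var_orth /= n

    # Norm test: (var_par + var_orth) / |S| <= eps^2 * ||g||^2.
    norm = (var_par + var_orth) / (eps ** 2 * g_sq)
    # Inner product and orthogonality tests must both hold, so take the max.
    inner_product = var_par / (theta ** 2 * g_sq)
    orthogonality = var_orth / (nu ** 2 * g_sq)
    return {"norm": norm, "ip_orth": max(inner_product, orthogonality)}


# Hypothetical per-sample gradients with eps^2 = theta^2 + nu^2 (0.25 = 0.09 + 0.16).
grads = [[1.0, 0.5], [1.5, -0.5], [0.5, 0.2], [1.0, -0.2]]
bounds = batch_size_bounds(grads, eps=0.5, theta=0.3, nu=0.4)
print(bounds["norm"] <= bounds["ip_orth"])
```

Under these assumptions, the bound from the norm test is a weighted average of the two other bounds whenever ε² = θ² + ν², so it cannot exceed their maximum. The two coincide only when the inner product and orthogonality bounds are equal, which is one way to read the "optimally selected" case described in the abstract.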
