Computational Complexity of Sub-Linear Convergent Algorithms
Abstract
Optimizing the machine learning algorithms used to minimize an objective function has been of great interest. Several approaches to improving common algorithms, such as gradient descent and stochastic gradient descent, have been explored. One of these approaches reduces the gradient variance through adaptive sampling to solve large-scale empirical risk minimization (ERM) problems. In this paper, we explore starting with a small sample and geometrically increasing its size, using the solution of the previous sample's ERM to compute the new one. This solves ERM problems with first-order optimization algorithms of sublinear convergence but at lower computational complexity. The paper starts with a theoretical proof of the approach, followed by two experiments on different datasets: one comparing gradient descent with its adaptive sampling variant, and one comparing ADAM with adaptive sampling ADAM.
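The sampling schedule described in the abstract can be illustrated with a minimal sketch, given here for gradient descent on a least-squares ERM problem. The function names, the initial sample size `n0`, the growth factor, the step size, and the fixed number of steps per stage are all assumptions made for illustration; the paper's own stopping rule for each stage and its choice of constants may differ.

```python
import numpy as np

def gradient_descent(X, y, w, lr, steps):
    """Plain gradient descent on the mean squared error over (X, y)."""
    n = X.shape[0]
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of (1/2n) * ||Xw - y||^2
        w = w - lr * grad
    return w

def adaptive_sample_gd(X, y, n0=32, growth=2, lr=0.1, steps_per_stage=50):
    """Solve ERM on geometrically growing subsets, warm-starting each stage.

    n0, growth, and steps_per_stage are illustrative assumptions,
    not values taken from the paper.
    """
    N, d = X.shape
    w = np.zeros(d)
    n = min(n0, N)
    while True:
        # Warm start: the previous stage's solution initializes this stage.
        w = gradient_descent(X[:n], y[:n], w, lr, steps_per_stage)
        if n == N:
            break
        n = min(growth * n, N)  # geometric increase of the sample size
    return w

# Synthetic data for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(2048, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * rng.normal(size=2048)

w_hat = adaptive_sample_gd(X, y)
```

Because the early stages touch only a small fraction of the data, most of the distance to the solution is covered at low per-iteration cost, and only the final stages compute gradients over the full sample.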