CaCuTe: Casual Cubic-Model Technique for Faster Optimization

Abstract

We establish a local $O(k^{-2})$ rate for the gradient update $x_{k+1} = x_k - \nabla f(x_k)/\sqrt{H\|\nabla f(x_k)\|}$ under a $2H$-Hessian--Lipschitz assumption. Regime detection relies on Hessian--vector products, avoiding Hessian formation or factorization. Incorporating this certificate into cubic-regularized Newton (CRN) and an accelerated variant enables per-iterate switching between the cubic and gradient steps while preserving CRN's global guarantees. The technique achieves the lowest wall-clock time among compared baselines in our experiments. In the first-order setting, the technique yields a monotone, adaptive, parameter-free method that inherits the local $O(k^{-2})$ rate. Despite backtracking, the method shows superior wall-clock performance. Additionally, we cover smoothness relaxations beyond classical gradient--Lipschitzness, enabling tighter bounds, including global $O(k^{-2})$ rates. Finally, we generalize the technique to the stochastic setting.
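As a rough illustration of the gradient update in the abstract, the following Python sketch applies it to a small toy objective. It reads the step as $-\nabla f(x_k)/\sqrt{H\|\nabla f(x_k)\|}$, which is what minimizing the cubic model $\langle g, h\rangle + \tfrac{2H}{6}\|h\|^3$ gives when the Hessian term is dropped. The test function, the coefficients `C`, and the helper names `gradient_step` and `run` are assumptions made for this example and do not come from the paper. The regime-detection certificate and the switch to the cubic step are not shown.

```python
import math

# Hypothetical test problem (not from the paper): f(x) = sum_i c_i |x_i|^3 / 6.
# Its Hessian is diag(c_i |x_i|), which is Lipschitz with constant max(c_i).
C = [1.0, 0.5, 0.1]
H = max(C) / 2.0  # the abstract assumes a 2H-Lipschitz Hessian, so 2H = max(c_i)


def f(x):
    return sum(c * abs(xi) ** 3 for c, xi in zip(C, x)) / 6.0


def grad(x):
    return [c * xi * abs(xi) / 2.0 for c, xi in zip(C, x)]


def gradient_step(x):
    """One update x - grad f(x) / sqrt(H * ||grad f(x)||)."""
    g = grad(x)
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:  # already stationary; avoid dividing by zero
        return list(x)
    scale = 1.0 / math.sqrt(H * g_norm)
    return [xi - scale * gi for xi, gi in zip(x, g)]


def run(x0, num_steps):
    """Return the objective values f(x_0), ..., f(x_num_steps)."""
    x, values = list(x0), [f(x0)]
    for _ in range(num_steps):
        x = gradient_step(x)
        values.append(f(x))
    return values


values = run([1.0, 1.0, 1.0], 50)
print(values[0], values[-1])
```

On this particular toy problem the objective values decrease at every step. That is a property of the chosen example and says nothing about the rates claimed in the abstract.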
