Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It
Abstract
Recent finite-time analyses of nonlinear two-time-scale stochastic approximation (TTSA) show that, under contractive assumptions, the slow iterate \(Y_k\) with stepsizes \(\beta_k=\Theta(k^{-1})\) and \(\alpha_k=\Theta(k^{-a})\), \(a\in(1/2,1)\), generally satisfies a mean-square rate of order \(k^{-a}\); decoupled \(k^{-1}\) rates require strong local linearity. We identify a sharp regularity-dependent boundary. In a rate-determining normal form where the slow drift contains a locally linear leakage and a nonlinear remainder of order \(1+\rho\) (\(\rho\in[0,1]\)), the uncorrected recursion satisfies
\[ \mathbb{E}\|Y_k\|^2 \le C\left(k^{-1}+k^{-a(1+\rho)}\right), \]
and a matching scalar Gaussian lower bound shows that the slower term is unavoidable without modifying the update. Thus the decoupled \(k^{-1}\) rate is guaranteed for the uncorrected recursion exactly when \(a(1+\rho)\ge 1\). This lower bound concerns only the naive update; it is not an information-theoretic obstruction. We demonstrate this by equipping the normal-form recursion with an auxiliary online bias estimator
\[ M_{k+1}=M_k+\gamma_k\bigl(R(X_k)-M_k\bigr), \qquad \beta_k \ll \gamma_k \ll \alpha_k, \]
and subtracting \(M_k\) from the slow update. Under the same stability, moment, and remainder assumptions, the corrected recursion achieves \(\mathbb{E}\|Y_k\|^2=O(k^{-1})\) for every \(\rho\in[0,1]\), including regimes where the uncorrected update provably suffers the slower rate. Finally, we prove localized transfer theorems that extend the phase-transition mechanism to general nonlinear TTSA in fast-manifold coordinates. The proofs are non-asymptotic and rely on two Abel-transform cancellations: one for the locally linear fast-error leakage and one for the tracked nonlinear bias.
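Here is a worked reading of the threshold. This is our own arithmetic from the two-term mean-square bound, not an excerpt from the paper's proofs. The bound \(k^{-1}+k^{-a(1+\rho)}\) is of order \(k^{-1}\) exactly when the second exponent is at least 1. That condition defines a critical regularity level \(\rho^\star(a)\) (our notation) for each fast-stepsize exponent \(a\).

\[ k^{-1}+k^{-a(1+\rho)}=O(k^{-1}) \iff a(1+\rho)\ge 1 \iff \rho \ge \rho^\star(a):=\frac{1}{a}-1. \]

\[ a=0.6:\quad \rho^\star=\frac{1}{0.6}-1=\frac{2}{3};\qquad \rho=0 \Rightarrow a(1+\rho)=0.6<1;\qquad \rho=1 \Rightarrow a(1+\rho)=1.2\ge 1. \]

Because \(a\in(1/2,1)\) implies \(\rho^\star(a)\in(0,1)\), the uncorrected bound never reaches \(k^{-1}\) at \(\rho=0\) and always does at \(\rho=1\). The phase transition therefore lies strictly inside \([0,1]\).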
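The following is a minimal Monte Carlo sketch of the mechanism on a toy scalar recursion of our own choosing. It is an assumption, not the paper's exact normal form. The model has three parts:

- a fast error \(X_k\) driven by Gaussian noise with stepsize \(\alpha_k=(k+1)^{-a}\);
- a slow iterate whose drift contains a linear leakage \(cX_k\) and a remainder \(R(x)=|x|^{1+\rho}\);
- the online bias estimator \(M_k\) from the abstract, with a hypothetical choice \(\gamma_k=(k+1)^{-g}\), \(a<g<1\).

A heuristic reading, also ours: \(X_k\) fluctuates at scale \(\alpha_k^{1/2}\), so \(\mathbb{E}R(X_k)\) is of order \(\alpha_k^{(1+\rho)/2}\). This is the bias the naive slow update keeps averaging in. The function name `simulate` and all constants are illustrative. The naive and corrected slow iterates share the same noise, so the comparison isolates the effect of subtracting \(M_k\).

```python
import random


def simulate(n_steps, a=0.6, rho=0.0, g=0.8, c=1.0, n_runs=60, seed=0):
    """Monte Carlo estimates of E[Y_n^2] for a hypothetical scalar TTSA toy model.

    Toy model (our assumption, not the paper's exact normal form):
      fast error:   X_{k+1} = X_k + alpha_k * (-X_k + xi_k)
      naive slow:   Y_{k+1} = Y_k + beta_k * (-Y_k + c*X_k + R(X_k) + zeta_k)
      bias tracker: M_{k+1} = M_k + gamma_k * (R(X_k) - M_k)
      fixed slow:   as naive slow, but with R(X_k) - M_k in place of R(X_k)
    where R(x) = |x|**(1 + rho), beta_k = 1/(k+1), gamma_k = (k+1)**-g,
    alpha_k = (k+1)**-a, and 1/2 < a < g < 1 so beta_k << gamma_k << alpha_k.
    Returns (naive_mse, fixed_mse), each averaged over n_runs independent runs.
    """
    if not 0.5 < a < g < 1.0:
        raise ValueError("need 1/2 < a < g < 1")
    rng = random.Random(seed)
    sum_naive = sum_fixed = 0.0
    for _ in range(n_runs):
        x = y_naive = y_fixed = m = 0.0
        for k in range(n_steps):
            alpha = (k + 1) ** -a
            beta = 1.0 / (k + 1)
            gamma = (k + 1) ** -g
            r = abs(x) ** (1.0 + rho)   # nonlinear remainder R(X_k)
            zeta = rng.gauss(0.0, 1.0)  # slow-scale noise, shared by both updates
            y_naive += beta * (-y_naive + c * x + r + zeta)
            y_fixed += beta * (-y_fixed + c * x + r - m + zeta)  # subtract M_k
            m += gamma * (r - m)        # M_{k+1} from M_k and R(X_k)
            x += alpha * (-x + rng.gauss(0.0, 1.0))  # X_{k+1}
        sum_naive += y_naive ** 2
        sum_fixed += y_fixed ** 2
    return sum_naive / n_runs, sum_fixed / n_runs


if __name__ == "__main__":
    for n in (1000, 4000):
        naive, fixed = simulate(n)
        print(f"n={n}: naive={naive:.2e}  fixed={fixed:.2e}  n*fixed={n * fixed:.2f}")
```

The default \(a=0.6\), \(\rho=0\) sits in the regime \(a(1+\rho)=0.6<1\). There the bound suggests the naive mean-square error decays only like \(k^{-0.6}\), while the corrected one should decay like \(k^{-1}\), so \(n\) times the corrected estimate should stay roughly flat. These are noisy Monte Carlo averages, so compare trends across several \(n\) rather than reading off a single value.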