A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved Guarantees

Daniel M. Roy

Abstract

We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under which regret decomposes into a delay-independent learning term and a delay-induced drift term, yielding a delay-adaptive reduction that converts any algorithm for online linear optimization into one that handles round-dependent delays. For bandit convex optimization, we significantly improve existing regret bounds, with delay-dependent terms matching state-of-the-art first-order rates. For first-order feedback, we recover state-of-the-art regret bounds via a simpler, unified analysis. Quantitatively, for bandit convex optimization we obtain $O(\sqrt{d_{\text{tot}}} + T^{3/4}\sqrt{k})$ regret, improving the delay-dependent term from $O(\min\{\sqrt{T d_{\text{max}}}, (T d_{\text{tot}})^{1/3}\})$ in previous work to $O(\sqrt{d_{\text{tot}}})$. Here, $k$, $T$, $d_{\text{max}}$, and $d_{\text{tot}}$ denote the dimension, time horizon, maximum delay, and total delay, respectively. Under strong convexity, we achieve $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\} + (T^2 \ln T)^{1/3} k^{2/3})$, improving the delay-dependent term from $O(d_{\text{max}} \ln T)$ in previous work to $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\})$, where $\sigma_{\text{max}}$ denotes the maximum number of outstanding observations and may be considerably smaller than $d_{\text{max}}$.
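To make the three delay quantities in the abstract concrete, the following is a minimal Python sketch that computes the total delay, the maximum delay, and the maximum number of outstanding observations from a delay sequence. The function name `delay_statistics` and the timing convention (feedback from round s arrives at the end of round s + d_s, so it is missing while the learner acts in rounds s+1 through s+d_s) are assumptions made for illustration; the paper's exact definitions may differ.

```python
from itertools import accumulate


def delay_statistics(delays):
    """Compute d_tot, d_max, and sigma_max for a list of per-round delays.

    Assumed convention (not taken from the paper): delays[s] = d_s means the
    feedback of round s arrives at the end of round s + d_s, so it is
    outstanding while the learner acts in rounds s+1, ..., s+d_s.
    """
    horizon = len(delays)
    # Difference array: diff[t] accumulates changes in the outstanding count.
    diff = [0] * (horizon + 1)
    for s, d in enumerate(delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        first = s + 1
        last = min(s + d, horizon - 1)  # clip to the time horizon
        if first <= last:
            diff[first] += 1
            diff[last + 1] -= 1
    # Prefix sums give the number of outstanding observations in each round.
    outstanding = list(accumulate(diff[:horizon]))
    return {
        "d_tot": sum(delays),
        "d_max": max(delays, default=0),
        "sigma_max": max(outstanding, default=0),
    }


# One heavily delayed round, all others immediate.
single_straggler = delay_statistics([9] + [0] * 9)

# Every round delayed by two steps.
uniform = delay_statistics([2, 2, 2, 2])
```

Under this convention, the single-straggler sequence has a maximum delay of 9 but never more than one observation outstanding at a time, which illustrates how the outstanding count can be much smaller than the maximum delay.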
