Online Convex Optimization with Sublinear Noisy Probes

Abstract

We study Online Convex Optimization (OCO) over a convex set $K \subseteq \mathbb{R}^d$, where in each round $t$ the learner selects $x_t \in K$ and then observes a convex loss $f_t : K \to [0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear best-expert queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k \le T$ pairwise probes; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a sublinear and noisy probe budget can provably improve worst-case regret in the full-feedback OCO regime. With $k$ $\delta$-noisy pairwise probes, we obtain $\mathrm{Reg}_T \le O\left(\min\left\{\sqrt{dT \log T},\; \frac{dT \log T}{k\,|1-2\delta|}\right\}\right)$, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $\delta$. As a function of the noise parameter $\delta \in [0,1]$, the regret guarantee degrades smoothly as the oracle response approaches a coin flip, i.e., as $\delta$ approaches $1/2$. When the same techniques are applied to a finite $K$, i.e., the setting of prediction with $d$ experts, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.
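To make the probing model concrete, the following is a minimal sketch of a δ-noisy pairwise probe, not the paper's algorithm. It assumes that "δ-noisy" means the correct comparison is flipped independently with probability δ, and that ties are answered in favor of the first point; the abstract states neither. The names `noisy_pairwise_probe` and `agreement_rate`, the one-dimensional loss, and the two queried points are hypothetical and chosen only for illustration.

```python
import random


def noisy_pairwise_probe(loss, x, y, delta, rng=random):
    """Answer "is loss(x) <= loss(y)?", flipped with probability delta.

    Assumption: independent flips, ties resolved in favor of x.
    """
    truth = loss(x) <= loss(y)
    flipped = rng.random() < delta
    return truth != flipped  # XOR: a flip inverts the true answer


def agreement_rate(delta, trials=20_000, seed=0):
    """Fraction of probes reporting the true ordering of two fixed points."""
    rng = random.Random(seed)
    # Hypothetical convex loss with values in [0, 1] on K = [0, 1].
    loss = lambda x: (x - 0.3) ** 2
    # loss(0.25) < loss(0.9), so the true answer is always True here.
    hits = sum(
        noisy_pairwise_probe(loss, 0.25, 0.9, delta, rng)
        for _ in range(trials)
    )
    return hits / trials
```

Under these assumptions the probe is fully informative at both δ = 0 (always right) and δ = 1 (always wrong, so the answer can be inverted), and carries no information at δ = 1/2. This is consistent with the guarantee depending on δ only through |1 − 2δ|.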
