Beyond $O(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints
Abstract
We study Online Convex Optimization with adversarial constraints (COCO). In each round, a learner selects an action from a convex decision set, and an adversary then reveals a convex cost function and a convex constraint function. The learner's goal is to choose a sequence of actions that minimizes both the regret and the cumulative constraint violation (CCV) over a horizon of length $T$. The best-known policy for this problem achieves $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV. In this paper, we improve on this by trading off regret to achieve substantially smaller CCV. This trade-off is especially important in safety-critical applications, where satisfying the safety constraints is non-negotiable. Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $O(\sqrt{dT}+T^{\beta})$ regret and $O(dT^{1-\beta})$ CCV, where $d$ is the dimension of the decision set and $\beta \in [0,1]$ is a tunable parameter. We begin with a special case, called the Constrained Expert problem, where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose a computationally efficient policy for the Constrained Expert problem that attains $O(\sqrt{T\ln N}+T^{\beta})$ regret and $O(T^{1-\beta}\ln N)$ CCV, where $N$ is the number of experts. The original problem is then reduced to the Constrained Expert problem via a covering argument. Finally, under an additional $M$-smoothness assumption, we propose a computationally efficient first-order policy attaining $O(\sqrt{MT}+T^{\beta})$ regret and $O(MT^{1-\beta})$ CCV.
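As a minimal sketch, assuming linear costs $\langle c_t, x\rangle$ and linear constraints $\langle a_t, x\rangle \le 0$ on the probability simplex as in the Constrained Expert problem, the Python below computes the two performance measures for a given action sequence, along with the exponents of $T$ implied by the trade-off, reading the general bounds as $\sqrt{dT}+T^{\beta}$ for regret and $dT^{1-\beta}$ for CCV. The helper names (`dot`, `cumulative_constraint_violation`, `regret`, `tradeoff_exponents`) and the toy data are hypothetical. Three further assumptions apply. CCV is taken to sum the per-round positive parts, so violations cannot cancel across rounds. Regret is measured against a caller-supplied comparator that must satisfy every constraint; the paper's regret compares against the best such comparator. The exponents drop $d$, $N$, $M$ and logarithmic factors. This is an illustration of the metrics only, not the paper's policy.

```python
from fractions import Fraction
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product <u, v>; both vectors must have the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(a * b for a, b in zip(u, v))


def cumulative_constraint_violation(actions: Sequence[Vector],
                                    constraints: Sequence[Vector]) -> float:
    """Assumed CCV: sum_t max(0, <a_t, x_t>), i.e. per-round positive parts,
    so a slack round cannot offset a violating round."""
    return sum(max(0.0, dot(a, x)) for x, a in zip(actions, constraints))


def regret(actions: Sequence[Vector], costs: Sequence[Vector],
           comparator: Vector) -> float:
    """sum_t <c_t, x_t> - sum_t <c_t, x_star> for one fixed comparator x_star.
    The caller must choose an x_star on the simplex with <a_t, x_star> <= 0
    in every round; this function does not check feasibility."""
    return sum(dot(c, x) - dot(c, comparator) for x, c in zip(actions, costs))


def tradeoff_exponents(beta) -> tuple:
    """Exponents of T in the bounds, ignoring d, N, M and log factors:
    regret ~ T^max(1/2, beta) and CCV ~ T^(1 - beta)."""
    beta = Fraction(beta)
    if not 0 <= beta <= 1:
        raise ValueError("beta must lie in [0, 1]")
    return max(Fraction(1, 2), beta), 1 - beta


# Hypothetical toy instance: N = 2 experts, T = 2 rounds.
costs = [(1.0, 0.0), (0.0, 1.0)]          # c_1, c_2
constraints = [(0.5, -1.0), (0.5, -1.0)]  # a_1, a_2
actions = [(0.5, 0.5), (1.0, 0.0)]        # learner's distributions x_1, x_2
x_star = (0.0, 1.0)                        # feasible: <a_t, x_star> = -1 <= 0

print(cumulative_constraint_violation(actions, constraints))  # 0.5
print(regret(actions, costs, x_star))                          # -0.5
print(tradeoff_exponents(Fraction(2, 3)))  # (Fraction(2, 3), Fraction(1, 3))
```

In this toy instance the learner's regret is negative only because it violated the constraint in round 2, which is exactly why both quantities are tracked together. The exponent helper makes the role of $\beta$ concrete: $\beta = 1/2$ puts both exponents at $1/2$, while pushing $\beta$ toward 1 drives the CCV exponent toward 0 at the cost of regret growing toward linear in $T$; values of $\beta$ below $1/2$ only worsen CCV without improving the $\sqrt{T}$ regret term.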