Optimal Dynamic Regret in LQR Control
Abstract
We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3}\, \mathcal{TV}(M_{1:n})^{2/3}, 1\})$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1,\ldots,M_n$, chosen in hindsight to cater to unknown nonstationarity. The rate improves on the best known rate of $\tilde{O}(\sqrt{n\,(\mathcal{TV}(M_{1:n})+1)})$ for general convex losses, and we prove that it is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of ``minibatched'' quadratic losses, which could be of independent interest.