Regret Minimization in Scalar, Static, Non-linear Optimization Problems

Abstract

We study the problem of determining an effective exploration strategy in static, non-linear optimization problems that depend on an unknown scalar parameter, which must be learned from noisy data collected online. An optimal trade-off between exploration and exploitation is crucial for effective optimization under uncertainty. To achieve it, we consider a cumulative regret minimization approach over a finite horizon, where each time instant in the horizon is characterized by a stochastic exploration signal whose variance is to be designed. We aim to extend the well-established concepts of regret minimization from linear to non-linear systems, with a focus on the resulting conceptual differences and challenges. Under an idealized assumption on an appropriately defined information function associated with the excitation, we show that an optimal exploration strategy is either to use no exploration at all (called lazy exploration) or to add an exploration excitation only at the first time instant of the horizon (called immediate exploration). A quadratic numerical example is presented to demonstrate the effectiveness of the proposed strategy.
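
To make the two strategies named above concrete, the following is a minimal Python sketch of a toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the quadratic cost (x - theta)^2, the measurement model y = theta * x + noise, the least-squares estimator, the horizon length, and the function name `cumulative_regret`. It only shows how a per-instant exploration variance enters a cumulative regret calculation, and it makes no claim about which strategy performs better.

```python
import numpy as np


def cumulative_regret(variances, theta=2.0, noise_std=0.5, theta0=1.0, seed=0):
    """Simulate one run of a toy problem and return its cumulative regret.

    Toy model (hypothetical, chosen for illustration only):
      cost         f(x, theta) = (x - theta)**2, minimised at x = theta
      measurement  y_t = theta * x_t + e_t,  e_t ~ N(0, noise_std**2)
      decision     x_t = theta_hat_t + alpha_t,  alpha_t ~ N(0, variances[t])
    """
    rng = np.random.default_rng(seed)
    theta_hat = theta0      # initial guess of the unknown scalar parameter
    sxy, sxx = 0.0, 0.0     # running sums for scalar least squares
    regret = 0.0
    for v in variances:
        alpha = rng.normal(0.0, np.sqrt(v))   # exploration signal with designed variance
        x = theta_hat + alpha
        regret += (x - theta) ** 2            # optimal cost is 0, so this is the instantaneous regret
        y = theta * x + rng.normal(0.0, noise_std)
        sxy += x * y
        sxx += x * x
        if sxx > 0.0:
            theta_hat = sxy / sxx             # least-squares estimate from the data so far
    return regret


if __name__ == "__main__":
    T = 20                                    # horizon length (assumed)
    lazy = np.zeros(T)                        # no exploration at any time instant
    immediate = np.zeros(T)
    immediate[0] = 1.0                        # excitation only at the first time instant
    for name, variances in [("lazy", lazy), ("immediate", immediate)]:
        runs = [cumulative_regret(variances, seed=s) for s in range(200)]
        print(name, float(np.mean(runs)))
```

Averaging over seeds, as in the usage lines at the bottom, gives a Monte Carlo estimate of the expected cumulative regret for each variance profile under these assumed models.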
