Tighter Regret Lower Bound for Gaussian Process Bandits with Squared Exponential Kernel in Hypersphere
Abstract
We study an algorithm-independent, worst-case lower bound for the Gaussian process (GP) bandit problem in the frequentist setting, where the reward function is fixed and has a bounded norm in the known reproducing kernel Hilbert space (RKHS). Specifically, we focus on the squared exponential (SE) kernel, one of the most widely used kernel functions in GP bandits. One of the remaining open questions for this problem is the gap in the dimension-dependent logarithmic factors between upper and lower bounds. This paper partially resolves this open question under a hyperspherical input domain. We show that any algorithm suffers (T ( T)d ( T)-d) cumulative regret, where T and d represent the total number of steps and the dimension of the hyperspherical domain, respectively. Regarding the simple regret, we show that any algorithm requires (ε-2( 1ε)d ( 1ε)-d) time steps to find an ε-optimal point. We also provide the improved O(( T)d+1( T)-d) upper bound on the maximum information gain for the SE kernel. Our results guarantee the optimality of the existing best algorithm up to dimension-independent logarithmic factors under a hyperspherical input domain.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.