Minimum mean-squared error estimation with bandit feedback
Abstract
We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian K-vector of unknown covariance by observing only m < K of its entries in each round. We propose two MSE estimators, and analyze their concentration properties. The first estimator is non-adaptive, as it is tied to a predetermined m-subset and lacks the flexibility to transition to alternative subsets. The second estimator, which is derived using a regression framework, is adaptive and exhibits better concentration bounds in comparison to the first estimator. We frame the MSE estimation problem with bandit feedback, where the objective is to find the MSE-optimal subset with high confidence. We propose a variant of the successive elimination algorithm to solve this problem. We also derive a minimax lower bound to understand the fundamental limit on the sample complexity of this problem.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.