$\epsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search
Abstract
We study the fixed-budget max-min action identification problem in depth-2 max-min trees, an important special case of Monte Carlo Tree Search. A learner sequentially allocates $T$ samples to leaves and then recommends a subtree whose minimum leaf value is largest. Motivated by approximate planning, we focus on $\epsilon$-good subtree identification, where any subtree whose minimum value is within $\epsilon$ of the optimal maximin value is acceptable. Our main contribution is an $\epsilon$-agnostic algorithm: it does not require $\epsilon$ as input, but achieves instance-dependent error bounds for every meaningful $\epsilon$. We show that the misidentification probability decays exponentially in $T/H_2(\epsilon)$, where $H_2(\epsilon)$ captures both cross-subtree and within-subtree gaps. When each subtree has a single leaf, the problem reduces to standard fixed-budget best-arm identification, and our analysis recovers, up to accelerating factors, known $\epsilon$-good guarantees for halving-style methods while giving a new $\epsilon$-good guarantee for Successive Rejects. On the lower-bound side, we provide complementary positive and negative results showing that max-min identification has a different hardness structure from standard $K$-armed bandits. To our knowledge, this is the first provable fixed-budget algorithmic guarantee for max-min action identification.
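To make the acceptance criterion concrete, the following is a minimal sketch of which subtrees count as acceptable when the true leaf means are known. It illustrates the problem definition only, not the paper's algorithm; the function names, the nested-list tree representation, and the numeric values are assumptions introduced here for illustration.

```python
from typing import List, Sequence


def maximin_value(tree: Sequence[Sequence[float]]) -> float:
    """Optimal maximin value: the largest, over subtrees, of the smallest leaf mean."""
    return max(min(leaves) for leaves in tree)


def epsilon_good_subtrees(tree: Sequence[Sequence[float]], epsilon: float) -> List[int]:
    """Indices of subtrees whose minimum leaf mean is within epsilon of the maximin value."""
    best = maximin_value(tree)
    return [i for i, leaves in enumerate(tree) if min(leaves) >= best - epsilon]


# Hypothetical depth-2 tree: each inner list holds the true leaf means of one subtree.
tree = [[0.9, 0.5], [0.6, 0.55], [0.8, 0.2]]

# Subtree minima are 0.5, 0.55, and 0.2, so subtree 1 is the unique optimum.
exact = epsilon_good_subtrees(tree, 0.0)
# Relaxing to epsilon = 0.1 also admits subtree 0, whose minimum is 0.5.
relaxed = epsilon_good_subtrees(tree, 0.1)

# With one leaf per subtree the criterion reduces to epsilon-good best-arm identification.
single_leaf = epsilon_good_subtrees([[0.3], [0.7], [0.65]], 0.1)

print(exact, relaxed, single_leaf)
```

In the actual problem the learner never sees these means. It only observes noisy samples from leaves it chooses, so a recommendation counts as correct when it lands in the set computed above.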