Model Selection for Generic Reinforcement Learning

Abstract

We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem where the transition kernel $P^*$ belongs to a family of models $\mathcal{P}^*$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels $\mathcal{P}_1 \subset \mathcal{P}_2 \subset \ldots \subset \mathcal{P}_M$. We propose and analyze a novel algorithm, namely Adaptive Reinforcement Learning (General) (ARL-GEN) that adapts to the smallest such family where the true transition kernel $P^*$ lies. ARL-GEN uses the Upper Confidence Reinforcement Learning (UCRL) algorithm with value targeted regression as a blackbox and puts a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that ARL-GEN obtains a regret of $\tilde{\mathcal{O}}(d_{\mathcal{E}}^* H^2 + \sqrt{d_{\mathcal{E}}^* \mathbb{M}^* H^2 T})$, with high probability, where $H$ is the horizon length, $T$ is the total number of steps, $d_{\mathcal{E}}^*$ is the Eluder dimension and $\mathbb{M}^*$ is the metric entropy corresponding to $\mathcal{P}^*$. Note that this regret scaling matches that of an oracle that knows $\mathcal{P}^*$ in advance. We show that the cost of model selection for ARL-GEN is an additive term in the regret having a weak dependence on $T$. Subsequently, we remove the separability assumption and consider the setup of linear mixture MDPs, where the transition kernel $P^*$ has a linear function approximation. With this low rank structure, we propose novel adaptive algorithms for model selection, and obtain (order-wise) regret identical to that of an oracle with knowledge of the true model class.
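The idea of adapting to the smallest nested family can be illustrated with a small sketch. The abstract does not specify the test used by the model selection module, so everything here is an assumption made for illustration: the function name `select_smallest_class`, the per-class fit errors, and the fixed threshold are all hypothetical. The sketch only relies on the fact that, for nested families, a larger family fits the data at least as well as a smaller one, so one can scan from the smallest family upward and stop at the first one that fits well enough.

```python
from typing import Sequence


def select_smallest_class(fit_errors: Sequence[float], threshold: float) -> int:
    """Return the index of the smallest model class that fits well enough.

    fit_errors[m] is a hypothetical goodness-of-fit statistic for the m-th
    nested family (index 0 is the smallest family). This thresholding rule
    is an illustrative assumption, not the paper's actual test.
    """
    if not fit_errors:
        raise ValueError("at least one model class is required")
    for m, err in enumerate(fit_errors):
        if err <= threshold:
            # First (smallest) family whose fit error is under the threshold.
            return m
    # No family passes the test: fall back to the largest family.
    return len(fit_errors) - 1


# Toy usage: four nested families with made-up fit errors for one epoch.
epoch_fit_errors = [0.9, 0.4, 0.05, 0.04]
chosen = select_smallest_class(epoch_fit_errors, threshold=0.1)
print(f"selected family index: {chosen}")
```

In an epoch-based scheme like the one described above, a rule of this kind would be re-run at the start of each epoch on the data collected so far, and the chosen family would then be handed to the black-box learner for that epoch.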
