Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

Donghwan Lee

Abstract

Reinforcement learning has advanced significantly, particularly with the emergence of model-based approaches. Q-learning, in turn, has proven to be a powerful algorithm in model-free settings. However, the extension of Q-learning to a model-based framework remains relatively unexplored. In this paper, we investigate the sample complexity of Q-learning when integrated with a model-based approach. The proposed algorithm learns both the model and the Q-values in an online manner. We demonstrate a near-optimal sample complexity result for a broad range of step sizes.
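As a rough illustration of what learning the model and the Q-values together might look like, the sketch below runs a synchronous update on a small tabular MDP. It is not taken from the paper: the function name `sync_mbq`, the constant step size `alpha`, the count-based empirical model, and the assumption of known rewards and one sampled transition per state-action pair per iteration are all choices made here for illustration.

```python
import random

def sync_mbq(P, R, gamma=0.9, alpha=0.5, steps=2000, seed=0):
    """Hypothetical sketch of synchronous model-based Q-learning.

    P[s][a] is a list of next-state probabilities (used only to draw samples),
    R[s][a] is a known deterministic reward. Both are assumptions of this sketch.
    """
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    # counts[s][a][s2] = number of observed transitions (s, a) -> s2
    counts = [[[0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    Q = [[0.0] * n_a for _ in range(n_s)]
    for k in range(1, steps + 1):
        # Greedy state values from the previous iterate (synchronous update).
        V = [max(row) for row in Q]
        for s in range(n_s):
            for a in range(n_a):
                # Draw one transition for every (s, a) pair and update the model.
                s_next = rng.choices(range(n_s), weights=P[s][a])[0]
                counts[s][a][s_next] += 1
                # Expected next-state value under the empirical model (k samples so far).
                expected_v = sum(c * v for c, v in zip(counts[s][a], V)) / k
                target = R[s][a] + gamma * expected_v
                # Q-learning style step toward the model-based target.
                Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

In this sketch the only difference from a sample-based Q-learning step is that the target averages over the estimated transition distribution rather than using the single sampled next state; whether this matches the paper's exact update and step-size schedule should be checked against the full text.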
