Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration

Abstract

Designing learning agents that explore efficiently in a complex environment is widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions for a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurrently explore an environment. The theoretical results established in this work give an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with aggregated state representation. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $\Theta(1/\sqrt{N})$, where $N$ is the number of agents, highlighting the advantage of concurrent learning. Our algorithm exhibits significantly lower space complexity than those of Russo (2019) and Agrawal et al. (2021): we reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in the worst-case regret bound. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.
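To make the setting concrete, the following is a minimal sketch of concurrent exploration with randomized value functions over aggregated states in a finite-horizon tabular problem. It is not the paper's algorithm: the abstract does not give the update rule, so the Gaussian-perturbed backward recursion, the shared count statistics, and every name here (`sample_q`, `run_concurrent`, `phi`, `sigma`) are illustrative assumptions.

```python
import numpy as np

def sample_q(counts, reward_sums, sigma, rng):
    """Draw one randomized Q-function over aggregated states (assumed form).

    counts:      (H, G, A, G) visit counts of (step, aggregate, action, next aggregate)
    reward_sums: (H, G, A)    summed rewards per (step, aggregate, action)
    """
    H, G, A, _ = counts.shape
    q = np.zeros((H + 1, G, A))  # q[H] stays zero (terminal value)
    for h in range(H - 1, -1, -1):
        n = counts[h].sum(axis=-1)                        # (G, A) visits
        v_next = q[h + 1].max(axis=-1)                    # (G,) greedy next value
        target_sum = reward_sums[h] + counts[h] @ v_next  # (G, A) summed targets
        mean = target_sum / (n + 1.0)                     # regularized average
        std = sigma / np.sqrt(n + 1.0)                    # noise shrinks with data
        q[h] = mean + std * rng.standard_normal((G, A))
    return q[:H]

def run_concurrent(P, R, phi, G, H, N, rounds, sigma=1.0, seed=0):
    """N agents act in parallel each round and pool their data afterwards.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    phi: (S,) map from state to aggregate index in range(G).
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    counts = np.zeros((H, G, A, G))
    reward_sums = np.zeros((H, G, A))
    returns = []
    for _ in range(rounds):
        new_counts = np.zeros_like(counts)
        new_rewards = np.zeros_like(reward_sums)
        for _ in range(N):
            # Each agent draws its own perturbed Q from the shared statistics,
            # so agents explore differently despite identical data.
            q = sample_q(counts, reward_sums, sigma, rng)
            s, total = 0, 0.0
            for h in range(H):
                g = phi[s]
                a = int(np.argmax(q[h, g]))
                s_next = rng.choice(S, p=P[s, a])
                new_counts[h, g, a, phi[s_next]] += 1
                new_rewards[h, g, a] += R[s, a]
                total += R[s, a]
                s = s_next
            returns.append(total)
        counts += new_counts        # pool the round's experience
        reward_sums += new_rewards
    return counts, reward_sums, np.array(returns)
```

In this sketch the shared statistics are indexed by aggregate state rather than raw state, so memory depends on the number of aggregates `G` instead of `S`; how this relates to the factor-of-$K$ saving stated in the abstract is not something the sketch attempts to reproduce.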
