Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale
Abstract
We study reinforcement learning for global decision-making in the presence of local agents, where a global decision-maker makes decisions that affect all local agents and the objective is to learn a policy that maximizes the joint reward of all agents. Such problems arise in many applications, including demand response, EV charging, and queueing. In this setting, scalability has been a long-standing challenge because the size of the state space can be exponential in the number of agents. This work proposes the SUBSAMPLE-Q algorithm, in which the global agent subsamples k of the n local agents (k ≤ n) to compute a policy in time that is polynomial in k. We show that the learned policy converges to the optimal policy on the order of O(1/k + ε_{k,m}) as the number of subsampled agents k increases, where ε_{k,m} is the Bellman noise. Finally, we validate the theory through numerical simulations in a demand-response setting and a queueing setting.
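The subsampling idea can be illustrated with a minimal sketch. It is not the paper's SUBSAMPLE-Q implementation and does not reproduce its polynomial-in-k computation; it only shows how a joint state space grows with the number of agents and how a global agent might act on k sampled local states. All names here (`joint_state_count`, `subsample_agents`, `greedy_action`, the dictionary-based Q-table) are hypothetical and chosen for illustration.

```python
import random


def joint_state_count(n_global, n_local, num_agents):
    """Number of joint states: one global state plus one local state per agent."""
    return n_global * n_local ** num_agents


def subsample_agents(local_states, k, rng):
    """Pick the states of k of the n local agents uniformly, without replacement."""
    if not 0 < k <= len(local_states):
        raise ValueError("k must satisfy 1 <= k <= n")
    return tuple(rng.sample(local_states, k))


def greedy_action(q_table, global_state, sampled_states, actions):
    """Greedy action under a Q-table keyed by (global, sampled locals, action).

    Assumption: unseen entries default to a value of 0.0.
    """
    return max(
        actions,
        key=lambda a: q_table.get((global_state, sampled_states, a), 0.0),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 50, 5
    local_states = [rng.randrange(2) for _ in range(n)]  # binary local states

    # The full joint space is exponential in the number of agents n.
    print("joint states for n agents:", joint_state_count(3, 2, n))

    sampled = subsample_agents(local_states, k, rng)
    q_table = {}  # an untrained table, so every action ties at 0.0
    print("action:", greedy_action(q_table, 0, sampled, ["charge", "idle"]))
```

In this sketch the global agent only ever looks up entries indexed by k local states rather than all n, which is the intuition behind trading a small approximation error for a much smaller problem.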