Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale

Guannan Qu

Abstract

We study reinforcement learning for global decision-making in the presence of local agents, where a global decision-maker makes decisions that affect all local agents and the objective is to learn a policy that maximizes the joint reward of all agents. Such problems arise in many applications, such as demand response, EV charging, and queueing. In this setting, scalability has been a long-standing challenge because the size of the state space can be exponential in the number of agents. This work proposes the SUBSAMPLE-Q algorithm, in which the global agent subsamples k ≤ n of the n local agents to compute a policy in time polynomial in k. We show that the learned policy converges to the optimal policy on the order of O(1/k + ε_{k,m}) as the number of subsampled agents k increases, where ε_{k,m} is the Bellman noise. Finally, we validate the theory through numerical simulations in a demand-response setting and a queueing setting.
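The subsampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SUBSAMPLE-Q implementation: the function names, the tabular `q_table` layout, and the choice to summarize the sampled local states by their counts (which assumes the local agents are interchangeable) are all assumptions made here for illustration. The sketch shows how a global agent might choose an action after looking at only k of the n local states, and why a count-based summary grows polynomially in k while the joint state space grows exponentially in n.

```python
import math
import random
from collections import Counter


def subsample_local_states(local_states, k, rng):
    """Pick k of the n local agents' states uniformly without replacement."""
    if not 0 < k <= len(local_states):
        raise ValueError("k must satisfy 1 <= k <= n")
    return rng.sample(local_states, k)


def empirical_counts(sampled_states):
    """Order-independent summary of the sampled states (assumption:
    local agents are interchangeable, so only the counts matter)."""
    return tuple(sorted(Counter(sampled_states).items()))


def greedy_action(q_table, global_state, sampled_states, actions):
    """Greedy action under a hypothetical tabular Q-function keyed by
    (global_state, summary_of_sampled_states, action)."""
    summary = empirical_counts(sampled_states)
    return max(
        actions,
        key=lambda a: q_table.get((global_state, summary, a), 0.0),
    )


def joint_state_count(num_local_states, n):
    """Number of joint local states for n agents: exponential in n."""
    return num_local_states ** n


def summary_count(num_local_states, k):
    """Number of distinct count summaries of k sampled states
    (multisets of size k): polynomial in k for fixed num_local_states."""
    return math.comb(k + num_local_states - 1, num_local_states - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    local_states = [rng.choice(["idle", "busy"]) for _ in range(50)]
    sampled = subsample_local_states(local_states, k=5, rng=rng)
    q_table = {}  # empty table: every action scores 0.0, first one wins
    print(greedy_action(q_table, "g0", sampled, ["hold", "dispatch"]))
    print(joint_state_count(2, 50), summary_count(2, 5))
```

With two local states, the joint state space for n = 50 agents has 2^50 entries, while the count summary of k = 5 sampled agents takes only 6 values, which is the kind of gap that makes working with a subsample attractive.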
