Breaking the Grid: Distance-Guided Reinforcement Learning in Large Discrete Action Spaces

Abstract

Reinforcement Learning (RL) is increasingly applied to large-scale decision-making problems like logistics, scheduling, and recommender systems, but existing algorithms struggle with the curse of dimensionality in such large discrete action spaces. We propose Distance-Guided Reinforcement Learning (DGRL), combining Sampled Dynamic Neighborhoods and Distance-Based Updates to enable efficient RL in problems with up to 1020 actions. Unlike prior methods, DGRL performs stochastic volumetric exploration and transforms policy optimization into a stable regression task, decoupling gradient variance from action space cardinality. On structured tasks, DGRL provably guarantees local value improvement. DGRL naturally generalizes to hybrid continuous-discrete action spaces. We demonstrate performance improvements of up to 66% against state-of-the-art benchmarks across regularly and irregularly structured environments, while simultaneously improving convergence speed and computational complexity.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…