Geometrically Averaged Hard Target Updates for Linear Q-Learning

Abstract

Periodic hard target updates are among the most common stabilization devices in modern deep Q-learning. Recent studies suggest that target updates can improve stability in Q-learning with function approximation, including linear function approximation. We introduce and analyze the so-called λ-target update, obtained by averaging the m-periodic target update maps with λ-geometric weights (1-λ)λm-1, λ∈ [0,1]. The endpoint λ=0 recovers the one-period target update, while the continuous endpoint λ1 recovers projected Q-value iteration. We study this mechanism for Q-learning with linear function approximation, namely linear Q-learning, using a switching-system model and related tools. For clarity, the paper treats a deterministic version; the formulation extends to stochastic reinforcement-learning settings.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…