Exploring TD error as a heuristic for σ selection in Q(σ, λ)
Abstract
Among TD algorithms, Q(σ, λ) performs multistep backups online while unifying sampling with taking the expectation over all actions in a state. The parameter σ ∈ [0, 1] sets the degree of sampling: σ = 1 is full sampling and σ = 0 is a pure expectation. Rather than holding σ constant or varying it with time, σ can be selected from characteristics of the current state. This report explores the viability of one such scheme, based on the TD error.
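To make the role of σ concrete, the following is a minimal one-step sketch in Python, not the report's implementation. The function `q_sigma_target` blends the sampled next action value with the expectation under the policy, as in the standard Q(σ) backup; the multistep λ machinery is left out. The function `sigma_from_td_error`, its `scale` parameter, and the direction of the mapping (larger TD error gives more sampling) are hypothetical, since the abstract does not specify how the TD error is turned into σ.

```python
def q_sigma_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step Q(sigma) backup target.

    q_next:  action values Q(s', a) for every action a in the next state
    pi_next: policy probabilities pi(a | s') for the same actions
    sigma:   1.0 gives the sampled (Sarsa-style) target,
             0.0 gives the expected (Expected-Sarsa-style) target
    """
    sampled = q_next[next_action]
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def sigma_from_td_error(td_error, scale=1.0):
    """Hypothetical heuristic (an assumption, not taken from the report):
    map the magnitude of a TD error to sigma in [0, 1], clipping at 1."""
    return min(1.0, abs(td_error) / scale)


# Two actions in the next state, uniform policy, sampled action is index 1.
q_next = [1.0, 3.0]
pi_next = [0.5, 0.5]

sampled_target = q_sigma_target(0.0, 1.0, q_next, pi_next, 1, sigma=1.0)
expected_target = q_sigma_target(0.0, 1.0, q_next, pi_next, 1, sigma=0.0)

# Choose sigma from an illustrative TD error, then form the blended target.
sigma = sigma_from_td_error(0.5, scale=1.0)
blended_target = q_sigma_target(0.0, 1.0, q_next, pi_next, 1, sigma=sigma)
```

With these inputs, σ = 1 reproduces the sampled value of the chosen action and σ = 0 reproduces the policy-weighted average, so any intermediate σ lies between the two. A state-dependent scheme replaces the fixed σ with a value computed per step, as `sigma_from_td_error` does here under the stated assumptions.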