Approximate Dynamic Programming based on Projection onto the (min,+) subsemimodule

Abstract

We develop a new Approximate Dynamic Programming (ADP) method for infinite-horizon discounted-reward Markov Decision Processes (MDPs) based on projection onto a subsemimodule. We approximate the value function as a (min,+) linear combination of a set of basis functions whose (min,+) linear span constitutes a subsemimodule. The projection operator is closely related to the Fenchel transform. Our approximate solution obeys the (min,+) Projected Bellman Equation (MPPBE), which is different from the conventional Projected Bellman Equation (PBE). We show that the approximation error is bounded in the L∞-norm. We develop a Min-Plus Approximate Dynamic Programming (MPADP) algorithm to compute the solution to the MPPBE. We also prove convergence of the MPADP algorithm and apply it to two problems: a grid-world problem in the discrete domain and mountain car in the continuous domain.
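
To make the (min,+) linear combination concrete, here is a minimal Python sketch on a finite state space. In the (min,+) semiring, addition is replaced by min and multiplication by ordinary addition, so a combination of basis functions φ_i with coefficients w_i evaluates to min_i (φ_i(s) + w_i). The abstract does not spell out the projection operator, so the sketch assumes one common choice: the least element of the span that dominates the target, obtained with coefficients w_i = max_s (u(s) − φ_i(s)). The function names, the basis `phi`, and the target `u` are hypothetical and chosen only for illustration.

```python
def minplus_combine(phi, w):
    """(min,+) linear combination: v[s] = min_i (phi[i][s] + w[i])."""
    n_states = len(phi[0])
    return [
        min(phi_i[s] + w_i for phi_i, w_i in zip(phi, w))
        for s in range(n_states)
    ]


def project_from_above(phi, u):
    """Assumed projection: least element of the (min,+) span that is >= u.

    Each coefficient is w[i] = max_s (u[s] - phi[i][s]), which is the
    smallest shift that lifts phi[i] above u at every state.
    """
    w = [max(u_s - p_s for u_s, p_s in zip(u, phi_i)) for phi_i in phi]
    return w, minplus_combine(phi, w)


# Hypothetical example: two basis functions on four states.
phi = [
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0, 0.0],
]
u = [1.0, 0.5, 0.5, 2.0]  # hypothetical target value function

w, proj = project_from_above(phi, u)
print("coefficients:", w)
print("projection:  ", proj)
```

Under this assumed definition the projected function dominates `u` at every state, and projecting the result a second time leaves it unchanged. Unlike a least-squares projection, computing it needs only max and min operations, with no matrix inversion.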
