Reinforcement Learning for optimal dividend problem under diffusion model
Abstract
In this paper, we study the optimal dividend problem under a continuous-time diffusion model with a bounded dividend rate from the Reinforcement Learning (RL) perspective. Unlike the standard literature, our main focus is on numerical algorithms that allow part or all of the system parameters to be unspecified, so that the optimal control cannot be explicitly determined. Following the RL literature, we introduce the entropy-regularized exploratory control problem, which randomizes the control actions and balances exploitation and exploration, and we carry out a theoretical analysis of the associated Policy Improvement (PI) and Policy Evaluation (PE) devices and of the corresponding sequence of approximating optimal strategies. Specifically, our algorithm is based on two independent neural networks that approximate the value function and its derivative simultaneously. Such an algorithm, to the best of our knowledge, is new in the context of optimal dividend problems, and can be effective even when the premium and/or interest rate is state dependent, hence beyond the reach of standard statistical methods. Numerical experiments are presented to empirically demonstrate the effectiveness of our RL algorithm.
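To make the idea of a randomized control concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that the exploratory policy over the bounded dividend rate a in [0, a_max] has a Gibbs density proportional to exp(slope * a / temperature), which is the usual form when the entropy-regularized objective is linear in the control. The abstract does not state the policy's form, so this is an assumption. The names `sample_dividend_rate`, `mean_rate`, `slope`, `temperature`, and `a_max` are hypothetical; in a dividend setting `slope` would typically depend on the derivative of the value function at the current surplus.

```python
import math
import random


def sample_dividend_rate(slope, temperature, a_max, rng):
    """Draw a in [0, a_max] with density proportional to exp(slope * a / temperature).

    Assumed Gibbs form of an entropy-regularized policy; not taken from the paper.
    """
    k = slope / temperature
    u = 1.0 - rng.random()  # uniform on (0, 1], which keeps the log below finite
    if abs(k) * a_max < 1e-12:
        return u * a_max  # zero tilt: the policy is uniform on [0, a_max]
    # Inverse CDF for a positive tilt |k|, arranged so exp() never overflows.
    kk = abs(k)
    a = a_max + math.log(u + (1.0 - u) * math.exp(-kk * a_max)) / kk
    a = min(max(a, 0.0), a_max)  # guard against rounding at the endpoints
    # A negative tilt is the mirror image of a positive one.
    return a if k > 0 else a_max - a


def mean_rate(slope, temperature, a_max=1.0, n=20000, seed=0):
    """Monte Carlo estimate of the average dividend rate under the sampled policy."""
    rng = random.Random(seed)
    total = sum(sample_dividend_rate(slope, temperature, a_max, rng) for _ in range(n))
    return total / n
```

Under this assumed form, a small `temperature` concentrates the samples near 0 or near `a_max` depending on the sign of `slope`, which corresponds to exploitation, while a large `temperature` spreads them almost uniformly over the interval, which corresponds to exploration.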