Data-Driven Policy Learning for Continuous Treatments
Abstract
This paper studies policy learning for continuous treatments from observational data. Continuous treatments present more significant challenges than discrete ones because population welfare may need nonparametric estimation, and policy space may be infinite-dimensional and may satisfy shape restrictions. We propose to approximate the policy space with a sequence of finite-dimensional spaces and, for any given policy, obtain the empirical welfare by applying the kernel method. We consider two cases: known and unknown propensity scores. In the latter case, we allow for machine learning of the propensity score and modify the empirical welfare to account for the effect of machine learning. The learned policy maximizes the empirical welfare or the modified empirical welfare over the approximating space. In both cases, we modify the penalty algorithm proposed in Mbakop and Tabord-Meehan (2021) to data-automate the tuning parameters (i.e., bandwidth and dimension of the approximating space) and establish an oracle inequality for the welfare regret.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.