Adaptive Execution: Exploration and Learning of Price Impact

Benjamin Van Roy

Adaptive Execution: Exploration and Learning of Price Impact

Abstract

We consider a model in which a trader aims to maximize expected risk-adjusted profit while trading a single security. In our model, each price change is a linear combination of observed factors, impact resulting from the trader's current and prior activity, and unpredictable random effects. The trader must learn coefficients of a price impact model while trading. We propose a new method for simultaneous execution and learning - the confidence-triggered regularized adaptive certainty equivalent (CTRACE) policy - and establish a poly-logarithmic finite-time expected regret bound. This bound implies that CTRACE is efficient in the sense that the (ε,δ)-convergence time is bounded by a polynomial function of 1/ε and log(1/δ) with high probability. In addition, we demonstrate via Monte Carlo simulation that CTRACE outperforms the certainty equivalent policy and a recently proposed reinforcement learning algorithm that is designed to explore efficiently in linear-quadratic control problems.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…