Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning
Abstract
We study contextual dynamic pricing in a semiparametric scalar-index valuation model in which the latent value is $v_t = \mu(c_t) + \xi_t$, with an unknown utility map $\mu$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u \mapsto p(u)$ induced by the scalar index $u = \mu(c)$ and the noise tail. If the tail function is $\beta$-Hölder smooth for some $\beta \ge 2$ and a revenue-geometry condition guarantees a unique, stable, interior maximizer, then this oracle map is itself $(\beta-1)$-smooth. We exploit this structure through ORBIT, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu(c) = c^{\top}\theta$, an adaptive elliptical exploration scheme constructs the required scalar pilot online, without distributional assumptions on the contexts. The resulting policy achieves regret $O\big(T^{(2\beta-1)/(4\beta-3)} + \sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, showing that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric Hölder utility.