Learning is Revelation in Disguise: Optimal Regret and Equivalence Results for Dynamic Pricing
Abstract
We study dynamic pricing where a seller repeatedly interacts with a strategic, non-myopic buyer who has a fixed private valuation and discounts future utility. Prior work focused exclusively on posted-price mechanisms, where the seller makes a take-it-or-leave-it offer. As our first result, we show that menu mechanisms consisting of allocation-payment contracts achieve O(T_γ) regret, where T_γ is the buyer's effective discounted time horizon. We also establish an Ω(T_γ) lower bound, demonstrating that the bound is tight. For a geometrically discounting buyer with a constant discount factor, our bound is O(1), while prior bounds using posted-price mechanisms incur an unavoidable Ω( T) factor in regret. Our second contribution is more conceptual. Dynamic pricing sits at the intersection of two paradigms: learning with strategic agents in computer science and machine learning, and revelation-principle-based mechanism design in economics. Yet the relationship between the two has remained unclear. We establish a fundamental equivalence: indirect learning-based mechanisms and direct revelation mechanisms achieve identical optimal regret. The adaptive, data-driven algorithms of online learning and explicit type elicitation are two languages for solving the same problem.
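The claim that the bound becomes O(1) under geometric discounting can be illustrated numerically. The sketch below is not taken from the paper: it assumes the effective discounted time horizon is the sum of the buyer's discount weights over T rounds, with weights gamma**t for t = 0, ..., T-1. The abstract does not state this definition, and the function name `effective_horizon` is hypothetical. Under that assumption the sum is bounded by 1 / (1 - gamma) regardless of T.

```python
def effective_horizon(gamma: float, T: int) -> float:
    """Sum of geometric discount weights gamma**t for t = 0..T-1.

    Assumed definition of the effective discounted horizon; the
    abstract itself does not spell it out.
    """
    return sum(gamma ** t for t in range(T))


if __name__ == "__main__":
    gamma = 0.9
    limit = 1.0 / (1.0 - gamma)  # closed-form bound on the geometric sum
    for T in (10, 100, 1000, 10000):
        # The sum approaches the bound and stops growing with T.
        print(T, effective_horizon(gamma, T), limit)
```

With a constant discount factor the printed sum stays below the fixed bound while T grows, which is the sense in which a regret bound proportional to the effective horizon does not depend on T.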