Learning to Bid in Discriminatory Auctions with Budget Constraints

Abstract

We study repeated bidding in multi-unit discriminatory (pay-as-bid) auctions for a single bidder whose per-round utility equals value minus α times payment, where α∈[0,1] is a cost-of-capital parameter. The bidder aims to maximize cumulative utility over T rounds subject to a total budget B. The problem is challenging even without budgets: the action space is exponential in M, the bidder's maximum demand, and the valuation vector (context) varies over time. Exploiting a decomposition of utility across units, we develop polynomial-time learning algorithms based on shortest paths in a directed acyclic graph (DAG), obtaining sublinear regret under both full-information and bandit feedback. In the bandit setting, the regret is independent of the number of contexts due to complete cross-learning: observing the utility of the chosen action under the realized context reveals the utility of the same action under all counterfactual contexts. With budget constraints, when the average normalized per-round budget ρ = B/(MT) < 1, we design a coupled primal-dual algorithm in which the DAG-based procedure uses dual-adjusted edge weights for primal updates, while online gradient descent updates the dual variable, yielding ρ-approximate sublinear regret. Finally, we give implementations whose per-round time and space are independent of the number of contexts, enabling scalability to large or even infinite context spaces.
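To illustrate why a per-unit decomposition of utility makes the exponential action space tractable, the following is a minimal sketch, not the paper's construction. It assumes a discretized grid of bid levels, a hypothetical vector `thresholds` giving the price the bid on each unit must meet to win that unit, and bids that must be non-increasing across units. Under these assumptions, the best bid vector is a maximum-weight path in a layered DAG whose nodes are (unit, bid level) pairs, computable in O(MK) time for K bid levels rather than by enumerating K^M vectors. The function names and the win rule are illustrative assumptions.

```python
from typing import List, Tuple


def unit_weights(values: List[float], thresholds: List[float],
                 bid_levels: List[float], alpha: float) -> List[List[float]]:
    """w[j][k] = utility from bidding bid_levels[k] on unit j.

    Assumed win rule (hypothetical): unit j is won iff the bid meets
    thresholds[j]; a won unit yields value minus alpha times the bid paid.
    """
    return [[(v - alpha * b) if b >= c else 0.0 for b in bid_levels]
            for v, c in zip(values, thresholds)]


def best_monotone_bids(weights: List[List[float]],
                       bid_levels: List[float]) -> Tuple[float, List[float]]:
    """Maximum-weight path through a layered DAG of (unit, level) nodes.

    bid_levels must be sorted ascending. An edge (j, k) -> (j+1, k') exists
    iff k' <= k, which enforces non-increasing bids across units.
    """
    M, K = len(weights), len(bid_levels)
    best = list(weights[M - 1])          # best[k]: value of units j.. given level k on unit j
    choice = [[0] * K for _ in range(M)]  # choice[j][k]: level chosen for unit j+1
    for j in range(M - 2, -1, -1):
        new = [0.0] * K
        run_val, run_arg = float("-inf"), 0
        for k in range(K):
            # Running prefix maximum over successor levels k' <= k.
            if best[k] > run_val:
                run_val, run_arg = best[k], k
            new[k] = weights[j][k] + run_val
            choice[j][k] = run_arg
        best = new
    k = max(range(K), key=lambda i: best[i])
    total = best[k]
    bids = [bid_levels[k]]
    for j in range(M - 1):               # follow stored choices to recover the path
        k = choice[j][k]
        bids.append(bid_levels[k])
    return total, bids


if __name__ == "__main__":
    levels = [0.0, 2.0, 4.0, 6.0, 8.0]
    w = unit_weights([10.0, 8.0, 3.0], [2.0, 4.0, 7.0], levels, alpha=1.0)
    print(best_monotone_bids(w, levels))
```

In a primal-dual variant of the kind the abstract describes, one would presumably replace `alpha` in the weights by a dual-adjusted coefficient before running the same path computation; the exact adjustment is not specified in the abstract and is left out of this sketch.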
