Learning to Allocate Resources with Censored Feedback

Abstract

We study the online resource allocation problem in which, at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two conditions are satisfied: (i) the arm is activated according to an arm-specific Bernoulli random variable with unknown parameter, and (ii) the allocated budget exceeds a random threshold drawn from a parametric distribution with unknown parameter. Over $T$ rounds, the learner must jointly estimate the unknown parameters and allocate the budget so as to maximize cumulative reward while facing the exploration-exploitation trade-off. We prove an information-theoretic regret lower bound of $\Omega(T^{1/3})$, demonstrating the intrinsic difficulty of the problem. We then propose RA-UCB, an optimistic algorithm that leverages non-trivial parameter estimation and confidence bounds. When the budget $B$ is known at the beginning of each round, RA-UCB achieves a regret of order $\widetilde{O}(\sqrt{T})$, and even $O(\mathrm{poly}\text{-}\log T)$ under stronger assumptions. For an unknown, round-dependent budget, we introduce MG-UCB, which allows within-round switching and infinitesimal allocations, and matches the regret guarantees of RA-UCB. We then validate our theoretical results through experiments on real-world datasets.
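
To make the reward model in the abstract concrete, here is a minimal simulation sketch. It is not the paper's RA-UCB or MG-UCB algorithm. The abstract only says the threshold follows "a parametric distribution", so the exponential distribution used here is an assumption chosen for illustration, and the names `simulate_round`, `expected_reward`, `activation_probs`, and `threshold_rates` are hypothetical. Returning only the binary rewards is one possible reading of "censored feedback": the learner would not see which of the two conditions failed.

```python
import math
import random


def simulate_round(allocation, activation_probs, threshold_rates, rng):
    """Simulate one round and return a 0/1 reward per arm.

    allocation[k]       : budget given to arm k (entries sum to at most B)
    activation_probs[k] : Bernoulli activation parameter of arm k
    threshold_rates[k]  : rate of the threshold distribution of arm k
                          (ASSUMPTION: exponential thresholds, for illustration)
    """
    rewards = []
    for x, p, lam in zip(allocation, activation_probs, threshold_rates):
        activated = rng.random() < p        # condition (i): Bernoulli activation
        threshold = rng.expovariate(lam)    # condition (ii): random threshold
        rewards.append(1 if activated and x > threshold else 0)
    return rewards


def expected_reward(allocation, activation_probs, threshold_rates):
    """Expected total reward under the same assumed exponential model.

    For an exponential threshold with rate lam, P(threshold < x) = 1 - exp(-lam * x).
    """
    return sum(
        p * (1.0 - math.exp(-lam * x))
        for x, p, lam in zip(allocation, activation_probs, threshold_rates)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    probs, rates = [0.9, 0.5], [1.0, 2.0]
    even_split = [1.0, 1.0]                 # budget B = 2 split evenly over K = 2 arms
    print(simulate_round(even_split, probs, rates, rng))
    print(expected_reward(even_split, probs, rates))
```

Under this assumed model, the expected reward of each arm is concave in its allocation, so a learner who knew the parameters would trade budget between arms rather than give everything to one. The difficulty described in the abstract comes from having to estimate those parameters from the binary outcomes alone.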


