Sequential Experimental Design for Transductive Linear Bandits

Abstract

In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset\mathbb{R}^d$, a set of items $\mathcal{Z}\subset\mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta\in\mathbb{R}^d$, the goal is to infer $\arg\max_{z\in\mathcal{Z}} z^\top\theta$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the set of standard basis vectors and $\mathcal{Z}\subset\{0,1\}^d$, it generalizes combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ that a practitioner may be willing to evaluate in vitro in the lab, for reasons of cost or safety, may differ vastly from the compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well-known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
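To make the problem statement concrete, the following is a minimal sketch of the combinatorial special case mentioned above, where the measurement set is the standard basis and each item is a 0/1 vector. It is not the paper's algorithm: it uses a naive uniform, non-adaptive allocation purely to illustrate measuring on one set and identifying the best element of another. The function names, the Gaussian noise model, and the numeric values of `theta` and `Z` are assumptions made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def best_item(Z, theta):
    """Index of the item z in Z that maximizes z^T theta."""
    return max(range(len(Z)), key=lambda i: dot(Z[i], theta))


def uniform_sampling_estimate(theta, pulls_per_arm, noise_sd=1.0, seed=0):
    """Measure each standard basis vector e_i `pulls_per_arm` times.

    Measuring e_i returns theta[i] plus noise (Gaussian here, by assumption),
    so the least-squares estimate of theta is the per-coordinate sample mean.
    """
    rng = random.Random(seed)
    return [
        sum(t + rng.gauss(0.0, noise_sd) for _ in range(pulls_per_arm))
        / pulls_per_arm
        for t in theta
    ]


# Hypothetical instance: theta is hidden from the learner in a real problem.
theta = [1.0, 0.2, 0.5]
# Items are 0/1 vectors, none of which can be measured directly.
Z = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]

theta_hat = uniform_sampling_estimate(theta, pulls_per_arm=2000)
print("estimated best item:", best_item(Z, theta_hat))
print("true best item:", best_item(Z, theta))
```

Uniform allocation spends the same budget on every coordinate regardless of which items are hard to tell apart. An adaptive design would instead choose measurements sequentially, which is the question the abstract poses.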
