Active Ranking with Subset-wise Preferences
Abstract
We consider the problem of probably approximately correct (PAC) ranking of n items by adaptively eliciting subset-wise preference feedback. At each round, the learner chooses a subset of k items and observes stochastic feedback identifying the winner (most preferred item) of the chosen subset, drawn according to a Plackett-Luce (PL) subset choice model that is unknown a priori. The objective is to identify an ε-optimal ranking of the n items with probability at least 1 - δ. When the feedback in each subset round is a single Plackett-Luce-sampled item, we show (ε, δ)-PAC algorithms with a sample complexity of O((n/ε²) ln(n/δ)) rounds, which we establish as order-optimal by exhibiting a matching sample complexity lower bound of Ω((n/ε²) ln(n/δ)). This shows that essentially no improvement is possible over the pairwise comparisons setting (k = 2). When, however, it is possible to elicit top-m (m ≤ k) ranking feedback according to the PL model from each adaptively chosen subset of size k, we show that an (ε, δ)-PAC ranking sample complexity of O((n/(mε²)) ln(n/δ)) is achievable with explicit algorithms, which represents an m-fold reduction in sample complexity compared to the pairwise case. This again turns out to be order-wise unimprovable across the class of symmetric ranking algorithms. Our algorithms rely on a novel pivot trick to maintain only n itemwise score estimates, unlike the O(n²) pairwise score estimates used in prior work. We report results of numerical experiments that corroborate our findings.
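To make the two feedback types concrete, the following is a minimal Python sketch of how winner feedback and top-m ranking feedback could be simulated under a Plackett-Luce model. It assumes the standard PL definition, in which each item has a positive score and an item wins a subset with probability proportional to its score. The function names (`pl_winner`, `pl_top_m`) and the example scores are hypothetical and not taken from the paper. The sketch only simulates the feedback and does not implement the paper's algorithms.

```python
import random


def pl_winner(scores, subset, rng):
    """Sample the winner of `subset` under a Plackett-Luce model.

    Item i wins with probability scores[i] / sum(scores[j] for j in subset).
    `scores` is a hypothetical list of positive PL parameters, one per item.
    """
    weights = [scores[i] for i in subset]
    return rng.choices(subset, weights=weights, k=1)[0]


def pl_top_m(scores, subset, m, rng):
    """Sample a top-m ranking of `subset` under a Plackett-Luce model.

    Winners are drawn one at a time and removed from the pool, so each
    position is filled by a PL draw over the items that remain.
    """
    remaining = list(subset)
    ranking = []
    for _ in range(min(m, len(remaining))):
        winner = pl_winner(scores, remaining, rng)
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


# Illustrative scores for n = 4 items (assumed values).
example_scores = [0.5, 0.3, 0.15, 0.05]
example_rng = random.Random(0)

# Winner feedback from a subset of size k = 3.
example_winner = pl_winner(example_scores, [0, 1, 3], example_rng)

# Top-m ranking feedback with m = 2 from the same subset.
example_ranking = pl_top_m(example_scores, [0, 1, 3], 2, example_rng)
```

With m = 1, `pl_top_m` reduces to winner feedback, which matches the abstract's framing of the single-winner setting as a special case of top-m ranking feedback.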