UniRank: Unimodal Bandit Algorithm for Online Ranking

Abstract

We tackle an emerging problem: finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by ADMA, which yields an algorithm with an expected regret in $O(\frac{L \log(L)}{\Delta} \log(T))$ with $2L$ players, $T$ iterations and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in GRAB and UniRank, we use the unimodality property of the expected reward on the appropriate graph to design an algorithm with a regret in $O(L \frac{1}{\Delta} \log(T))$. Second, we show that by moving the focus towards the main question 'Is user $i$ better than user $j$?' this regret becomes $O(L \frac{\Delta}{\tilde{\Delta}^2} \log(T))$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Finally, experimental results corroborate these theoretical findings in practice.
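
To make the setting concrete, the following is a minimal sketch of the underlying problem only, not of the paper's algorithm. It enumerates every perfect matching of a small weighted graph by brute force and simulates one round of semi-bandit feedback. The instance, the function names, the additive reward over matched pairs, and the Bernoulli feedback model are all assumptions made for illustration; the abstract does not specify them.

```python
import random

def perfect_matchings(players):
    """Yield every perfect matching of an even-sized list of players."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail

def expected_reward(matching, weights):
    """Assumption: the reward of a matching is the sum of its edge weights."""
    return sum(weights[frozenset(pair)] for pair in matching)

def best_matching(players, weights):
    """Brute-force search over all perfect matchings (tiny instances only)."""
    return max(perfect_matchings(players),
               key=lambda m: expected_reward(m, weights))

def semi_bandit_feedback(matching, weights, rng):
    """Assumption: one Bernoulli reward is observed per matched pair."""
    return {pair: int(rng.random() < weights[frozenset(pair)])
            for pair in matching}

# Hypothetical instance with 2L = 4 players and made-up edge weights.
players = [0, 1, 2, 3]
raw_weights = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.5,
               (1, 3): 0.4, (0, 3): 0.3, (1, 2): 0.2}
weights = {frozenset(pair): w for pair, w in raw_weights.items()}

best = best_matching(players, weights)
feedback = semi_bandit_feedback(best, weights, random.Random(0))
print(best, expected_reward(best, weights), feedback)
```

The number of perfect matchings on $2L$ players is $(2L-1)!!$, so exhaustive enumeration like this is only practical for very small $L$; a learner in the semi-bandit setting additionally does not know the weights and must estimate them from the per-pair feedback.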
