Bandit Online Optimization Over the Permutahedron

Eiji Takimoto


Abstract

The permutahedron is the convex polytope whose vertex set consists of the vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, and a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^{n} \pi_t(i)\, s_t(i)$. A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step an approximation of the permanent of an $n$-by-$n$ matrix, to an accuracy that must improve as $T$ grows. The resulting total running time is superlinear in $T$, which makes the algorithm impractical for large time horizons. We provide an algorithm with regret $O(n^{3/2}\sqrt{T})$ and total time complexity $O(n^3 T)$. The ideas combine CombBand with a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full-information setting. The technical core is a bound on the variance of the "pseudo loss" of the Plackett-Luce noisy sorting process. The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.
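To make the objects in the abstract concrete, the following minimal Python sketch draws one ordering from a Plackett-Luce model, maps it to a vertex of the permutahedron, and evaluates the observed loss for a given weight vector. It is an illustration only and not the paper's algorithm: the function names (plackett_luce_sample, order_to_vertex, observed_loss), the use of exp(weight) as the selection score, and the convention that a vertex entry is an item's 1-based position are all assumptions made here.

```python
import math
import random


def plackett_luce_sample(weights, rng):
    """Draw an ordering of items 0..n-1 (assumed Plackett-Luce form).

    At each stage one remaining item is picked with probability
    proportional to exp(weight); the picked item takes the next position.
    """
    remaining = list(range(len(weights)))
    shift = max(weights)  # subtract the max for numerical stability
    order = []
    while remaining:
        scores = [math.exp(weights[i] - shift) for i in remaining]
        pick = rng.choices(remaining, weights=scores, k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order


def order_to_vertex(order):
    """Map an ordering to a permutahedron vertex.

    Assumed convention: pi[i] is the 1-based position of item i.
    """
    pi = [0] * len(order)
    for position, item in enumerate(order, start=1):
        pi[item] = position
    return pi


def observed_loss(pi, s):
    """Loss sum_i pi(i) * s(i) for vertex pi and weight vector s."""
    return sum(p * w for p, w in zip(pi, s))


rng = random.Random(0)
order = plackett_luce_sample([0.0, 1.0, -1.0, 0.5], rng)
vertex = order_to_vertex(order)
print(vertex, observed_loss(vertex, [0.2, 0.1, 0.4, 0.3]))
```

In the bandit setting the player would see only the scalar returned by observed_loss, never the vector s itself, which is why the abstract's analysis concerns an estimated "pseudo loss" rather than the true weights.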
