Online Subset Selection using α-Core with no Augmented Regret

Abstract

We revisit the classic problem of optimal subset selection in the online learning set-up. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t$-th round, an adversary chooses a monotone reward function $f_t : 2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of $[N]$. An online policy selects (perhaps randomly) a subset $S_t \subseteq [N]$ consisting of $k$ elements before the reward function $f_t$ for the $t$-th round is revealed to the learner. As a consequence of its choice, the policy receives a reward of $f_t(S_t)$ on the $t$-th round. Our goal is to design an online sequential subset selection policy that maximizes the expected cumulative reward over a time horizon. To this end, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new polyhedral characterization of the reward functions called α-Core, a generalization of the Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called α-augmented regret. In this new metric, the performance of the online policy is compared with an unrestricted offline benchmark that can select all $N$ elements at every round. We show that a large class of reward functions, including submodular functions, can be efficiently optimized with the SCore policy. We also extend the proposed policy to the optimistic learning set-up, where the learner has access to additional untrusted hints regarding the reward functions. Finally, we conclude the paper with a list of open problems.
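The interaction protocol described in the abstract can be illustrated with a minimal Python sketch. This is not the SCore policy: the learner below is a uniformly random placeholder, and the coverage-style reward is only one hypothetical example of a monotone set function. The names `make_coverage_reward`, `uniform_random_policy`, and `run_protocol` are assumptions introduced here for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set

# A reward function maps a subset of [N] = {0, ..., N-1} to a non-negative number.
RewardFn = Callable[[FrozenSet[int]], float]


def make_coverage_reward(covered: Dict[int, Set[int]]) -> RewardFn:
    """Hypothetical monotone reward: count of distinct items covered by the subset."""
    def reward(subset: FrozenSet[int]) -> float:
        items: Set[int] = set()
        for element in subset:
            items |= covered.get(element, set())
        return float(len(items))
    return reward


def uniform_random_policy(n: int, k: int, rng: random.Random) -> FrozenSet[int]:
    """Placeholder learner (NOT SCore): pick k of the n elements uniformly at random."""
    return frozenset(rng.sample(range(n), k))


def run_protocol(n: int, k: int, rewards: List[RewardFn], rng: random.Random) -> float:
    """Play one round per reward function and return the cumulative reward."""
    total = 0.0
    for f_t in rewards:
        # The subset S_t is committed before f_t is revealed to the learner.
        s_t = uniform_random_policy(n, k, rng)
        total += f_t(s_t)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Two rounds with different (adversarially chosen) coverage structures.
    rounds = [
        make_coverage_reward({0: {1, 2}, 1: {2, 3}, 2: {4}, 3: set()}),
        make_coverage_reward({0: set(), 1: {1}, 2: {1, 2}, 3: {3, 4, 5}}),
    ]
    print(run_protocol(n=4, k=2, rewards=rounds, rng=rng))
```

The benchmark mentioned in the abstract, which may select all $N$ elements at every round, corresponds in this sketch to evaluating each `f_t` on `frozenset(range(n))`.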
