Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback

Abstract

We study the Logistic Contextual Slate Bandit problem, where, at each round, an agent selects a slate of N items from an exponentially large set (of size 2(N)) of candidate slates provided by the environment. A single binary reward, determined by a logistic model, is observed for the chosen slate. Our objective is to develop algorithms that maximize cumulative reward over T rounds while maintaining low per-round computational costs. We propose two algorithms, Slate-GLM-OFU and Slate-GLM-TS, that accomplish this goal. These algorithms achieve NO(1) per-round time complexity via local planning (independent slot selections), and low regret through global learning (joint parameter estimation). We provide theoretical and empirical evidence supporting these claims. Under a well-studied diversity assumption, we prove that Slate-GLM-OFU incurs only O(T) regret. Extensive experiments across a wide range of synthetic settings demonstrate that our algorithms consistently outperform state-of-the-art baselines, achieving both the lowest regret and the fastest runtime. Furthermore, we apply our algorithm to select in-context examples in prompts of Language Models for solving binary classification tasks such as sentiment analysis. Our approach achieves competitive test accuracy, making it a viable alternative in practical scenarios.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…