Efficient Multinomial Logistic Bandit via Frequent Directions

Lijun Zhang

Abstract

This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of $O(KdT)$, but still requires $O(K^3 d^3)$ time and $O(K^2 d^2)$ space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and $Kd \times K$ spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and $K \times K$ eigenvalue computations, respectively. This yields dominant per-round time complexity $O(Kd(m+K)^2)$ and space complexity $O(Kd(m+K))$, where $m \ll d$ is the sketch size. We further prove a regret bound of $O(\Delta_T(Kd\Delta_T+m)T)$, where the sketching error factor $\Delta_T$ is controlled by the $m$-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.
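To make the sketching idea concrete, here is a minimal sketch of the generic frequent directions procedure in Python with NumPy. It is not the EOFD-MLogB algorithm itself: the class name `FrequentDirections`, its methods, the doubled buffer, and the synthetic low-rank stream are all assumptions introduced for illustration. It shows only how a stream of $d$-dimensional rows can be summarized by a small buffer whose Gram matrix approximates the full one, with a spectral error no larger than the accumulated shrinkage.

```python
import numpy as np


class FrequentDirections:
    """Generic frequent-directions sketch (illustrative, not EOFD-MLogB itself).

    Keeps a buffer B with at most 2*m rows so that B.T @ B approximates
    A.T @ A for the stream of rows A seen so far.
    """

    def __init__(self, d: int, m: int):
        if not 0 < m < d:
            raise ValueError("sketch size m must satisfy 0 < m < d")
        self.d, self.m = d, m
        self.buffer = np.zeros((2 * m, d))
        self.next_row = 0           # index of the first unused buffer row
        self.total_shrinkage = 0.0  # running sum of the subtracted deltas

    def update(self, row: np.ndarray) -> None:
        # Shrink only when the buffer is full, then append the new row.
        if self.next_row == 2 * self.m:
            self._shrink()
        self.buffer[self.next_row] = row
        self.next_row += 1

    def _shrink(self) -> None:
        # SVD of the full buffer, then subtract the (m+1)-th largest squared
        # singular value from every squared singular value (clipped at zero).
        _, s, vt = np.linalg.svd(self.buffer, full_matrices=False)
        delta = s[self.m] ** 2
        shrunk = np.sqrt(np.maximum(s[: self.m] ** 2 - delta, 0.0))
        self.buffer = np.zeros((2 * self.m, self.d))
        self.buffer[: self.m] = shrunk[:, None] * vt[: self.m]
        self.next_row = self.m
        self.total_shrinkage += delta

    def gram(self) -> np.ndarray:
        """Return B.T @ B as a dense d x d matrix (for checking only)."""
        b = self.buffer[: self.next_row]
        return b.T @ b


# Synthetic, approximately rank-3 stream (assumed data, for illustration).
rng = np.random.default_rng(0)
n, d, m = 500, 40, 8
A = rng.normal(size=(n, 3)) @ rng.normal(size=(3, d))
A += 0.01 * rng.normal(size=(n, d))

fd = FrequentDirections(d, m)
for a in A:
    fd.update(a)

# Spectral-norm error of the sketch versus the accumulated shrinkage.
err = np.linalg.norm(A.T @ A - fd.gram(), ord=2)
print(err, fd.total_shrinkage)
```

In this toy setting the stream is close to low-rank, so the shrinkage stays small relative to the leading eigenvalues of `A.T @ A`. That mirrors the abstract's remark that the sketching error is governed by the truncated spectral tail. A practical implementation would avoid forming the dense $d \times d$ matrix returned by `gram()`, which is used here only to check the error.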
