Efficient Multinomial Logistic Bandit via Frequent Directions

Abstract

This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over K+1 outcomes follows a multinomial logistic model of d-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of O(Kd√T) but still requires O(K³d³) time and O(K²d²) space per round for parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, we reduce the constrained online Newton updates in parameter estimation to one-dimensional root-finding problems and the Kd × K spectral-norm computations in the reward bonus to K × K eigenvalue computations. This yields a dominant per-round time complexity of O(Kd(m+K)²) and a space complexity of O(Kd(m+K)), where m ≪ d is the sketch size. We further prove a regret bound of O(Δ_T(KdΔ_T + m)√T), where the sketching error factor Δ_T is controlled by the m-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance of EOFD-MLogB.
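
A minimal NumPy sketch can make the two reductions concrete. It assumes, as an illustrative choice rather than the paper's exact procedure, that each round adds a PSD Hessian increment GᵀG with a factor G of shape K × Kd, and that the bonus uses a regularized matrix λI + BᵀB. The class FrequentDirections, its methods update and inv_quad, and the regularizer lam are hypothetical names. update stacks the m-row sketch B with the K new rows, performs one thin SVD of an (m+K) × Kd matrix (the source of a per-round cost of order Kd(m+K)²), and then shrinks all squared singular values by the (m+1)-th one. inv_quad uses the orthogonal rows of B through the Woodbury identity to return Xᵀ(λI + BᵀB)⁻¹X as a K × K matrix, so the spectral norm of a Kd × K quantity becomes a K × K eigenvalue problem.

```python
import numpy as np


class FrequentDirections:
    """Block Frequent Directions sketch of A^T A (hypothetical helper).

    Assumption: each round contributes a PSD term G^T G with G of shape
    (K, D), where D = K*d. The sketch B keeps at most m orthogonal rows.
    """

    def __init__(self, m: int, dim: int):
        self.m = m
        self.B = np.zeros((0, dim))  # rows are sigma_i * v_i^T
        self.total_shrink = 0.0      # sum of shrinkage amounts delta_t

    def update(self, G: np.ndarray) -> None:
        # Stack the current sketch with the K new rows: (m + K) x D.
        C = np.vstack([self.B, G])
        # One thin SVD of the stacked matrix per round.
        _, s, Vt = np.linalg.svd(C, full_matrices=False)
        # Shrink every squared singular value by the (m+1)-th one (0 if absent).
        delta = s[self.m] ** 2 if s.size > self.m else 0.0
        keep = min(self.m, s.size)
        s_new = np.sqrt(np.maximum(s[:keep] ** 2 - delta, 0.0))
        self.B = s_new[:, None] * Vt[:keep]
        self.total_shrink += delta

    def inv_quad(self, lam: float, X: np.ndarray) -> np.ndarray:
        """Return X^T (lam*I + B^T B)^{-1} X as a K x K matrix.

        Uses the Woodbury identity with the orthogonal rows of B, so no
        D x D matrix is ever formed or inverted.
        """
        sig2 = np.sum(self.B ** 2, axis=1)           # squared singular values
        nz = sig2 > 0                                # drop rows shrunk to zero
        V = self.B[nz] / np.sqrt(sig2[nz])[:, None]  # orthonormal rows
        P = V @ X                                    # projections onto the sketch
        w = sig2[nz] / (lam + sig2[nz])
        return (X.T @ X - P.T @ (w[:, None] * P)) / lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d, m, lam = 3, 10, 8, 1.0
    D = K * d
    fd = FrequentDirections(m, D)
    blocks = []
    for _ in range(50):
        G = rng.standard_normal((K, D))
        fd.update(G)
        blocks.append(G)
    A = np.vstack(blocks)
    err = np.linalg.eigvalsh(A.T @ A - fd.B.T @ fd.B)
    print("error eigenvalues in [%.3g, %.3g], total shrink %.3g"
          % (err.min(), err.max(), fd.total_shrink))
    X = rng.standard_normal((D, K))
    M = fd.inv_quad(lam, X)
    print("largest eigenvalue of the K x K form:", np.linalg.eigvalsh(M)[-1])
```

In this sketch, total_shrink upper-bounds the spectral error of BᵀB as an approximation of AᵀA (the standard Frequent Directions guarantee) and is at most ‖A‖_F²/(m+1). How the paper converts the sketching error into its factor Δ_T is not reproduced here.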
