An LP-Based Approach for Bilinear Saddle Point Problem with Instance-dependent Guarantee and Noisy Feedback

Abstract

In this work, we study the sample complexity of obtaining a Nash equilibrium (NE) estimate in two-player zero-sum matrix games with noisy feedback. Specifically, we propose a novel algorithm that repeatedly solves linear programs (LPs) to obtain an NE estimate with bias at most with a sample complexity of O(m1 m2\δ2,σ02,σ3\ m1 m2) for general m1 × m2 game matrices, where σ, σ0, δ are some problem-dependent constants. To our knowledge, this is the first instance-dependent sample complexity bound for finding an NE estimate with bias in general-dimension matrix games with noisy feedback and potentially non-unique equilibria. Our algorithm builds on recent advances in online resource allocation and operates in two stages: (1) identifying the support set of an NE, and (2) computing the unique NE restricted to this support. Both stages rely on a careful analysis of LP solutions derived from noisy samples.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…