Coupling without Communication and Drafter-Invariant Speculative Decoding
Abstract
Suppose Alice has a distribution P and Bob has a distribution Q. Alice wants to draw a sample a P and Bob a sample b Q such that a = b with as high of probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve [a = b] = 1 - DTV(P,Q), where DTV(P,Q) is the total variation distance between P and Q. What if Alice and Bob must solve this same problem without communicating at all? Perhaps surprisingly, with access to public randomness, they can still achieve [a = b] ≥ 1 - DTV(P,Q)1 + DTV(P,Q) ≥ 1-2DTV(P,Q) using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst-case by [Bavarian et al., 2020]. In this work, we revisit the communication-free coupling problem. We provide a simpler proof of the optimality result from [Bavarian et al., 2020]. We show that, while the worst-case success probability of Weighted MinHash cannot be improved, an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions P, Q, Gumbel sampling achieves an equal or higher value of [a = b] than Weighted MinHash. Importantly, this improvement translates to practice. We demonstrate an application of communication-free coupling to speculative decoding, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols can be used to contruct schemes, which have the desirable property that their output is fixed given a fixed random seed, regardless of what drafter is used for speculation. In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash. Code is available at https://github.com/majid-daliri/DISD.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.