Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation
Abstract
The squared Wasserstein distance is a frequently used measure of discrepancy between probability distributions. It is typically computed between empirical measures of size $n$ drawn from two underlying random samples. Unfortunately, even in low-dimensional Euclidean problems ($d \in \{2,3\}$), algorithms that compute the Wasserstein distance with approximate or exact precision guarantees have runtimes that scale poorly in $n$ and in the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\varepsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm in which we introduce a regular Cartesian grid sketch of the samples. We show that (especially for $\alpha$-Hölder smooth distributions) this sketch compresses the data without increasing the asymptotic error, and also regularizes its structure, which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\varepsilon$ error in $\varepsilon^{-\max\left(2,\, \frac{d+1+o(1)}{1+\alpha}\right)}$ time for $\alpha$-Hölder smooth distributions $P, Q$ on $(0,1)^d$ with $0 < \alpha < 1$; this is an optimal $\Theta(\varepsilon^{-2})$ for $\alpha > 1/2$ when $d = 2$, and nearly optimal as $\alpha \to 1$ when $d = 3$.
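To make the sketch step of Sample-Sketch-Solve concrete, the following is a minimal illustration, not the paper's construction. It assumes the grid sketch snaps each sample to the center of its cell in a regular grid with `k` cells per axis. The names `grid_sketch` and `max_displacement`, the uniform test data, and the choice `k = 16` are all hypothetical. The solve step, and the way the grid resolution is tied to $\varepsilon$ and $\alpha$, are not shown.

```python
import math
import random
from collections import Counter


def grid_sketch(samples, k):
    """Snap each sample in (0,1)^d to the center of its cell in a regular
    grid with k cells per axis; return {cell_center: probability mass}.
    Assumption: cell centers are used as the sketch's support points."""
    counts = Counter()
    for x in samples:
        # Cell index along each axis, clamped so a coordinate of 1.0 stays in range.
        idx = tuple(min(int(c * k), k - 1) for c in x)
        counts[idx] += 1
    n = len(samples)
    return {tuple((i + 0.5) / k for i in idx): m / n for idx, m in counts.items()}


def max_displacement(samples, k):
    """Largest Euclidean distance any sample moves when snapped to the grid."""
    worst = 0.0
    for x in samples:
        center = [(min(int(c * k), k - 1) + 0.5) / k for c in x]
        worst = max(worst, math.dist(x, center))
    return worst


random.seed(0)
d, n, k = 2, 10_000, 16  # illustrative values only
samples = [tuple(random.random() for _ in range(d)) for _ in range(n)]
sketch = grid_sketch(samples, k)

print(len(sketch), "support points instead of", n)
print("max displacement:", max_displacement(samples, k))
print("bound sqrt(d)/(2k):", math.sqrt(d) / (2 * k))
```

The sketch has at most $k^d$ support points regardless of $n$, which is the compression. Each sample moves at most $\sqrt{d}/(2k)$, so mapping every sample to its cell center is a coupling showing that the sketched measure lies within $\sqrt{d}/(2k)$ of the empirical measure in $W_2$.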