On constant factor approximation for earth mover distance over doubling metrics
Abstract
Given a metric space $(X, d_X)$, the earth mover distance (EMD) between two distributions over $X$ is defined as the minimum cost of a bipartite matching between the two distributions. The doubling dimension of a metric $(X, d_X)$ is the smallest value $\alpha$ such that every ball in $X$ can be covered by $2^{\alpha}$ balls of half the radius. We study efficient algorithms for approximating earth mover distance over metrics with bounded doubling dimension. Given a metric $(X, d_X)$ with $|X| = n$, we can use $O(n^2)$ preprocessing time to create a data structure of size $O(n^{1+\epsilon})$, such that subsequently queried EMDs can be $O(\alpha_X/\epsilon)$-approximated in $O(n)$ time, where $\alpha_X$ denotes the doubling dimension of $X$. We also show a weaker form of sketching scheme, which we call an "encoding scheme". Given $(X, d_X)$, by using $O(n^2)$ preprocessing time, every subsequent distribution $\mu$ over $X$ can be encoded into $F(\mu)$ in $O(n^{1+\epsilon})$ time. Given $F(\mu)$ and $F(\nu)$, the EMD between $\mu$ and $\nu$ can be $O(\alpha_X/\epsilon)$-approximated in $O(n)$ time.
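To make the matching-based definition of EMD concrete, the following minimal sketch computes it exactly by brute force in a very restricted setting. It assumes two uniform distributions supported on equally many points, so that an optimal transport plan is a perfect matching, and it uses the real line with absolute difference as an illustrative metric. The names `emd_uniform` and `line_metric` are hypothetical, and this exhaustive search is not the approximation algorithm of the paper; it is feasible only for a handful of points.

```python
from itertools import permutations


def emd_uniform(points_a, points_b, dist):
    """Exact EMD between two uniform distributions on equally many points.

    Assumption of this sketch: each point carries mass 1/k, so the optimum
    is attained by a perfect matching. All k! matchings are enumerated,
    which is only practical for very small k.
    """
    k = len(points_a)
    if k == 0 or k != len(points_b):
        raise ValueError("supports must be non-empty and of equal size")
    # Cheapest total distance over every perfect matching of A to B.
    best_total = min(
        sum(dist(a, b) for a, b in zip(points_a, perm))
        for perm in permutations(points_b)
    )
    # Each matched pair moves mass 1/k.
    return best_total / k


def line_metric(x, y):
    """Illustrative metric: absolute difference on the real line."""
    return abs(x - y)


if __name__ == "__main__":
    mu_support = [0, 1, 5]
    nu_support = [2, 6, 1]
    print(emd_uniform(mu_support, nu_support, line_metric))  # prints 1.0
```

In the example, the cheapest matching pairs 0 with 1, 1 with 2, and 5 with 6, for a total cost of 3 spread over three unit masses of weight 1/3 each.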