Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
Abstract
We study the classic Euclidean Minimum Spanning Tree (MST) problem in the Massively Parallel Computation (MPC) model. Given a set X ⊂ Rd of n points, the goal is to produce a spanning tree for X with weight within a small factor of optimal. Euclidean MST is one of the most fundamental hierarchical geometric clustering algorithms, and with the proliferation of enormous high-dimensional data sets, such as massive transformer-based embeddings, there is now a critical demand for efficient distributed algorithms to cluster such data sets. In low-dimensional space, where d = O(1), Andoni, Nikolov, Onak, and Yaroslavtsev [STOC '14] gave a constant round MPC algorithm that obtains a high accuracy (1+ε)-approximate solution. However, the situation is much more challenging for high-dimensional spaces: the best-known algorithm to obtain a constant approximation requires O( n) rounds. Recently Chen, Jayaram, Levi, and Waingarten [STOC '22] gave a O( n) approximation algorithm in a constant number of rounds based on embeddings into tree metrics. However, to date, no known algorithm achieves both a constant number of rounds and approximation. In this paper, we make strong progress on this front by giving a constant factor approximation in O( n) rounds of the MPC model. In contrast to tree-embedding-based approaches, which necessarily must pay ( n)-distortion, our algorithm is based on a new combination of graph-based distributed MST algorithms and geometric space partitions. Additionally, although the approximate MST we return can have a large depth, we show that it can be modified to obtain a O( n)-round constant factor approximation to the Euclidean Traveling Salesman Problem (TSP) in the MPC model. Previously, only a O( n) round was known for the problem.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.