A Surprisingly Simple Method for Distributed Euclidean-Minimum Spanning Tree / Single Linkage Dendrogram Construction from High Dimensional Embeddings via Distance Decomposition

Abstract

We introduce a decomposition method for the distributed calculation of exact Euclidean Minimum Spanning Trees in high dimensions (where sub-quadratic algorithms are not effective), or, more generally, of geometric minimum spanning trees of complete graphs, where each vertex v ∈ V of the graph G = (V, E) is represented by a vector in R^n, and the weight of each edge is given by a symmetric binary "distance" function between the representative vectors: w({x, y}) = d(x, y). This is motivated by the task of clustering high-dimensional embeddings produced by neural networks, where low-dimensional algorithms are ineffective. Such geometric minimum spanning trees serve as a subroutine in the construction of single linkage dendrograms, since each of the two structures can be converted into the other efficiently.
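To make the problem setup concrete, the following is a minimal single-machine sketch, not the distributed decomposition method the paper introduces (the abstract does not describe its details). It assumes the standard quadratic-time Prim's algorithm on the implicit complete graph with w({x, y}) = d(x, y), and the standard conversion from a minimum spanning tree to single linkage merge heights by processing edges in ascending weight with a union-find structure. The function names `dense_mst` and `single_linkage_merges` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance; any symmetric distance function d(x, y) can be swapped in."""
    return math.dist(x, y)


def dense_mst(points, dist=euclidean):
    """Prim's algorithm on the implicit complete graph over `points`.

    Uses O(n^2) distance evaluations and O(n) memory; this is the
    brute-force baseline, not a distributed scheme.
    Returns a list of (weight, u, v) edges, n - 1 of them for n points.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n       # tree vertex realising that cost
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the connection costs of all remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def single_linkage_merges(n, mst_edges):
    """Convert MST edges into single linkage merges.

    Processing edges in ascending weight and joining their components
    yields the dendrogram's merge heights. Returns a list of
    (height, size_of_merged_cluster) pairs.
    """
    root = list(range(n))
    size = [1] * n

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]  # path halving
            a = root[a]
        return a

    merges = []
    for w, u, v in sorted(mst_edges):
        ru, rv = find(u), find(v)
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        root[rv] = ru
        size[ru] += size[rv]
        merges.append((w, size[ru]))
    return merges


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0)]
mst = dense_mst(points)
merges = single_linkage_merges(len(points), mst)
```

On this toy input the tree joins the two close pairs first and then bridges them, so the merge heights are the sorted edge weights of the tree. The quadratic number of distance evaluations in `dense_mst` is the cost that motivates distributing the computation in the high-dimensional setting.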
