High Dimensional Clustering with r-nets
Abstract
Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $\ell_1$ and $\ell_2$ metrics from $\tilde{O}(dn^{2-\Theta(\sqrt{\epsilon})})$ to $\tilde{O}(dn + n^{2-\alpha})$, where $\alpha = \Omega(\epsilon^{1/3}/\log(1/\epsilon))$. These algorithms are also used to improve a framework that provides approximate solutions to other high-dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+\epsilon)$-approximate $k$th-nearest neighbor distance, $(4+\epsilon)$-approximate Min-Max clustering, and $(4+\epsilon)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+\epsilon)$-approximates greedy permutations in time $\tilde{O}((dn + n^{2-\alpha}) \cdot \log \Phi)$, where $\Phi$ is the spread of the input. This algorithm is used to $(2+\epsilon)$-approximate $k$-center with the same time complexity.
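To make the two central objects concrete, the following is a minimal sketch of exact, brute-force baselines, not the algorithms of the paper. It assumes the common convention that an r-net is a subset whose points are pairwise at distance at least r and that covers every input point within distance r, and it builds a greedy permutation by farthest-first traversal. The function names `naive_r_net` and `greedy_permutation` are hypothetical, the metric is fixed to Euclidean, and both routines take time quadratic in n, which is the kind of cost the paper's subquadratic bounds aim to beat.

```python
import math


def naive_r_net(points, r):
    """Brute-force r-net baseline (illustrative only, quadratic in n).

    Returns indices of a subset in which chosen points are pairwise at
    distance >= r, and every input point lies within distance r of some
    chosen point. The exact inequality convention is an assumption.
    """
    net = []
    for i, p in enumerate(points):
        # Keep p only if it is at least r away from every point kept so far.
        if all(math.dist(p, points[j]) >= r for j in net):
            net.append(i)
    return net


def greedy_permutation(points, start=0):
    """Exact farthest-first traversal (illustrative only, quadratic in n).

    Each next index is an unchosen point that is farthest from the
    points chosen so far. The start index is an arbitrary assumption.
    """
    n = len(points)
    chosen = [False] * n
    chosen[start] = True
    order = [start]
    # nearest[i] = distance from point i to its closest chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    for _ in range(n - 1):
        nxt = max((i for i in range(n) if not chosen[i]),
                  key=lambda i: nearest[i])
        chosen[nxt] = True
        order.append(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], math.dist(points[i], points[nxt]))
    return order
```

For the four collinear points (0, 0), (1, 0), (10, 0), (11, 0) and r = 2, `naive_r_net` keeps indices [0, 2], one representative per tight pair. Taking the first k indices returned by `greedy_permutation` as centers is the classical 2-approximation for k-center, which is why an approximate greedy permutation yields the (2+ε) guarantee mentioned above.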