Simple KNN-Based Outlier Detection Achieves Robust Clustering

Abstract

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the robust k-Means problem (i.e., k-Means with outliers), the goal is to remove z outliers and minimize the k-Means cost on the remaining points. Despite the close connection between robust k-Means and outlier detection, both theoretical and empirical understanding of the effectiveness of classic outlier detection heuristics for robust k-Means remains limited. In this paper, we prove that under a practical assumption on the optimal cluster sizes, simply removing points with large K-Nearest-Neighbor distances achieves performance comparable to prior work in terms of approximation guarantees: it yields a constant-factor reduction from robust k-Means to standard k-Means, without introducing additional centers or discarding extra outliers, as is commonly required by existing approaches. Empirically, experiments on real-world datasets show that our method outperforms or matches several more sophisticated algorithms in terms of clustering cost and runtime. These results demonstrate that simple KNN-based heuristics can be surprisingly effective for robust clustering, highlighting new opportunities to bridge techniques from outlier detection and clustering.
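
As a rough illustration of the pipeline the abstract describes, the sketch below scores each point by the distance to its K-th nearest neighbor, drops the z highest-scoring points, and then runs an ordinary k-Means solver on what remains. It is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not say how K is chosen or how ties are handled, so the function names (`knn_distance`, `remove_outliers`, `lloyd_kmeans`, `kmeans_cost`), the value `K=3`, the toy data, and the use of plain Lloyd iterations as the standard k-Means step are all hypothetical choices made for illustration.

```python
import math
import random


def knn_distance(points, K):
    """Distance from each point to its K-th nearest neighbor (itself excluded)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[K - 1])
    return scores


def remove_outliers(points, K, z):
    """Drop the z points with the largest K-th nearest neighbor distance."""
    scores = knn_distance(points, K)
    order = sorted(range(len(points)), key=lambda i: scores[i])
    keep = sorted(order[: len(points) - z])  # preserve the input order
    return [points[i] for i in keep]


def lloyd_kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations, standing in for any standard k-Means solver."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep the old
        # center if a cluster ends up empty.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its closest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


# Toy data (assumed): two tight clusters plus two far-away outliers.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
    (100.0, 100.0), (-100.0, 50.0),
]

inliers = remove_outliers(points, K=3, z=2)
centers = lloyd_kmeans(inliers, k=2)
cost = kmeans_cost(inliers, centers)
```

The brute-force neighbor search here is quadratic in the number of points and is only meant to keep the example dependency-free; a spatial index or an approximate nearest-neighbor library would be the natural substitute on real datasets.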
