High-Dimensional Clustering via Nearest-Neighbor Asymmetry
Abstract
High-dimensional clustering often relies on geometric or local-similarity structure, but the dominant separation between groups may not always be location-based. Differences in dispersion can create asymmetric local-neighborhood patterns: points from a more dispersed component may be closer to points in a more concentrated component than to points from their own component. We turn this high-dimensional phenomenon into a clustering principle. The proposed method, NAC (Nearest-neighbor Asymmetry Clustering), constructs a directed k-nearest-neighbor graph and evaluates candidate partitions using two permutation-standardized statistics: a weighted within-edge statistic that captures overall within-cluster enrichment and a contrast statistic that captures asymmetric separation. The resulting objective combines these two standardized signals, allowing the method to adapt to different separation regimes without specifying a mixture model or a low-dimensional representation. We provide a population-level analysis showing how the two statistics target complementary nearest-neighbor patterns. Simulation studies across mean, scale, and combined location-scale differences show that NAC is competitive under location separation and especially effective when nearest-neighbor asymmetry is present; gene-expression applications further illustrate its usefulness in small-sample, high-dimensional clustering.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.