Rank-based linkage I: triplet comparisons and oriented simplicial complexes

Abstract

Rank-based linkage is a new tool for summarizing a collection S of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on S. Rank-based linkage is applied to the K-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected K-nearest neighbor graph on S. In |S| K2 steps it builds an edge-weighted linkage graph (S, L, σ) where σ(\x, y\) is called the in-sway between objects x and y. Take Lt to be the links whose in-sway is at least t, and partition S into components of the graph (S, Lt), for varying t. Rank-based linkage is a functor from a category of ``out-ordered'' digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart'' the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Orientation sheaves play in a fundamental role and ensure that partially overlapping data sets can be ``glued'' together. Open combinatorial problems are presented in the last section.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…