Prototype Selection Using Topological Data Analysis
Abstract
Prototype selection methods compress a training set, but the existing taxonomy of condensation, edition, hybrid, competence-based, optimization-based, and clustering-based families does not include methods that operate on the multi-scale topological structure of the data. This paper introduces two persistence-based prototype selectors: the Topological Prototype Selector (TPS) and the Boundary-Conscious Topological Prototype Selector (BoundaryTPS). TPS uses two sequential Rips filtrations to retain boundary-relevant and interior-typical points. BoundaryTPS is a single-stage variant whose vertex-weighted filtration concentrates retention near the decision boundary. We evaluate both methods against seven classical baselines on fifteen real datasets and find that the topological methods occupy a different operating point in the prototype-selection design space than existing methods. BoundaryTPS achieves the lowest mean Friedman rank on H1 persistence-diagram preservation and is significantly better than five of the seven baselines (Nemenyi, α = 0.05). TPS ranks third on the same endpoint. Both methods are more stable under fold perturbation than any chained-decision selector tested, and both inherit the source set's class proportions without label-aware machinery. On aggregate G-Mean both methods are competitive but not leading, with rank-1 frequencies of 11.3% (TPS) and 9.9% (BoundaryTPS) across fold combinations. Empirically, both methods scale sub-quadratically in sample size.
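The abstract reports results as G-Mean scores and mean Friedman ranks. As a minimal sketch of how such summaries are commonly computed, the Python below assumes that G-Mean is the geometric mean of per-class recalls and that Friedman ranks are assigned per dataset with rank 1 as best and ties sharing the average rank. The abstract does not state the paper's exact definitions, so both are assumptions, and the function names `g_mean` and `mean_friedman_ranks` and the toy inputs are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict
from math import prod


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed G-Mean definition)."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return prod(recalls) ** (1.0 / len(recalls))


def mean_friedman_ranks(scores, higher_is_better=True):
    """Mean rank per method over datasets.

    scores: list of rows, one row per dataset, one score per method.
    Rank 1 is best; tied methods share the average of their ranks.
    """
    n_methods = len(scores[0])
    sums = [0.0] * n_methods
    for row in scores:
        # Method indices ordered from best to worst on this dataset.
        order = sorted(range(n_methods), key=lambda j: row[j],
                       reverse=higher_is_better)
        i = 0
        while i < n_methods:
            # Extend k over the run of methods tied with position i.
            k = i
            while k + 1 < n_methods and row[order[k + 1]] == row[order[i]]:
                k += 1
            avg_rank = (i + k) / 2 + 1  # average of 1-based ranks i+1..k+1
            for m in range(i, k + 1):
                sums[order[m]] += avg_rank
            i = k + 1
    return [s / len(scores) for s in sums]


# Toy inputs for illustration only; these are not results from the paper.
toy_g_mean = g_mean([0, 0, 1, 1], [0, 1, 1, 1])
toy_ranks = mean_friedman_ranks([[0.9, 0.8, 0.7],
                                 [0.6, 0.6, 0.5]])
```

Under these assumptions, a lower mean rank indicates a method that tends to place nearer the top across datasets. A post-hoc test such as Nemenyi then asks whether differences in mean rank exceed a critical difference at the chosen α.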