A Systematic Framework for Evaluating Topological Representations in Single-Cell Classification
Abstract
Recent advances in biomedicine generate high-dimensional single-cell data that describe cellular heterogeneity with unprecedented detail, but their geometric complexity and non-linear structure often limit the effectiveness of conventional statistical tools. Topological Data Analysis (TDA) provides a mathematical framework for characterizing the shape of data through persistent homology, which extracts structural features such as connected components and cycles across multiple scales. In this work, we propose a systematic two-level framework for evaluating topological representations in high-dimensional single-cell classification. The first level (\(R1\)) performs statistical screening of topological descriptors based on separability between clinical groups, whereas the second level (\(R2\)) evaluates their predictive utility in supervised classification models. This design makes it possible to compare representations not only in terms of discriminative performance, but also in terms of robustness to analytical choices. We illustrate the framework using bone marrow flow cytometry data from pediatric acute lymphoblastic leukemia, with a particular focus on relapse stratification. The results show that different topological representations vary substantially in both statistical separability and predictive stability, with Betti Curves and Persistence Silhouettes showing more robust behavior than Persistence Images in this cohort. Overall, the study provides a reproducible methodological framework for the systematic comparison of topological descriptors in complex biomedical point clouds.
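As a minimal sketch of the ideas named in the abstract, the snippet below computes a Betti curve from a persistence diagram and then runs a simple group-separability check in the spirit of the \(R1\) screening level. The abstract does not say which statistical test is used for screening, so the permutation test on the L1 distance between group-mean curves is an assumption made for illustration only. The function names `betti_curve` and `permutation_pvalue`, the grid, and the toy diagrams are all hypothetical.

```python
import numpy as np


def betti_curve(diagram, grid):
    """Count the persistence intervals [birth, death) alive at each grid value."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # An interval is alive at t when birth <= t < death.
    alive = (births <= grid[None, :]) & (grid[None, :] < deaths)
    return alive.sum(axis=0)


def permutation_pvalue(curves_a, curves_b, n_perm=999, seed=0):
    """Permutation p-value for the L1 distance between group-mean curves.

    Assumption: this test is a stand-in for the R1 screening step,
    not the procedure used in the paper.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(curves_a, dtype=float)
    b = np.asarray(curves_b, dtype=float)
    pooled = np.vstack([a, b])
    n_a = len(a)

    def stat(x, y):
        return np.abs(x.mean(axis=0) - y.mean(axis=0)).sum()

    observed = stat(a, b)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if stat(pooled[perm[:n_a]], pooled[perm[n_a:]]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
    # Toy diagram with two intervals: [0, 1) and [0.5, 2).
    print(betti_curve([(0.0, 1.0), (0.5, 2.0)], grid))  # [1 2 1 1 0]
```

In this sketch each sample (for example, one patient's point cloud) would contribute one curve per homology dimension, and descriptors with small p-values would pass on to a supervised classifier for the \(R2\) level. Other descriptors mentioned in the abstract, such as Persistence Silhouettes and Persistence Images, could be swapped in for `betti_curve` as long as they also return fixed-length vectors.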