Topological Sequence Analysis of Genomes: Delta Complex Approaches
Abstract
Algebraic topology has been widely applied to point cloud data to capture geometric shapes and topological structures. However, its application to genome sequence analysis remains rare. In this work, we propose topological sequence analysis (TSA) techniques by constructing Δ-complexes and classifying spaces, leading to persistent homology and persistent path homology on genome sequences. We also develop Δ-complex-based persistent Laplacians to facilitate the topological spectral analysis of genome sequences. Finally, we demonstrate the utility of the proposed TSA approaches in phylogenetic analysis using Ebola virus sequences and whole bacterial genomes. The present TSA methods are more efficient than the earlier TSA model, k-mer topology, and thus have the potential to be applied to other time-consuming sequential data analyses, such as those in linguistics, literature, music, media, and social contexts.