Complexity of Sequence-to-Graph Alignment with Co-Linear Chaining
Abstract
Sequence alignment is a cornerstone technique in computational biology for assessing similarities and differences among biological sequences. A key variant, sequence-to-graph alignment, plays a crucial role in effectively capturing genetic variations. In this work, we introduce two novel formulations within this framework: the Gap-sensitive Co-Linear Chaining (Gap-CLC) problem and the Co-Linear Chaining with Errors based on Edit Distance (Edit-CLC) problem, and we investigate their computational complexity. We show that solving the Gap-CLC problem in sub-quadratic time is highly unlikely unless the Strong Exponential Time Hypothesis fails -- even when restricted to binary alphabets. Furthermore, we establish that the Edit-CLC problem is NP-hard in the presence of errors within the pan-genome graph. These findings emphasize that incorporating co-linear structures into sequence-to-graph alignment models fails to reduce computational complexity, highlighting that these models remain at least as computationally challenging to solve as those lacking such prior information.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.