Practical colinear chaining on sequences revisited
Abstract
Colinear chaining is a classical heuristic for sequence alignment and is widely used in modern practical aligners. Jain et al. (J. Comput. Biol. 2022) proposed an $O(n \log^3 n)$ time algorithm to chain a set of $n$ anchors so that the chaining cost matches the edit distance of the input sequences, when anchors are all the maximal exact matches. Moreover, assuming a uniform and sparse distribution of anchors, they provided a practical solution (ChainX) working in $O(n \cdot \mathrm{SOL} + n \log n)$ average-case time, where $\mathrm{SOL}$ is the cost of the output chain. This practical solution is not guaranteed to be optimal: we study the failing cases, introduce the anchor diagonal distance, and find and implement an optimal algorithm working in $O(n \cdot \mathrm{OPT} + n \log n)$ average-case time, where $\mathrm{OPT} \le \mathrm{SOL}$ is the optimal chaining cost. We validate the results by Jain et al., show that ChainX can be suboptimal with a realistic long read dataset, and show minimal computational slowdown for our solution.
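To make the notion of a chaining cost concrete, the following is a minimal sketch of colinear chaining by quadratic dynamic programming. It is not the algorithm of Jain et al., not ChainX, and not the optimal algorithm announced above. It assumes a simplified setting: anchors are exact matches given as (q_start, t_start, length) triples, chained anchors may not overlap, and the cost of connecting two anchors is the larger of the two gaps between them. The names `gap_cost`, `chain_cost`, and `diagonal_distance` are hypothetical, and the reading of "diagonal distance" as the difference between anchor diagonals is an assumption, since the abstract does not define it.

```python
from typing import List, Tuple

# An anchor (q_start, t_start, length) asserts Q[q_start:q_start+length] == T[t_start:t_start+length].
Anchor = Tuple[int, int, int]


def gap_cost(a: Anchor, b: Anchor) -> int:
    """Simplified connection cost (assumption): the larger of the two gaps
    between the end of `a` and the start of `b`, with no overlap allowed."""
    gap_q = b[0] - (a[0] + a[2])
    gap_t = b[1] - (a[1] + a[2])
    return max(gap_q, gap_t)


def diagonal_distance(a: Anchor, b: Anchor) -> int:
    """Assumed reading of 'diagonal distance': the absolute difference of the
    anchors' diagonals, where the diagonal is q_start - t_start."""
    return abs((a[0] - a[1]) - (b[0] - b[1]))


def chain_cost(anchors: List[Anchor], q_len: int, t_len: int) -> int:
    """Minimum cost of a chain from the start to the end of both sequences,
    computed by an O(n^2) dynamic program over anchors sorted by q_start."""
    start: Anchor = (0, 0, 0)          # zero-length dummy anchor at the beginning
    end: Anchor = (q_len, t_len, 0)    # zero-length dummy anchor at the end
    pts = [start] + sorted(anchors) + [end]

    inf = float("inf")
    best = [inf] * len(pts)
    best[0] = 0
    for j in range(1, len(pts)):
        b = pts[j]
        for i in range(j):
            a = pts[i]
            # `a` may precede `b` only if it ends before `b` starts in both sequences.
            precedes = a[0] + a[2] <= b[0] and a[1] + a[2] <= b[1]
            if precedes and best[i] < inf:
                best[j] = min(best[j], best[i] + gap_cost(a, b))
    return int(best[-1])


# Q = "ACGTACGT", T = "ACGTTACGT": T has one extra character between two shared blocks.
example_anchors = [(0, 0, 4), (4, 5, 4)]
example_cost = chain_cost(example_anchors, q_len=8, t_len=9)
```

In this toy example the two anchors lie on adjacent diagonals, and the chain cost equals the single insertion that separates the two strings. With no anchors at all, the cost falls back to the length of the longer sequence.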