Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time

Abstract

Given a threshold $L$ and a set $R = \{R_1, \ldots, R_m\}$ of $m$ haplotype sequences, each of length $n$, the minimum segmentation problem for founder reconstruction is to partition the sequences into disjoint segments $R[i_1+1, i_2], R[i_2+1, i_3], \ldots, R[i_{r-1}+1, i_r]$, where $0 = i_1 < \cdots < i_r = n$ and $R[i_{j-1}+1, i_j]$ is the set $\{R_1[i_{j-1}+1, i_j], \ldots, R_m[i_{j-1}+1, i_j]\}$, such that the length of each segment, $i_j - i_{j-1}$, is at least $L$ and $K = \max_j \{ |R[i_{j-1}+1, i_j]| \}$ is minimized. The distinct substrings in the segments $R[i_{j-1}+1, i_j]$ represent founder blocks that can be concatenated to form $K$ founder sequences representing the original $R$ such that crossovers happen only at segment boundaries. We give an optimal $O(mn)$ time algorithm to solve the problem, improving over the earlier $O(mn^2)$ solution. This improvement makes it possible to apply the algorithm in a pan-genomic setting where the haplotypes are complete human chromosomes, with the goal of finding a representative set of references that can be indexed for read alignment and variant calling.
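
To make the problem statement concrete, the following is a minimal brute-force sketch of the objective, not the $O(mn)$ algorithm the abstract announces. It assumes the straightforward recurrence best[j] = min over i ≤ j − L of max(best[i], |R[i+1, j]|), which follows from the definition of $K$, and it counts distinct substrings naively, so it is far slower than either bound mentioned above. The function name `min_segmentation` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def min_segmentation(
    reads: List[str], min_len: int
) -> Optional[Tuple[int, List[Tuple[int, int]]]]:
    """Naive dynamic program for the minimum segmentation objective.

    Returns (K, segments), where segments are 1-based inclusive column
    ranges, or None if no segmentation with segments of length >= min_len
    exists. Illustrative only: this is NOT the linear-time algorithm.
    """
    if not reads or min_len < 1:
        raise ValueError("need at least one sequence and min_len >= 1")
    n = len(reads[0])
    if any(len(r) != n for r in reads):
        raise ValueError("all sequences must have the same length")

    unreachable = len(reads) + 1          # K can never exceed m
    best = [unreachable] * (n + 1)        # best[j]: optimal K for prefix 1..j
    prev = [-1] * (n + 1)                 # prev[j]: start boundary of last segment
    best[0] = 0                           # empty prefix needs no segments

    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if best[i] == unreachable:
                continue
            # Number of distinct substrings in columns i+1..j (1-based).
            distinct = len({r[i:j] for r in reads})
            cost = max(best[i], distinct)
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    if best[n] == unreachable:
        return None

    # Walk the boundaries backwards to recover the segments.
    segments = []
    j = n
    while j > 0:
        i = prev[j]
        segments.append((i + 1, j))
        j = i
    segments.reverse()
    return best[n], segments


if __name__ == "__main__":
    haplotypes = ["AACC", "AAGG", "TTCC", "TTGG"]
    print(min_segmentation(haplotypes, 2))
```

In this toy input the four sequences are all distinct over the full length, but splitting after column 2 leaves only two distinct blocks per segment, so two founder sequences suffice when $L = 2$.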
