Computing MEMs and Relatives on Repetitive Text Collections

Abstract

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^{\epsilon} n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta \log\frac{n}{\delta})$, the time decreases to $O(m \log m\,(\log m + \log^{\epsilon} n))$. The value $\delta$ is a function of the substring complexity of $T$, and $\Omega(\delta \log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$. We extend our results to several related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications.
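
To make the notion of a MEM concrete, the following is a minimal brute-force sketch in Python. It is not the grammar-based method of the paper: it works directly on an uncompressed text and takes far more time than the bounds stated above. It assumes the usual definition, in which a MEM is a substring of P that occurs in T and cannot be extended to the left or to the right within P while still occurring in T. The function name `maximal_exact_matches` and the 0-based `(start, length)` output format are choices made for this illustration only.

```python
def maximal_exact_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return the MEMs of `pattern` in `text` as 0-based (start, length) pairs.

    Brute-force illustration of the definition only (hypothetical helper,
    not the paper's algorithm).
    """
    m = len(pattern)

    # longest[i] = length of the longest prefix of pattern[i:] that occurs in text.
    longest = [0] * m
    for i in range(m):
        length = 0
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        longest[i] = length

    mems = []
    for i in range(m):
        if longest[i] == 0:
            continue  # pattern[i] does not occur in text at all
        # The match starting at i is right-maximal by construction.
        # It is also left-maximal unless the match starting at i - 1
        # already covers it, that is, unless longest[i - 1] > longest[i].
        if i == 0 or longest[i - 1] <= longest[i]:
            mems.append((i, longest[i]))
    return mems


if __name__ == "__main__":
    print(maximal_exact_matches("abcxab", "zabcyxabz"))
```

In this example the pattern "abcxab" has two MEMs in the text "zabcyxabz": "abc" starting at position 0 and "xab" starting at position 3. Shorter matches such as "bc" or "ab" at position 4 are not reported because each is contained in a longer match that begins one position earlier.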
