Computing MEMs and Relatives on Repetitive Text Collections

Gonzalo Navarro

Abstract

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta \log\frac{n}{\delta})$, the time decreases to $O(m \log m (\log m + \log^\epsilon n))$. The value $\delta$ is a function of the substring complexity of $T$ and $\Omega(\delta \log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$. We extend our results to several related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications.
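To make the object being computed concrete, the following is a minimal brute-force sketch of what a MEM is: a substring of the pattern that occurs in the text and cannot be extended to the left or to the right and still occur. It is not the grammar-based method of the paper and comes nowhere near its time bounds; it works on the plain, uncompressed text using Python's substring search. The function names `longest_match_lengths` and `naive_mems` are hypothetical, and positions are 0-based here, unlike the 1-based notation of the abstract.

```python
def longest_match_lengths(pattern: str, text: str) -> list[int]:
    """For each start i, length of the longest prefix of pattern[i:] occurring in text."""
    m = len(pattern)
    lengths = []
    for i in range(m):
        length = 0
        # Grow the candidate one character at a time while it still occurs in text.
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        lengths.append(length)
    return lengths


def naive_mems(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return the MEMs of pattern in text as (start, length) pairs, 0-based."""
    lengths = longest_match_lengths(pattern, text)
    mems = []
    for i, length in enumerate(lengths):
        if length == 0:
            continue  # pattern[i] does not occur in text at all
        # pattern[i:i+length] is right-maximal by construction. It is also
        # left-maximal exactly when the match starting at i-1 does not cover it,
        # that is, when lengths[i-1] <= length.
        if i == 0 or lengths[i - 1] <= length:
            mems.append((i, length))
    return mems


if __name__ == "__main__":
    pattern, text = "cadbra", "abracadabra"
    for start, length in naive_mems(pattern, text):
        print(start, length, pattern[start:start + length])
```

On the pattern `cadbra` and the text `abracadabra`, the sketch reports two MEMs, `cad` and `bra`, since `cadb` and `dbra` do not occur in the text. The left-maximality test relies on the fact that the longest match length can drop by at most one when the start moves one position to the right.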
