Parse indexing for discarding short pseudo-MEMs safely

Travis Gagie

Abstract

Brown et al. (2025) described a pre-processing step, called k-mer based breaking (KeBaB), that speeds up searching for long maximal exact matches (MEMs) between a pattern P and an indexed repetitive text T. KeBaB produces a set of substrings of P called pseudo-MEMs that often have total length much less than |P| but are still guaranteed to contain all the MEMs of length at least a fixed parameter k. Brown et al. found that KeBaB can be particularly effective when we discard all but the longest pseudo-MEMs, but then we risk also discarding the longest MEMs! In this paper we show how we can use parse indexing to generate pseudo-MEMs together with lower bounds on the lengths of the longest MEMs they must contain, allowing us to discard short pseudo-MEMs safely.
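To make the notion of a pseudo-MEM concrete, here is a minimal Python sketch of k-mer based breaking. It is an illustration under stated assumptions, not the implementation of Brown et al. or of this paper: it stores the k-mers of T in an exact set (the real tool may use a different membership filter), and the function name `pseudo_mems` is hypothetical. It reports the maximal substrings of P all of whose k-mers occur in T, so every MEM of length at least k lies inside one of them.

```python
def pseudo_mems(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return half-open intervals (start, end) of the maximal substrings of
    `pattern` whose k-mers all occur in `text`.

    Assumption: an exact set of k-mers stands in for whatever membership
    filter the real KeBaB uses.
    """
    kmers = {text[i:i + k] for i in range(len(text) - k + 1)}
    intervals = []
    run_start = None  # start of the current run of k-mers found in text
    for i in range(len(pattern) - k + 1):
        if pattern[i:i + k] in kmers:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The last matching k-mer started at i - 1 and ends at i - 1 + k.
            intervals.append((run_start, i - 1 + k))
            run_start = None
    if run_start is not None:
        intervals.append((run_start, len(pattern)))
    return intervals


if __name__ == "__main__":
    P, T = "ACGTTTACG", "ACGTACGT"
    for start, end in pseudo_mems(P, T, k=3):
        print(start, end, P[start:end])
```

Keeping only the longest intervals returned here is the step that can lose the longest MEMs, because a long pseudo-MEM need not contain a long MEM. That is the gap the lower bounds described above are meant to close.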
