Wheeler maps

Abstract

Motivated by challenges in pangenomic read alignment, we propose a generalization of Wheeler graphs that we call Wheeler maps. A Wheeler map stores a text T[1..n] and an assignment of tags to the characters of T such that we can preprocess a pattern P[1..m] and then, given i and j, quickly return all the distinct tags labeling the first characters of the occurrences of P[i..j] in T. For the applications that most interest us, characters with long common contexts are likely to have the same tag, so we consider the number t of runs in the list of tags sorted by their characters' positions in the Burrows-Wheeler Transform (BWT) of T. We show how, given a straight-line program with g rules for T, we can build an O(g + r + t)-space Wheeler map, where r is the number of runs in the BWT of T, with which we can preprocess a pattern P[1..m] in O(m n) time and then return the k distinct tags for P[i..j] in optimal O(k) time for any given i and j. We show various further results related to prioritizing the most frequent tags.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…