Faster run-length compressed suffix arrays

Abstract

We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O ( 0ex2ex r (n / r) + r σ + σ ) bits such that later, given character a and the suffix array interval for P, we can find the suffix-array (SA) interval for a P in O ( ra + n) time, where ra is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for a P in only O ( ra) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.

0

Discussion (0)

Sign in to join the discussion.

Loading comments…