Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

Abstract

In this paper, we present a new data structure called the packed compact trie (packed c-trie), which stores a set S of k strings of total length n in n log σ + O(k log n) bits of space and supports fast pattern matching queries and updates, where σ is the size of the alphabet. Assume that α = log_σ n letters are packed into a single machine word on the standard word RAM model, and let f(k,n) denote the query and update times of the dynamic predecessor/successor data structure of our choice, which stores k integers from the universe [1,n] in O(k log n) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in O((m/α) f(k,n)) worst-case time and in O(m/α + f(k,n)) expected time. Our experiments show that our packed c-tries are faster than standard compact tries (a.k.a. Patricia trees) on real data sets. As an application of our packed c-trie, we show that the sparse suffix tree for a string of length n over prefix codes with k sampled positions, such as evenly-spaced and word-delimited sparse suffix trees, can be constructed online in O((n/α + k) f(k,n)) worst-case time and O(n/α + k f(k,n)) expected time with n log σ + O(k log n) bits of space. When k = O(n/α), by using state-of-the-art dynamic predecessor/successor data structures, we obtain sub-linear time construction algorithms using only O(n/α) words of space in both cases. We also discuss an application of our packed c-tries to online LZD factorization.
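The m/α term in these bounds comes from comparing α = log_σ n letters with a single word operation rather than one letter at a time. The sketch below illustrates only that word-level idea. It is not the authors' implementation, and it omits the trie itself, the predecessor/successor structure behind f(k,n), and whatever technique yields the expected-time bounds. It assumes a 64-bit word, letters encoded as integers in [0, σ), and most-significant-first packing. The names `WORD_BITS`, `pack`, and `packed_lcp` are hypothetical. The longest common prefix of two strings is found by XOR-ing aligned packed words and locating the highest set bit.

```python
from typing import List, Sequence

WORD_BITS = 64  # assumed machine word size w (hypothetical choice)


def bits_per_letter(sigma: int) -> int:
    """ceil(log2 sigma) bits per letter, at least 1."""
    return max(1, (sigma - 1).bit_length())


def pack(codes: Sequence[int], sigma: int) -> List[int]:
    """Pack letter codes (each in [0, sigma)) into words, alpha letters per word.

    The first letter goes into the most significant used bits, so the highest
    set bit of an XOR marks the first mismatching letter. A short final chunk
    is left-aligned (zero-padded at the low end).
    """
    b = bits_per_letter(sigma)
    alpha = WORD_BITS // b
    words = []
    for start in range(0, len(codes), alpha):
        chunk = codes[start:start + alpha]
        w = 0
        for c in chunk:
            w = (w << b) | c
        w <<= b * (alpha - len(chunk))  # left-align the last, partial chunk
        words.append(w)
    return words


def packed_lcp(x: Sequence[int], y: Sequence[int], sigma: int) -> int:
    """Length of the longest common prefix of x and y.

    Uses one XOR per aligned pair of packed words, i.e. O(m / alpha) word
    comparisons. Packing here costs O(m); a real structure would keep the
    strings stored in packed form.
    """
    b = bits_per_letter(sigma)
    alpha = WORD_BITS // b
    used = alpha * b  # bits of each word actually holding letters
    limit = min(len(x), len(y))
    matched = 0
    for wx, wy in zip(pack(x, sigma), pack(y, sigma)):
        diff = wx ^ wy
        if diff == 0:
            matched += alpha  # all alpha letters of this word agree
            if matched >= limit:
                return limit
            continue
        # Offset of the first differing bit, counted from the top of the used bits.
        first_bit = used - diff.bit_length()
        # Clamp: zero padding in the shorter string can cause spurious matches.
        return min(matched + first_bit // b, limit)
    return limit
```

For example, over a DNA-like alphabet with σ = 4, each letter takes 2 bits and α = 32 letters fit in one word, so `packed_lcp([0, 1, 2, 3], [0, 1, 2, 0], 4)` finds the mismatch at the fourth letter with a single XOR. Clamping to the shorter length is what keeps zero padding from being mistaken for matching letters.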
