On Locating Paths in Compressed Tries
Abstract
In this paper, we consider the problem of compressing a trie while supporting the powerful locate queries: to return the pre-order identifiers of all nodes reached by a path labeled with a given query pattern. Our result builds on top of the XBWT tree transform of Ferragina et al. [FOCS 2005] and generalizes the r-index locate machinery of Gagie et al. [SODA 2018, JACM 2020] based on the run-length encoded Burrows-Wheeler transform (BWT). Our first contribution is to propose a suitable generalization of the run-length BWT to tries. We show that this natural generalization enjoys several of the useful properties of its counterpart on strings: in particular, the transform natively supports counting occurrences of a query pattern on the trie's paths and its size r captures the trie's repetitiveness and lower-bounds a natural notion of trie entropy. Our main contribution is a much deeper insight into the combinatorial structure of this object. In detail, we show that a data structure of O(r n) + 2n + o(n) bits, where n is the number of nodes, allows locating the occ occurrences of a pattern of length m in nearly-optimal O(mσ + occ) time, where σ is the alphabet's size. Our solution consists in sampling O(r) nodes that can be used as "anchor points" during the locate process. Once obtained the pre-order identifier of the first pattern occurrence (in co-lexicographic order), we show that a constant number of constant-time jumps between those anchor points lead to the identifier of the next pattern occurrence, thus enabling locating in optimal O(1) time per occurrence.