Longest Common Extensions with Wildcards: Trade-off and Applications

Abstract

We study the Longest Common Extension (LCE) problem in a string containing wildcards. Wildcards (also called "don't cares" or "holes") are special characters that match any other character in the alphabet, similar to the character "?" in Unix commands or "." in regular expression engines. We consider the problem parametrized by G, the number of maximal contiguous groups of wildcards in the input string. Our main contribution is a simple data structure for this problem that can be built in O(n (G/t) log n) time, occupies O(nG/t) space, and answers queries in O(t) time, for any t ∈ [1, G]. Up to the O(log n) factor, this interpolates smoothly between the data structure of Crochemore et al. [JDA 2015], which has O(nG) preprocessing time and space, and O(1) query time, and a simple solution based on the "kangaroo jumping" technique [Landau and Vishkin, STOC 1986], which has O(n) preprocessing time and space, and O(G) query time. By establishing a connection between this problem and Boolean matrix multiplication, we show that our solution is optimal, up to subpolynomial factors, among combinatorial data structures when G = Ω(n^ε) under a widely believed hypothesis. In addition, we develop a simple deterministic combinatorial algorithm for sparse Boolean matrix multiplication. We further establish a conditional lower bound for non-combinatorial data structures, stating that O(nG/t^4) preprocessing time (resp. space) is optimal, up to subpolynomial factors, for any data structure with query time t for a wide range of t and G, assuming the well-established 3SUM (resp. Set-Disjointness) conjecture. Finally, we show that our data structure can be used to obtain efficient algorithms for approximate pattern matching and structural analysis of strings with wildcards.
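To make the query and the parameter G concrete, here is a minimal Python sketch. It is a naive baseline that scans character by character, so a query takes time linear in its answer; it is not the data structure from the paper, nor the kangaroo-jumping solution. The wildcard symbol "?" and the names `count_wildcard_groups`, `chars_match`, and `naive_lce` are assumptions made for illustration.

```python
WILDCARD = "?"  # assumed wildcard symbol; the abstract does not fix one


def count_wildcard_groups(s):
    """Return G, the number of maximal contiguous runs of wildcards in s."""
    groups = 0
    prev_wild = False
    for ch in s:
        is_wild = ch == WILDCARD
        # A new group starts when a wildcard follows a non-wildcard (or the start).
        if is_wild and not prev_wild:
            groups += 1
        prev_wild = is_wild
    return groups


def chars_match(a, b):
    """Two characters match if they are equal or either one is a wildcard."""
    return a == WILDCARD or b == WILDCARD or a == b


def naive_lce(s, i, j):
    """Length of the longest matching prefix of the suffixes s[i:] and s[j:].

    Naive scan with no preprocessing; serves only to define the query.
    """
    length = 0
    while (
        i + length < len(s)
        and j + length < len(s)
        and chars_match(s[i + length], s[j + length])
    ):
        length += 1
    return length
```

For example, in the string "ab?ab??b" the wildcards form two groups, so G = 2, and the suffixes starting at positions 0 and 3 match on all five characters that remain after position 3.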
