Linear Time Subsequence and Supersequence Regex Matching

Abstract

It is well known that checking whether a given string w matches a given regular expression r can be done in quadratic time O(|w|·|r|), and that this cannot be improved to a truly subquadratic running time of O((|w|·|r|)^{1-ε}) assuming the strong exponential time hypothesis (SETH). We study the related problem that asks whether w has a subsequence that matches r, and we show that, surprisingly, this task admits an algorithm that runs in linear time, i.e., in O(|w| + |r|). We further show that the same holds if we ask for a supersequence instead of a subsequence. Moreover, we show that the quantitative problems of computing a longest subsequence or shortest supersequence of w that matches r can be solved with the same complexity as the classical longest common subsequence or shortest common supersequence problems, i.e., in O(|w|·|r|), and conditionally not in O((|w|·|r|)^{1-ε}). By contrast, if instead of subsequences or supersequences we consider other string relations like the infix, prefix, left-extension, or extension relations, then all the corresponding problems (both quantitative and non-quantitative) have the same complexity as classical regex matching, i.e., they can also be solved in O(|w|·|r|), but not in O((|w|·|r|)^{1-ε}) assuming SETH. Finally, we study the complexity of the universal problem that asks whether all subsequences (or supersequences, infixes, prefixes, left-extensions, or extensions) of an input string satisfy a given regular expression. For these problems, we show polynomial upper bounds (along with matching conditional lower bounds) for the infix and prefix relations, but PSPACE-completeness for the extension, left-extension, and supersequence relations, and coNP-completeness for the subsequence relation.
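To make the subsequence problems concrete, the following is a minimal brute-force sketch of their definitions. It is not the linear-time algorithm of the paper: it enumerates all subsequences and therefore takes exponential time. The function names `has_matching_subsequence` and `longest_matching_subsequence` are hypothetical, and Python's `re` syntax is assumed as a stand-in for classical regular expressions.

```python
import re
from itertools import combinations
from typing import Optional


def longest_matching_subsequence(w: str, r: str) -> Optional[str]:
    """Return a longest subsequence of w that fully matches r, or None.

    Brute force over all index subsets, longest first (exponential time).
    Illustrates the problem definition only, not an efficient algorithm.
    """
    pattern = re.compile(r)
    for k in range(len(w), -1, -1):
        # combinations() yields increasing index tuples, so the order of
        # the letters in w is preserved in each candidate.
        for idxs in combinations(range(len(w)), k):
            candidate = "".join(w[i] for i in idxs)
            if pattern.fullmatch(candidate):
                return candidate
    return None


def has_matching_subsequence(w: str, r: str) -> bool:
    """Decision version: does some subsequence of w fully match r?"""
    return longest_matching_subsequence(w, r) is not None
```

For instance, under these assumptions `has_matching_subsequence("xaybz", "ab")` holds because deleting `x`, `y`, and `z` leaves `ab`, whereas `has_matching_subsequence("ba", "ab")` does not, since subsequences preserve letter order.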
