Hardness of Regular Expression Matching with Extensions
Abstract
The regular expression matching problem asks whether a given regular expression of length $m$ matches a given string of length $n$. As is well known, the problem can be solved in $O(nm)$ time using Thompson's algorithm. Moreover, recent studies have shown that regular expression matching extended with a practical extension called lookaround can be solved in the same time complexity. In this work, we consider four well-known extensions to regular expressions called backreference, squaring, intersection and complement. We prove a number of novel time complexity lower bounds for regular expression matching with these extensions under the Orthogonal Vectors Conjecture (OVC), k-OVC, k-Clique Hypothesis, and Combinatorial k-Clique Hypothesis. Some highlights of our results include the fact that none of the matching problems with the extensions can be solved in $n^{2-\varepsilon}\,\mathrm{poly}(m)$ time for any constant $\varepsilon > 0$ (for backreference, even when restricted to one capturing group) under OVC, and that the problem with complement, also known as extended regular expression (ERE) matching, cannot be solved in time $n^{2-\varepsilon}\,\mathrm{tower}(o(m))$ under OVC, $n^{\omega-\varepsilon}\,\mathrm{tower}(o(m))$ under the k-Clique Hypothesis (where $\omega$ is the matrix multiplication exponent), and $n^{3-\varepsilon}\,\mathrm{tower}(o(m))$ under the Combinatorial k-Clique Hypothesis, respectively. In particular, the latter two results show that the $O(n^{3} m)$-time ERE matching algorithm introduced by Hopcroft and Ullman in 1979 and recently improved by Bille, Gørtz and Jessen to run in $O(n^{\omega} m)$ time using fast matrix multiplication was already optimal in a sense, and shed light on why the theoretical computer science community has struggled to improve the time complexity of ERE matching with respect to $n$ and $m$ for more than 45 years.
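To make the O(nm) baseline concrete, the following is a minimal Python sketch of Thompson-style matching for plain regular expressions only (literals, concatenation, `|`, `*` and parentheses). It is an illustration, not code from the paper, and it implements none of the four extensions studied here. The names `compile_regex`, `closure` and `matches` are hypothetical. The pattern is compiled into an NFA with O(m) states and transitions, and the set of active states is advanced once per input character, so each of the n steps costs O(m).

```python
def compile_regex(pattern):
    """Build a Thompson-style NFA. Returns (start, accept, eps, step)."""
    eps = []    # eps[s]: states reachable from s by one epsilon move
    step = []   # step[s]: (char, target) for a labelled move, or None
    pos = 0     # current parse position in the pattern

    def new_state():
        eps.append([])
        step.append(None)
        return len(eps) - 1

    def parse_alt():                      # alt := concat ('|' concat)*
        nonlocal pos
        s, a = parse_concat()
        while pos < len(pattern) and pattern[pos] == '|':
            pos += 1
            s2, a2 = parse_concat()
            ns, na = new_state(), new_state()
            eps[ns] += [s, s2]            # branch into either alternative
            eps[a].append(na)
            eps[a2].append(na)
            s, a = ns, na
        return s, a

    def parse_concat():                   # concat := star*
        s = a = new_state()               # fragment for the empty string
        while pos < len(pattern) and pattern[pos] not in '|)':
            s2, a2 = parse_star()
            eps[a].append(s2)             # chain the fragments together
            a = a2
        return s, a

    def parse_star():                     # star := atom '*'*
        nonlocal pos
        s, a = parse_atom()
        while pos < len(pattern) and pattern[pos] == '*':
            pos += 1
            ns, na = new_state(), new_state()
            eps[ns] += [s, na]            # enter the loop or skip it
            eps[a] += [s, na]             # repeat the loop or leave it
            s, a = ns, na
        return s, a

    def parse_atom():                     # atom := '(' alt ')' | literal
        nonlocal pos
        c = pattern[pos]
        pos += 1
        if c == '(':
            s, a = parse_alt()
            if pos >= len(pattern) or pattern[pos] != ')':
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return s, a
        s, a = new_state(), new_state()
        step[s] = (c, a)
        return s, a

    start, accept = parse_alt()
    if pos != len(pattern):
        raise ValueError("unexpected ')'")
    return start, accept, eps, step


def closure(states, eps):
    """All states reachable from `states` via epsilon moves only."""
    seen, stack = set(states), list(states)
    while stack:
        for t in eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def matches(pattern, text):
    """Whole-string match by state-set simulation: O(m) work per character."""
    start, accept, eps, step = compile_regex(pattern)
    current = closure({start}, eps)
    for ch in text:
        moved = {step[s][1] for s in current
                 if step[s] is not None and step[s][0] == ch}
        current = closure(moved, eps)
    return accept in current
```

For example, `matches("(a|b)*abb", "ababb")` holds while `matches("(a|b)*abb", "abab")` does not. The active state set never exceeds the O(m) NFA states, which is what keeps the total running time at O(nm). The lower bounds in this work concern what happens once backreference, squaring, intersection or complement is added to this basic setting.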