Fast Pattern Matching with Epsilon Transitions
Abstract
In the String Matching in Labeled Graphs (SMLG) problem, we need to determine whether a pattern string occurs in a given labeled graph or automaton, that is, whether it is spelled by the labels along some path. Under the Orthogonal Vectors hypothesis, the SMLG problem cannot be solved in subquadratic time [ICALP 2019]. In typical bioinformatics applications, pattern matching algorithms must be both fast and space-efficient, so we need to identify useful classes of graphs on which the SMLG problem can be solved efficiently. In this paper, we improve on a recent result [STACS 2024] that shows how to solve the SMLG problem in linear time on the compressed representation of Wheeler generalized automata, a class of string-labeled automata that extend de Bruijn graphs. More precisely, we show how to remove the assumption that the automata contain no ε-transitions (namely, edges labeled with the empty string), while retaining the same time and space bounds. This is a significant improvement because ε-transitions add considerable expressive power (they make it possible to jump to multiple states for free) and capture the complexity of regular expressions (through Thompson's construction for converting a regular expression into an equivalent automaton). We prove that, to enable ε-transitions, we only need to store two additional bitvectors that can be constructed in linear time.
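To make the problem statement concrete, here is a minimal sketch of a naive SMLG baseline on a string-labeled graph with ε-transitions. It is not the paper's method (which works on a compressed Wheeler representation plus two bitvectors); it only illustrates what "occurs in the graph" means and how ε-edges let a match jump between states for free. The edge-list format `(u, label, v)`, the helper names `expand_edges`, `eps_closure` and `occurs`, and the `("mid", i, j)` names for intermediate states are all hypothetical choices made for this illustration. Each string label is split into single-character steps so that a match may also start or end inside a label, and each pattern character costs one pass over the expanded graph, giving the roughly quadratic, pattern-length-times-graph-size behavior that the cited lower bound says cannot, in general, be beaten.

```python
from collections import defaultdict


def expand_edges(edges):
    """Split string-labeled edges (u, label, v) into single-character edges.

    An empty label "" is treated as an ε-transition. Intermediate states are
    named ("mid", i, j) (hypothetical naming; assumes no user node collides).
    """
    char_edges = defaultdict(list)  # (state, char) -> successor states
    eps_edges = defaultdict(list)   # state -> successors via ε-edges
    nodes = set()
    for i, (u, label, v) in enumerate(edges):
        nodes.update((u, v))
        if label == "":
            eps_edges[u].append(v)
            continue
        prev = u
        for j, ch in enumerate(label):
            nxt = v if j == len(label) - 1 else ("mid", i, j)
            nodes.add(nxt)
            char_edges[(prev, ch)].append(nxt)
            prev = nxt
    return char_edges, eps_edges, nodes


def eps_closure(states, eps_edges):
    """Return every state reachable from `states` using only ε-edges."""
    closed = set(states)
    stack = list(states)
    while stack:
        u = stack.pop()
        for v in eps_edges.get(u, []):
            if v not in closed:
                closed.add(v)
                stack.append(v)
    return closed


def occurs(pattern, edges):
    """Naive SMLG check: does some path spell `pattern` (ε-edges are free)?"""
    char_edges, eps_edges, nodes = expand_edges(edges)
    # A match may start at any state, including positions inside a label;
    # the ε-closure of the full state set is the full set itself.
    current = set(nodes)
    for ch in pattern:
        step = {v for u in current for v in char_edges.get((u, ch), [])}
        current = eps_closure(step, eps_edges)
        if not current:
            return False
    return True


# Small example graph: "ac" only matches by following two ε-edges (1 -> 3 -> 4).
example_edges = [
    (0, "a", 1),
    (1, "", 2),
    (1, "", 3),
    (2, "b", 4),
    (3, "", 4),
    (4, "c", 5),
    (5, "xyz", 6),
]

if __name__ == "__main__":
    for p in ["ac", "abc", "yz", "xz", "ca"]:
        print(p, occurs(p, example_edges))
```

In the example, `"ac"` is found only because the ε-edges 1 → 3 → 4 connect the `a` edge to the `c` edge, and `"yz"` is found starting inside the label `"xyz"`.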