A Myhill-Nerode Theorem for Generalized Automata, with Applications to Pattern Matching and Compression
Abstract
The model of generalized automata, introduced by Eilenberg in 1974, allows representing a regular language more concisely than conventional automata by allowing edges to be labeled not only with characters, but also strings. Giammarresi and Montalbano introduced a notion of determinism for generalized automata [STACS 1995]. While generalized deterministic automata retain many properties of conventional deterministic automata, the uniqueness of a minimal generalized deterministic automaton is lost. In the first part of the paper, we show that the lack of uniqueness can be explained by introducing a set W(A) associated with a generalized automaton A . In this way, we derive for the first time a full Myhill-Nerode theorem for generalized automata, which contains the textbook Myhill-Nerode theorem for conventional automata as a degenerate case. In the second part of the paper, we show that the set W(A) leads to applications for pattern matching and data compression. We show that a Wheeler generalized automata can be stored using e σ (1 + o(1)) + O(e) bits so that pattern matching queries can be solved in O(m σ) time, where e is the total length of all edge labels, e is the number of edges, σ is the size of the alphabet and m is the length of the pattern.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.