State Complexity of Pattern Matching in Regular Languages

Abstract

In a simple pattern matching problem one has a pattern $w$ and a text $t$, which are words over a finite alphabet $\Sigma$. One may ask whether $w$ occurs in $t$, and if so, where? More generally, we may have a set $P$ of patterns and a set $T$ of texts, where $P$ and $T$ are regular languages. We are interested in whether any word of $T$ begins with a word of $P$, ends with a word of $P$, has a word of $P$ as a factor, or has a word of $P$ as a subsequence. Thus we are interested in the languages $(P\Sigma^*)\cap T$, $(\Sigma^*P)\cap T$, $(\Sigma^*P\Sigma^*)\cap T$, and $(\Sigma^* \mathbin{\mathrm{shu}} P)\cap T$, where $\mathrm{shu}$ is the shuffle operation. The state complexity $\kappa(L)$ of a regular language $L$ is the number of states in the minimal deterministic finite automaton recognizing $L$. We derive the following upper bounds on the state complexities of our pattern-matching languages, where $\kappa(P)\le m$ and $\kappa(T)\le n$: $\kappa((P\Sigma^*)\cap T)\le mn$; $\kappa((\Sigma^*P)\cap T)\le 2^{m-1}n$; $\kappa((\Sigma^*P\Sigma^*)\cap T)\le (2^{m-2}+1)n$; and $\kappa((\Sigma^* \mathbin{\mathrm{shu}} P)\cap T)\le (2^{m-2}+1)n$. We prove that these bounds are tight, and that to meet them, the alphabet must have at least two letters in the first three cases, and at least $m-1$ letters in the last case. We also consider the special case where $P$ is a single word $w$, and obtain the following tight upper bounds: $\kappa((w\Sigma^*)\cap T_n)\le m+n-1$; $\kappa((\Sigma^*w)\cap T_n)\le (m-1)n-(m-2)$; $\kappa((\Sigma^*w\Sigma^*)\cap T_n)\le (m-1)n$; and $\kappa((\Sigma^* \mathbin{\mathrm{shu}} w)\cap T_n)\le (m-1)n$. For unary languages, we have a tight upper bound of $m+n-2$ in all eight of the aforementioned cases.
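To give a feel for how these bounds compare, the following minimal Python sketch evaluates each formula from the abstract for given m and n, where m and n bound the state complexities of the pattern language and the text language. The function names (`general_bounds`, `single_word_bounds`, `unary_bound`) and the dictionary keys are hypothetical, chosen only for illustration. The guard `m >= 2` is an assumption made here so that every power of two is an integer; the abstract does not state the exact ranges of m and n for which the bounds are tight.

```python
from typing import Dict


def general_bounds(m: int, n: int) -> Dict[str, int]:
    """Upper bounds from the abstract when P is an arbitrary regular language."""
    # Assumption (not from the abstract): require m >= 2 and n >= 1
    # so that 2 ** (m - 2) is a well-defined integer.
    if m < 2 or n < 1:
        raise ValueError("this sketch assumes m >= 2 and n >= 1")
    return {
        "prefix": m * n,                         # words of T beginning with a word of P
        "suffix": 2 ** (m - 1) * n,              # words of T ending with a word of P
        "factor": (2 ** (m - 2) + 1) * n,        # words of T with a word of P as a factor
        "subsequence": (2 ** (m - 2) + 1) * n,   # words of T with a word of P as a subsequence
    }


def single_word_bounds(m: int, n: int) -> Dict[str, int]:
    """Upper bounds from the abstract when P is a single word w."""
    if m < 2 or n < 1:
        raise ValueError("this sketch assumes m >= 2 and n >= 1")
    return {
        "prefix": m + n - 1,
        "suffix": (m - 1) * n - (m - 2),
        "factor": (m - 1) * n,
        "subsequence": (m - 1) * n,
    }


def unary_bound(m: int, n: int) -> int:
    """Upper bound from the abstract for unary languages, in all eight cases."""
    return m + n - 2


if __name__ == "__main__":
    print(general_bounds(4, 3))
    print(single_word_bounds(4, 3))
    print(unary_bound(4, 3))
```

For m = 4 and n = 3, the general bounds evaluate to 12, 24, 15, and 15, the single-word bounds to 6, 7, 9, and 9, and the unary bound to 5. The gap between the two families reflects the exponential terms in the general suffix, factor, and subsequence bounds, which are replaced by expressions linear in m when the pattern is a single word.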
