Representing Pattern Matching Algorithms by Polynomial-Size Automata

Abstract

Pattern matching algorithms that find exact occurrences of a pattern $S \in \Sigma^m$ in a text $T \in \Sigma^n$ have been analyzed extensively with respect to asymptotic best-, worst-, and average-case runtime. For more detailed analyses, the number of text character accesses $X_{A,S}^n$ performed by an algorithm $A$ when searching a random text of length $n$ for a fixed pattern $S$ has been considered. Constructing a state space and corresponding transition rules (e.g. in a Markov chain) that reflect the behavior of a pattern matching algorithm is a key step in existing analyses of $X_{A,S}^n$ in both the asymptotic ($n \to \infty$) and the non-asymptotic regime. The size of this state space is hence a crucial parameter for such analyses. In this paper, we introduce a general methodology to construct corresponding state spaces and demonstrate that it applies to a wide range of algorithms, including Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Backward Oracle Matching (BOM), and Backward (Non-Deterministic) DAWG Matching (B(N)DM). In all cases except BOM, our method leads to state spaces of size $O(m^3)$ for pattern length $m$, a result that has previously only been obtained for BMH. In all other cases, only state spaces with size exponential in $m$ had been reported. Our results immediately imply an algorithm to compute the distribution of $X_{A,S}^n$ for fixed $S$, fixed $n$, and $A \in \{\mathrm{BM}, \mathrm{BMH}, \mathrm{B(N)DM}\}$ in polynomial time for a very general class of random text models.
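To make the quantity under study concrete, the following is a minimal Python sketch of one of the algorithms named above, Boyer-Moore-Horspool, instrumented to count text character accesses. It is an illustration only and not the construction introduced in the paper. The function names `horspool_access_count` and `access_distribution` are hypothetical, the text model is assumed to be uniform i.i.d. over a small alphabet, and the distribution is obtained by brute-force enumeration of all texts, which takes time exponential in n (exactly the cost a polynomial-size state space is meant to avoid).

```python
from collections import Counter
from itertools import product


def horspool_access_count(pattern, text):
    """Run Boyer-Moore-Horspool; return (match positions, text character accesses)."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return [], 0
    # Horspool shift table: distance from the last occurrence of each character
    # in pattern[:-1] to the end of the pattern; unseen characters shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches, accesses, pos = [], 0, 0
    while pos <= n - m:
        j = m - 1
        # Compare the window right to left, counting every text character read.
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # The rightmost window character was already read by the first
        # comparison above, so the shift lookup is not counted again.
        pos += shift.get(text[pos + m - 1], m)
    return matches, accesses


def access_distribution(pattern, alphabet, n):
    """Exact distribution of the access count over all texts of length n.

    Assumes a uniform i.i.d. text model (an assumption of this sketch) and
    enumerates all len(alphabet)**n texts, so it is only usable for tiny n.
    """
    counts = Counter(
        horspool_access_count(pattern, "".join(chars))[1]
        for chars in product(alphabet, repeat=n)
    )
    total = len(alphabet) ** n
    return {x: c / total for x, c in sorted(counts.items())}
```

For example, `horspool_access_count("abc", "abcabc")` reads six text characters and reports matches at positions 0 and 3, while `access_distribution("ab", "ab", 2)` assigns probability 0.5 each to one and two accesses.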
