Approximate pattern matching with k-mismatches in packed text

Abstract

Given strings $P$ of length $m$ and $T$ of length $n$ over an alphabet of size $\sigma$, the string matching with $k$-mismatches problem is to find the positions of all the substrings in $T$ that are at Hamming distance at most $k$ from $P$. If $T$ can be read only one character at a time, the best known bounds are $O(n\sqrt{k\log k})$ and $O(n + n\sqrt{k/w}\log k)$ in the word-RAM model with word length $w$. In the RAM models (including $AC^0$ and word-RAM) it is possible to read up to $\lfloor w/\log\sigma \rfloor$ characters in constant time if the characters of $T$ are encoded using $\lceil \log\sigma \rceil$ bits. The only solution for $k$-mismatches in packed text works in $O((n\log\sigma/\log n)\lceil m\log(k + \log n/\log\sigma)/w \rceil + n^{\varepsilon})$ time, for any $\varepsilon > 0$. We present an algorithm that runs in time $O(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}(1 + \log\min(k,\sigma)\log m/\log\sigma))$ in the $AC^0$ model if $m = O(w/\log\sigma)$ and $T$ is given packed. We also describe a simpler variant that runs in time $O(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}\log\min(m, \log w/\log\sigma))$ in the word-RAM model. The algorithms improve the existing bound for $w = \Omega(\log^{1+\varepsilon} n)$, for any $\varepsilon > 0$. Based on the introduced technique, we present algorithms for several other approximate matching problems.
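As a rough illustration of why packed text helps, the following Python sketch shows the basic word-level idea: each character is stored in a field of $\lceil\log\sigma\rceil + 1$ bits (the spare zero bit per field is an assumption of this sketch, used to absorb carries), a text window is XORed against the packed pattern, and the nonzero fields are counted to obtain the Hamming distance with a constant number of arithmetic operations. It is not the authors' algorithm: it checks one alignment per step, whereas the bounds above come from handling $\lfloor w/(m\log\sigma) \rfloor$ alignments per word, and Python's unbounded integers stand in for machine words. The names bits_per_char, pack and k_mismatch_positions are hypothetical.

```python
# Sketch only (hypothetical helpers): word-level parallel Hamming distance on
# packed text, emulating machine words with Python's unbounded integers.
# Not the algorithm from the paper; it checks one alignment per step.


def bits_per_char(sigma: int) -> int:
    """ceil(log2(sigma)), with at least one bit per character."""
    return max(1, (sigma - 1).bit_length())


def pack(codes: list[int], f: int) -> int:
    """Pack character codes so that character j occupies bits [j*f, j*f + f)."""
    x = 0
    for j, c in enumerate(codes):
        x |= c << (j * f)
    return x


def k_mismatch_positions(text: str, pattern: str, k: int, alphabet: str) -> list[int]:
    """Return every i such that text[i:i+m] has Hamming distance <= k from pattern."""
    code = {ch: i for i, ch in enumerate(alphabet)}
    b = bits_per_char(len(alphabet))
    f = b + 1  # assumption: one spare zero bit per field absorbs the carry below
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []  # the empty pattern is not handled in this sketch
    t = pack([code[ch] for ch in text], f)
    p = pack([code[ch] for ch in pattern], f)
    window = (1 << (m * f)) - 1                             # m fields of f bits
    low = sum(((1 << b) - 1) << (j * f) for j in range(m))  # 2^b - 1 in every field
    high = sum(1 << (j * f + b) for j in range(m))          # spare bit of every field
    out = []
    for i in range(n - m + 1):
        # A field of x is nonzero exactly where text and pattern characters differ.
        x = ((t >> (i * f)) & window) ^ p
        # Adding 2^b - 1 to a field value v < 2^b sets its spare bit iff v >= 1,
        # and never carries into the next field.
        y = (x + low) & high
        if bin(y).count("1") <= k:  # number of mismatching characters
            out.append(i)
    return out
```

For example, k_mismatch_positions("ACGTACGA", "ACGA", 1, "ACGT") returns [0, 4]: the window at position 0 differs from the pattern in one character and the window at position 4 matches exactly.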
