The Dynamic k-Mismatch Problem

Abstract

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n m. We focus on the k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. We assume n 2m (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that it exceeds k. First, we show a data structure with O(1) update and O(k) query time. Then we show that O(k) update and O(1) query time is also possible. These two provide an optimal trade-off for the dynamic k-mismatch problem with k n: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve k1-(1) time for all operations. For k n, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking n1/2-(1) time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved O(n) time per operation in that case, but with O(n3/4) time per operation for large alphabets. We improve and extend this result with an algorithm that, given 1 x k, achieves update time O(nk +nkx) and query time O(x). In particular, for k n, an appropriate choice of x yields O([3]nk) time per operation, which is O(n2/3) when no threshold k is provided.

0

Discussion (0)

Sign in to join the discussion.

Loading comments…