Improved space-time tradeoffs for approximate full-text indexing with one edit error

Abstract

In this paper we are interested in indexing texts for substring matching queries with one edit error. That is, given a text T of n characters over an alphabet of size σ, we are asked to build a data structure that answers the following query: find all the occ substrings of the text that are at edit distance at most 1 from a given string q of length m. In this paper we show two new results for this problem. The first result, suitable for an unbounded alphabet, uses O(nε n) (where ε is any constant such that 0<ε<1) words of space and answers to queries in time O(m+occ). This improves simultaneously in space and time over the result of Cole et al. The second result, suitable only for a constant alphabet, relies on compressed text indices and comes in two variants: the first variant uses O(nε n) bits of space and answers to queries in time O(m+occ), while the second variant uses O(n n) bits of space and answers to queries in time O((m+occ) n). This second result improves on the previously best results for constant alphabets achieved in Lam et al. (Algorithmica 2008) and Chan et al. (Algorithmica 2010).

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…