Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings

Abstract

Given a string T of length n, a substring u = T[i..j] of T is called a shortest unique substring (SUS) for an interval [s,t] if (a) u occurs exactly once in T, (b) u contains the interval [s,t] (i.e., i ≤ s ≤ t ≤ j), and (c) every substring v of T with |v| < |u| containing [s,t] occurs at least twice in T. Given a query interval [s, t] ⊂ [1, n], the interval SUS problem is to output all the SUSs for the interval [s,t]. In this article, we propose a (4n + o(n))-bit data structure answering an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for s = t. Here, we propose a ((log₂3 + 1)n + o(n))-bit data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of T.
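To make definitions (a) to (c) concrete, the following is a deliberately naive sketch. It is not the succinct data structure proposed in this article. It counts every substring explicitly (cubic time and space) and reports positions as 1-based inclusive pairs (i, j) for T[i..j]. The function names `occurrence_counts`, `interval_sus` and `minimal_unique_substrings` are hypothetical. For minimal unique substrings it assumes the standard definition: a unique substring whose every proper substring occurs at least twice. It suffices to check the two substrings obtained by dropping the first or the last character, because every proper substring is contained in one of them.

```python
from collections import Counter


def occurrence_counts(T: str) -> Counter:
    """Count the occurrences of every non-empty substring of T (brute force)."""
    counts: Counter = Counter()
    n = len(T)
    for i in range(n):
        for j in range(i + 1, n + 1):
            counts[T[i:j]] += 1
    return counts


def interval_sus(T: str, s: int, t: int) -> list[tuple[int, int]]:
    """All SUSs for the 1-based interval [s, t], as 1-based inclusive (i, j) pairs."""
    n = len(T)
    assert 1 <= s <= t <= n
    counts = occurrence_counts(T)
    # Conditions (a) and (b): unique substrings T[i..j] with i <= s <= t <= j.
    # The list is never empty, because T[1..n] itself always occurs exactly once.
    candidates = [(i, j) for i in range(1, s + 1) for j in range(t, n + 1)
                  if counts[T[i - 1:j]] == 1]
    # Condition (c): keep only the candidates of minimum length.
    best = min(j - i + 1 for i, j in candidates)
    return sorted((i, j) for i, j in candidates if j - i + 1 == best)


def minimal_unique_substrings(T: str) -> list[tuple[int, int]]:
    """All MUSs of T (assumed definition: unique, every proper substring repeated)."""
    n = len(T)
    counts = occurrence_counts(T)
    result = []
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            if counts[T[i - 1:j]] != 1:
                continue
            drop_first = T[i:j]       # T[i+1..j]
            drop_last = T[i - 1:j - 1]  # T[i..j-1]
            # The empty string is treated as repeated, so a unique single
            # character counts as an MUS.
            if (drop_first == "" or counts[drop_first] >= 2) and \
               (drop_last == "" or counts[drop_last] >= 2):
                result.append((i, j))
    return result
```

For example, in T = aab the point query s = t = 2 has two SUSs, T[1..2] = aa and T[2..3] = ab, since the single character a at position 2 occurs twice. The MUSs of aab are T[1..2] = aa and T[3..3] = b.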
