Faster Algorithms for Shortest Unique or Absent Substrings

Abstract

We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string S of length n. Both problems admit folklore O(n)-time solutions using the suffix tree of S. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length n over alphabet [0,σ) can be stored in O(n log σ / log n) space and read in O(n log σ / log n) time. We present an O(n log σ / log n)-time algorithm for computing a SUS of S. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an O(n log σ / log n)-time algorithm for computing a SAS of S.
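The two problems can be pinned down with a brute-force reference sketch. This is neither the paper's O(n log σ / log n) algorithm nor the folklore suffix-tree solution; it only makes the definitions concrete and is useful for checking outputs on small inputs.

- **SUS.** A SUS is a shortest substring that occurs exactly once in S.
- **SAS.** A SAS is a shortest string over [0,σ) that does not occur in S.

The function names below are hypothetical. The SAS routine assumes S is given as a sequence of integers in [0,σ).

```python
from collections import Counter
from itertools import count, product


def shortest_unique_substring(s):
    """Return (start, length) of the leftmost shortest substring of s that
    occurs exactly once, or None if s is empty. Brute force, roughly cubic."""
    t = tuple(s)
    n = len(t)
    for length in range(1, n + 1):
        # Count every window of the current length.
        counts = Counter(t[i:i + length] for i in range(n - length + 1))
        for i in range(n - length + 1):
            if counts[t[i:i + length]] == 1:
                return i, length
    return None  # only reached when s is empty


def shortest_absent_substring(s, sigma):
    """Return the lexicographically smallest shortest string over [0, sigma)
    (as a tuple) that does not occur in s. Brute force, small inputs only."""
    t = tuple(s)
    if any(not 0 <= c < sigma for c in t):
        raise ValueError("symbols must lie in [0, sigma)")
    n = len(t)
    for length in count(1):
        present = {t[i:i + length] for i in range(n - length + 1)}
        # product() enumerates candidates in lexicographic order.
        for cand in product(range(sigma), repeat=length):
            if cand not in present:
                return cand
```

For example, S = 01001 has the SUS 10 at position 1, and its SAS is 11. The string 00110 contains all four binary strings of length 2 (it is a de Bruijn sequence of order 2 read linearly), so its SAS has length 3.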
