Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation
Abstract
A suffixient set is a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random access mechanism, supports various forms of pattern matching. In this paper, we study the size χ of the smallest suffixient set as a repetitiveness measure. First, we study its sensitivity to various string operations. We show that χ cannot increase by more than 2 after appending or prepending a character to the string. As a consequence, we are able to give simple linear-time online algorithms to compute smallest suffixient sets. We also show that, although reversing the string can increase χ by an arbitrary O(n) value, it always holds that χ(T)/χ(T^R) ≤ 2, where T^R denotes the reverse of T. We also prove lower and upper bounds for the additive or multiplicative increase of χ after applying arbitrary edit operations, or rotating the text. In particular, we show that the additive increase can be as large as Ω(n) for all those operations. Secondly, we place χ in between known repetitiveness measures. In particular, we show that χ = O(r) (where r is the number of runs in the Burrows-Wheeler Transform of the string), that there are string families where χ = o(v) (where v is the size of the smallest lexicographic parse of the string), and that χ is incomparable to almost all reachable measures based on copy-paste mechanisms. In passing, we give precise bounds for χ for some relevant string families, for example χ ≤ σ+2 on episturmian words over alphabets of size σ (e.g., χ ≤ 4 on Fibonacci strings, for which we precisely characterize the only two smallest suffixient sets).
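To make the comparison measure r concrete, the following minimal Python sketch counts the runs in the Burrows-Wheeler Transform of a string and generates Fibonacci strings to try it on. It does not compute χ, since the abstract does not define suffixient sets. The function names, the `$` terminator, the quadratic rotation-sorting construction, and the convention F1 = "b", F2 = "a", Fk = F(k-1) + F(k-2) are all assumptions made for illustration, not details taken from the paper.

```python
def fibonacci_string(k: int) -> str:
    """Return the k-th Fibonacci string.

    Assumed convention (not from the paper): F1 = "b", F2 = "a",
    and Fk = F(k-1) + F(k-2) for k >= 3.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    prev, cur = "b", "a"
    if k == 1:
        return prev
    for _ in range(k - 2):
        prev, cur = cur, cur + prev
    return cur


def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations.

    Assumption: a terminator smaller than every character of `text`
    is appended. This naive construction takes quadratic space and is
    only meant for small illustrative inputs.
    """
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    # Print n and r for the first few Fibonacci strings.
    for k in range(3, 12):
        f = fibonacci_string(k)
        print(k, len(f), count_runs(bwt(f)))
```

Running the script prints the length n and the run count r for a few Fibonacci strings, which gives a feel for how slowly a repetitiveness measure can grow on a highly repetitive family compared with the string length.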