On the Smallest Size of Internal Collage Systems

Abstract

A Straight-Line Program (SLP) for a string $T$ is a context-free grammar in Chomsky normal form that derives only $T$, and it can be seen as a compressed form of $T$. Kida et al. [Theor. Comput. Sci., 2003] introduced collage systems, which generalize SLPs by adding repetition rules and truncation rules. The smallest size $c(T)$ of a collage system for $T$ has attracted attention as a way to see how much these generalized rules improve on the compression ability of SLPs. Navarro et al. [IEEE Trans. Inf. Theory, 2021] showed that $c(T) \in O(z(T))$ and that there is a string family with $c(T) \in \Omega(b(T) \log |T|)$, where $z(T)$ is the number of phrases in the Lempel-Ziv parsing of $T$ and $b(T)$ is the smallest size of a bidirectional scheme for $T$. They also introduced a subclass of collage systems, called internal collage systems, and proved that its smallest size $\hat{c}(T)$ for $T$ is at least $b(T)$. While $c(T) \le \hat{c}(T)$ is obvious, it was unknown how large $\hat{c}(T)$ is compared to $c(T)$. In this paper, we prove that $\hat{c}(T) = \Theta(c(T))$ by showing that any collage system of size $m$ can be transformed into an internal collage system of size $O(m)$ in $O(m^2)$ time. Thanks to this result, we can focus on internal collage systems to study the asymptotic behavior of $c(T)$, which helps to suppress excess use of truncation rules. As a direct application, we get $b(T) = O(c(T))$, which answers an open question posed in [Navarro et al., IEEE Trans. Inf. Theory, 2021]. We also give a MAX-SAT formulation to compute $\hat{c}(T)$ for a given $T$.
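
To make the rule types mentioned in the abstract concrete, the following is a minimal sketch of how a collage system could be expanded into the string it derives. The rule encoding, the tag names (`"char"`, `"cat"`, `"rep"`, `"pre"`, `"suf"`), and the function `expand` are illustrative assumptions rather than the paper's notation. The sketch also assumes that truncation rules take a prefix or a suffix of a given length, and that the size of a system is its number of rules.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule encoding (not taken from the paper):
#   ("char", a)    X -> a                       (single character)
#   ("cat", Y, Z)  X -> Y Z                     (concatenation, as in an SLP)
#   ("rep", Y, k)  X -> Y repeated k times      (repetition rule)
#   ("pre", Y, j)  X -> first j characters of Y (prefix truncation)
#   ("suf", Y, j)  X -> last j characters of Y  (suffix truncation)
Rule = Tuple


def expand(rules: Dict[str, Rule], var: str,
           memo: Optional[Dict[str, str]] = None) -> str:
    """Return the string derived by variable `var`, memoizing shared variables."""
    if memo is None:
        memo = {}
    if var in memo:
        return memo[var]
    rule = rules[var]
    kind = rule[0]
    if kind == "char":
        out = rule[1]
    elif kind == "cat":
        out = expand(rules, rule[1], memo) + expand(rules, rule[2], memo)
    elif kind == "rep":
        out = expand(rules, rule[1], memo) * rule[2]
    elif kind == "pre":
        out = expand(rules, rule[1], memo)[:rule[2]]
    elif kind == "suf":
        s = expand(rules, rule[1], memo)
        out = s[len(s) - rule[2]:]  # avoids the s[-0:] pitfall when j == 0
    else:
        raise ValueError(f"unknown rule kind: {kind!r}")
    memo[var] = out
    return out


# A toy system with 5 rules deriving "abababa".
rules = {
    "A": ("char", "a"),
    "B": ("char", "b"),
    "C": ("cat", "A", "B"),  # "ab"
    "D": ("rep", "C", 4),    # "abababab"
    "E": ("pre", "D", 7),    # "abababa"
}
print(expand(rules, "E"))  # prints "abababa"
```

Dropping the `"rep"`, `"pre"`, and `"suf"` cases leaves only the two rule forms of an SLP. This sketch materializes every intermediate string for readability and is not meant as an efficient decompressor.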
