The Longest Subsequence-Repeated Subsequence Problem
Abstract
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest subsequence-repeated subsequence (LSRS) is proposed. Given a sequence $S$ of length $n$, a subsequence-repeated subsequence is a subsequence of $S$ of the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$, where each $x_i$ is a subsequence of $S$, $x_j \neq x_{j+1}$, and $d_i \geq 2$ for all $i \in [k]$ and $j \in [k-1]$. We first present an $O(n^6)$ time algorithm to compute the longest cubic subsequences of all the $O(n^2)$ substrings of $S$, improving the trivial $O(n^7)$ bound. Then, an $O(n^6)$ time algorithm for computing the longest subsequence-repeated subsequence of $S$ is obtained. Finally, we focus on two variants of this problem. We first consider the constrained version in which the alphabet $\Sigma$ is unbounded, each letter appears in $S$ at most $d$ times, and all the letters in $\Sigma$ must appear in the solution. We show that this problem is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, the problem is solvable in $O(n^5)$ time.
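To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the polynomial-time algorithm announced in the abstract: it enumerates subsequences and therefore takes exponential time, so it is only suitable for very short inputs. The function names (`is_power`, `is_subsequence_repeated`, `lsrs_brute_force`) are hypothetical, and the sketch assumes every repeated block is non-empty. It also relies on the observation that two adjacent blocks with the same base can be merged into one block with a larger exponent, so the requirement that consecutive bases differ never has to be checked separately.

```python
from itertools import combinations


def is_power(w: str) -> bool:
    """Return True if w equals x repeated d times for a non-empty x and d >= 2."""
    n = len(w)
    for p in range(1, n // 2 + 1):  # p is the candidate length of x
        if n % p == 0 and w[:p] * (n // p) == w:
            return True
    return False


def is_subsequence_repeated(t: str) -> bool:
    """Return True if t splits into consecutive blocks, each a power with exponent >= 2.

    Adjacent blocks with equal bases can be merged, so a valid split always
    yields a decomposition whose consecutive bases differ.
    """
    n = len(t)
    ok = [False] * (n + 1)  # ok[i]: the prefix t[:i] admits a valid split
    ok[0] = True
    for i in range(2, n + 1):
        # The last block is t[j:i]; it must have length at least 2.
        ok[i] = any(ok[j] and is_power(t[j:i]) for j in range(i - 1))
    return n > 0 and ok[n]


def lsrs_brute_force(s: str) -> str:
    """Return one longest subsequence of s with the required form, or "" if none exists.

    Exponential time: intended only to illustrate the definition on tiny inputs.
    """
    n = len(s)
    for length in range(n, 1, -1):  # try the longest candidates first
        for idx in combinations(range(n), length):
            t = "".join(s[i] for i in idx)
            if is_subsequence_repeated(t):
                return t
    return ""
```

For example, under these assumptions `lsrs_brute_force("abcab")` returns `"abab"`, which is `ab` repeated twice, while `lsrs_brute_force("abc")` returns the empty string because no letter occurs twice.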