Longest Common Subsequence in k-length substrings

Abstract

In this paper we define a new problem, LCSk, motivated by computational biology: find the maximal number of k-length substrings that match in both input strings while preserving their order of appearance. The traditional LCS problem is the special case k = 1. We provide an algorithm that solves the general case in O(n^2) time, where n is the length of the input strings, matching the time required for the special case k = 1. The space requirement of the algorithm is O(kn). We also define a complementary distance measure, EDk, and show that EDk(A, B) can be computed in O(nm) time and O(km) space, where m and n are the lengths of the input sequences A and B, respectively.
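To make the problem statement concrete, the following is a minimal Python sketch of a straightforward dynamic program for LCSk. It is an illustration under stated assumptions, not necessarily the algorithm of the paper: it assumes that the matched k-length substrings may not overlap within either string, the function name `lcsk` and the table names `run` and `best` are hypothetical, and for readability it stores full tables (O(nm) space) rather than achieving the O(kn) space bound quoted above.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Return the LCSk value of a and b (illustrative sketch).

    Assumption: matched k-length substrings are non-overlapping
    and appear in the same order in both strings.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # run[i][j]: length of the longest common suffix of a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Option 1: skip the last character of a or of b.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # Option 2: end a matching k-length substring at (i, j).
            if run[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]
```

With k = 1 the recurrence reduces to the classic LCS recurrence, so `lcsk("ABCBDAB", "BDCABA", 1)` returns 4. With k = 2, `lcsk("ABCXDE", "ABCYDE", 2)` returns 2, since only two non-overlapping matching pairs (for example AB and DE) fit in order. Only the last k + 1 rows of each table are ever read, which is one way to see why a bound of O(kn) space is plausible when backtracking is not required.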
