Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance
Abstract
Approximating the length of the longest increasing subsequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array of length n, provides a (1+δ)-multiplicative approximation to the distance to monotonicity (n minus the length of the LIS) and uses only O((log^2 n)/δ) space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. Our algorithm can be used to estimate the length of the LIS to within an additive δn for any δ > 0, while previous algorithms could only achieve additive error n(1/2 - o(1)). Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic-space approximate implementation of a classic dynamic program that computes the LIS. We also give a streaming algorithm for approximating LCS(x,y), the length of the longest common subsequence between strings x and y, each of length n. Our algorithm works in the asymmetric setting (inspired by [AKO10]), in which we have random access to y and streaming access to x, and runs in small space provided that no single symbol appears very often in y. More precisely, it gives an additive δn approximation to LCS(x,y) (and hence also to E(x,y) = n - LCS(x,y), the edit distance between x and y when insertions and deletions, but not substitutions, are allowed), with space complexity O(k(log^2 n)/δ), where k is the maximum number of times any one symbol appears in y.
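To make the quantities in the abstract concrete, the following is a minimal Python sketch that computes them exactly and offline. It is not the paper's streaming algorithm: it uses linear space and the standard patience-sorting form of the LIS computation. The function names are illustrative, "increasing" is assumed to mean non-decreasing for the distance to monotonicity, and the LCS routine uses a standard Hunt-Szymanski-style reduction from LCS to LIS, which is an assumption here and may differ from the construction in the paper.

```python
from bisect import bisect_left, bisect_right


def lis_length(a, strict=False):
    """Length of the longest increasing subsequence of a.

    strict=False allows equal neighbours (non-decreasing);
    strict=True requires strictly increasing values.
    """
    # tails[i] is the smallest possible last value of an
    # increasing subsequence of length i + 1 seen so far.
    tails = []
    find = bisect_left if strict else bisect_right
    for v in a:
        i = find(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def distance_to_monotonicity(a):
    """n minus the LIS length: fewest deletions that leave a sorted array."""
    return len(a) - lis_length(a)


def lcs_length(x, y):
    """LCS(x, y) via reduction to a strict LIS (assumed reduction, see lead-in).

    Each symbol of x is replaced by the positions where it occurs in y,
    listed in decreasing order, so at most one position per symbol of x
    can be used by a strictly increasing subsequence.
    """
    positions = {}
    for j, c in enumerate(y):
        positions.setdefault(c, []).append(j)
    seq = []
    for c in x:
        seq.extend(reversed(positions.get(c, [])))
    return lis_length(seq, strict=True)


def e_distance(x, y):
    """E(x, y) = n - LCS(x, y), as defined in the abstract for |x| = |y| = n."""
    return len(x) - lcs_length(x, y)
```

In this reduction, each symbol of x expands into at most k positions, where k is the maximum number of occurrences of any symbol in y, so the intermediate sequence has length at most kn. This gives some intuition for why k might appear in the stated space bound, though the sketch itself stores the whole sequence and makes no attempt at small space.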