Sample-based distance-approximation for subsequence-freeness

Abstract

In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) w = w_1 … w_k, a sequence (text) T = t_1 … t_n is said to contain w if there exist indices 1 ≤ i_1 < … < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. Ron and Rosin (ACM TOCT 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is Θ(k/ε). Denoting by Δ(T,w,p) the distance of T to w-freeness under a distribution p : [n] → [0,1], we are interested in obtaining an estimate Δ̂ such that |Δ̂ - Δ(T,w,p)| ≤ δ with probability at least 2/3, for a given distance parameter δ. Our main result is an algorithm whose sample complexity is O(k^2/δ^2). We first present an algorithm that works when the underlying distribution p is uniform, and then show how it can be modified to work for any (unknown) distribution p. We also show that a quadratic dependence on 1/δ is necessary.
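
To make the definition of containment concrete, the following is a minimal Python sketch of an exact check that reads the whole text. It only illustrates the definition of w-freeness and is not the paper's sample-based algorithm; the function names contains_subsequence and is_free are hypothetical, chosen for this illustration.

```python
def contains_subsequence(text, word):
    """Return True if `word` occurs in `text` as a subsequence,
    i.e. its symbols appear in order but not necessarily contiguously."""
    matched = 0  # number of leading symbols of `word` matched so far
    for symbol in text:
        # Greedily match the next needed symbol of `word` at its
        # earliest occurrence; this never rules out a valid embedding.
        if matched < len(word) and symbol == word[matched]:
            matched += 1
    return matched == len(word)


def is_free(text, word):
    """Return True if `text` is `word`-free (does not contain `word`)."""
    return not contains_subsequence(text, word)
```

For example, the text "abcab" contains the word "aab" (using positions 1, 4, 5), whereas "abc" is "ca"-free because no "a" appears after the only "c".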
