Outlier Detection for DNA Fragment Assembly
Abstract
Given n length-ℓ strings S = {s_1, ..., s_n} over a constant-size alphabet, together with parameters d and k, the objective in the Consensus String with Outliers problem is to find a subset S* of S of size n - k and a string s such that Σ_{s_i ∈ S*} d(s_i, s) ≤ d. Here d(x, y) denotes the Hamming distance between the two strings x and y. We prove the following results. (1) A variant of Consensus String with Outliers, in which the number of outliers k is fixed and the objective is to minimize the total distance Σ_{s_i ∈ S*} d(s_i, s), admits a simple PTAS. (2) Under the natural assumption that the number of outliers k is small, the PTAS for the distance-minimization version of Consensus String with Outliers performs well. In particular, as long as k ≤ cn for a fixed constant c < 1, the algorithm provides a (1+ε)-approximate solution in time f(1/ε)(nℓ)^{O(1)} and is thus an EPTAS. (3) To improve the PTAS for Consensus String with Outliers to an EPTAS, the assumption that k is small is necessary. Specifically, when k is allowed to be arbitrary, the Consensus String with Outliers problem does not admit an EPTAS unless FPT = W[1]. This hardness result holds even for binary alphabets. (4) The decision version of Consensus String with Outliers is fixed-parameter tractable when parameterized by d/(n - k), and thus also when parameterized by just d. To the best of our knowledge, Consensus String with Outliers is the first problem that admits a PTAS and is fixed-parameter tractable when parameterized by the value of the objective function, but does not admit an EPTAS under plausible complexity assumptions.
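To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the PTAS or the fixed-parameter algorithm from the paper: it simply tries every subset of size n - k, which takes exponential time and is only meant for tiny instances. It relies on the standard fact that, for a fixed subset, the column-wise majority string minimizes the total Hamming distance. All function names here are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def hamming(x, y):
    """Hamming distance d(x, y) between two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def majority_consensus(strings):
    """Column-wise majority string.

    For a fixed set of strings this minimizes the total Hamming
    distance, because each column can be optimized independently.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def total_distance(strings, s):
    """Sum of d(s_i, s) over all s_i in `strings`."""
    return sum(hamming(x, s) for x in strings)


def brute_force_consensus_with_outliers(strings, k):
    """Return (cost, consensus, kept_indices) minimizing the total distance.

    Illustration only: enumerates all subsets of size n - k, so the
    running time is exponential in general. Assumes 0 <= k < n.
    """
    n = len(strings)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    best = None
    for kept in combinations(range(n), n - k):
        subset = [strings[i] for i in kept]
        s = majority_consensus(subset)
        cost = total_distance(subset, s)
        if best is None or cost < best[0]:
            best = (cost, s, kept)
    return best


if __name__ == "__main__":
    fragments = ["ACGT", "ACGA", "ACGT", "TTTT"]
    cost, consensus, kept = brute_force_consensus_with_outliers(fragments, k=1)
    print(cost, consensus, kept)  # 1 ACGT (0, 1, 2)
```

In this toy instance the string "TTTT" is discarded as the single outlier, and the remaining three strings are within total distance 1 of the consensus "ACGT". The decision version described above then amounts to checking whether the returned cost is at most d.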