Efficiently Computing Edit Distance to Dyck Language
Abstract
Given a string σ over an alphabet Σ and a grammar G defined over the same alphabet, what is the minimum number of repairs (insertions, deletions, and substitutions) required to map σ into a valid member of G? In this paper we investigate this basic question for Dyck(s). Dyck(s) is a fundamental context-free grammar representing the language of well-balanced parentheses with s different types of parentheses, and it has played a pivotal role in the development of the theory of context-free languages. Computing edit distance to Dyck(s) significantly generalizes the string edit distance problem and has numerous applications, ranging from repairing semi-structured documents such as XML to memory checking, automated compiler optimization, and natural language processing. We give the first near-linear time algorithm for edit distance computation to Dyck(s) that achieves a nontrivial approximation factor of $O(\frac{1}{\epsilon}\sqrt{OPT}(\log n)^{\frac{1}{\epsilon}})$ in $O(n^{1+\epsilon}\log n)$ time, where n is the length of σ and OPT is the optimal edit distance. More generally, given an algorithm that computes string edit distance on inputs of size n in $\alpha(n)$ time with a $\beta(n)$ approximation factor, we can devise an algorithm for edit distance to Dyck(s) that runs in $O(n^{1+\epsilon}+\alpha(n))$ time and achieves an approximation factor of $O(\frac{1}{\epsilon}\beta(n)\log OPT)$. We show that the framework for efficiently approximating edit distance to Dyck(s) applies to many other languages. We illustrate this with various memory-checking languages, which comprise valid transcripts of stacks, queues, priority queues, double-ended queues, etc. Therefore, any language that can be recognized by these data structures can also be repaired efficiently by our algorithm.
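To make the problem concrete, the following is a minimal sketch of the classical exact dynamic program for edit distance to Dyck(s). It is not the near-linear approximation algorithm described above. The sketch rests on several assumptions. It uses a hypothetical alphabet with s = 3 parenthesis types, `()[]{}`. It allows only deletions and substitutions, relying on the standard observation that inserting a partner for an unmatched symbol never beats deleting that symbol. The helper names `pair_cost` and `dyck_edit_distance` are illustrative only.

```python
# Hypothetical alphabet for illustration: s = 3 parenthesis types.
# Input strings are assumed to use only these six symbols.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


def pair_cost(a: str, b: str) -> int:
    """Fewest substitutions turning a, b into a matching open/close pair."""
    a_open = a in OPEN_TO_CLOSE
    b_open = b in OPEN_TO_CLOSE
    if a_open and not b_open:
        # Correct orientation: free if the types agree, else one substitution.
        return 0 if OPEN_TO_CLOSE[a] == b else 1
    # Otherwise flip whichever symbols face the wrong way (1 or 2 substitutions).
    return (0 if a_open else 1) + (1 if b_open else 0)


def dyck_edit_distance(sigma: str) -> int:
    """Exact O(n^3) distance to Dyck(s) using deletions and substitutions."""
    n = len(sigma)
    # dist[i][j] = edits for the half-open slice sigma[i:j]; empty slices cost 0.
    dist = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best = 1 + dist[i + 1][j]  # option 1: delete sigma[i]
            for k in range(i + 1, j):  # option 2: match sigma[i] with sigma[k]
                cand = pair_cost(sigma[i], sigma[k]) + dist[i + 1][k] + dist[k + 1][j]
                if cand < best:
                    best = cand
            dist[i][j] = best
    return dist[0][n]


print(dyck_edit_distance("([)]"))  # 2: e.g. substitute ")" and "]" to get "([])"
```

The recurrence works as follows. In any repaired slice, the first symbol is either deleted or matched with some later position k. The symbols strictly between the two matched positions must then form a Dyck word, and so must the symbols after k, independently of each other. This gives O(n^2) subproblems with O(n) choices each. That cubic cost is what makes near-linear approximation algorithms attractive for long inputs.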