Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing

Abstract

LLMs edit text and code by autoregressively regenerating the full output, even when most tokens appear verbatim in the input. We study Copy-as-Decode, a decoding-layer mechanism that recasts edit generation as structured decoding over a two-primitive grammar: <copy lines="i-j"/> references an input line range, and <gen>...</gen> emits new content. A token-level FSM guarantees syntactic validity, and a serving-layer primitive updates the KV cache for each copy span via a single parallel-prefill forward pass rather than N autoregressive steps. This primitive shares the parallel-forward kernel of speculative decoding, but uses input tokens as the draft and replaces probabilistic verification with program-enforced acceptance. We report an upper-bound analysis that requires no end-to-end training. (i) Kernel speedup: on Qwen2.5-1.5B and 7B, copying N tokens via parallel prefill is 6.8×–303× faster than autoregressive decoding (N ∈ [8, 512], A100 80GB, bf16). (ii) Copy ceiling: on ProbeEdit and HumanEvalPack-Fix (Py/JS), 74–98% of gold tokens are reachable under the line-level primitive; composed with the empirical kernel over each corpus's span histogram, this yields a closed-form wall-clock bound of 29.0× / 3.4× / 4.2×, respectively (13.0× pooled). A token-level extension reaches 91–99% coverage with 4.5×–6.5× floors. (iii) Pipeline losslessness: oracle programs round-trip through the deterministic resolver on all 482 cases, localizing any downstream failure to span selection rather than the mechanism. A perturbation study shows that pooled exact match (EM) drops from 100% to 15.48% under off-by-one noise. A fine-tuning pilot on Qwen2.5-Coder-1.5B lifts HEvalFix-Py EM from 0/33 (untrained) to 12–17%, which is a learnability signal, not a production selector. Batched-serving integration and multi-file coverage are scoped as follow-up.
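
To make the two-primitive grammar concrete, the following is a minimal sketch of a deterministic resolver that expands an edit program back into output text. It is an illustration, not the paper's implementation: the function name `resolve`, the 1-based inclusive reading of `lines="i-j"`, the tolerance for whitespace between primitives, and the absence of any escaping inside `<gen>` are all assumptions made here.

```python
import re

# Hypothetical concrete syntax: a program is a sequence of
# <copy lines="i-j"/> and <gen>...</gen> elements and nothing else.
_PRIMITIVE = re.compile(r'<copy lines="(\d+)-(\d+)"/>|<gen>(.*?)</gen>', re.DOTALL)


def resolve(program: str, source: str) -> str:
    """Expand an edit program against `source` into the edited text.

    Assumes line ranges are 1-based and inclusive, which the abstract
    does not specify.
    """
    src_lines = source.splitlines(keepends=True)
    out = []
    pos = 0
    for match in _PRIMITIVE.finditer(program):
        # Reject stray text between primitives (whitespace is tolerated).
        if program[pos:match.start()].strip():
            raise ValueError(f"unexpected text at offset {pos}")
        pos = match.end()
        if match.group(1) is not None:
            # <copy>: splice the referenced input lines in verbatim.
            i, j = int(match.group(1)), int(match.group(2))
            if not (1 <= i <= j <= len(src_lines)):
                raise ValueError(f"line range {i}-{j} is out of bounds")
            out.extend(src_lines[i - 1:j])
        else:
            # <gen>: emit the newly generated content as-is.
            out.append(match.group(3))
    if program[pos:].strip():
        raise ValueError(f"unexpected trailing text at offset {pos}")
    return "".join(out)


source = "def add(a, b):\n    return a - b\n\nprint(add(1, 2))\n"
program = (
    '<copy lines="1-1"/>'
    "<gen>    return a + b\n</gen>"
    '<copy lines="3-4"/>'
)
edited = resolve(program, source)
```

In this example only the one changed line passes through `<gen>`; the other three lines are referenced by range. Shifting a range by one line (for example `lines="2-2"` in place of `lines="1-1"`) still yields a syntactically valid program but a different output, which is the kind of span-selection error the perturbation study measures.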
