Space-Efficient Language Generation in the Limit
Abstract
We initiate a resource-aware theory of language generation in the limit under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language $K$ and must eventually output a hallucination-free hypothesis language $L \subseteq K$ while omitting at most $\Delta$ strings of $K$. We focus on $\mathcal{C}_{s,k}$, the collection of languages recognized by DFAs with at most $s$ states over an alphabet of size $k$, as the natural hypothesis class for memory-bounded learners. In the exponential-space regime, we prove that a learner can exactly identify the target $K$. Under a stricter memory budget, we characterize the strongest possible generation guarantees. In particular, we present a streaming algorithm using $\mathrm{poly}(s,k)$ space that converges to a hypothesis with generation gap $\Delta = O(k^{2s-2})$. Moreover, the learned hypothesis captures every string in $K$ of length at least $2s-1$. We complement this result with a near-matching lower bound through a reduction from a standard communication complexity problem. Specifically, achieving generation gap $\Delta \le k^{(1-\epsilon)s}$ requires $k^{\Omega(\epsilon s)}$ memory. Together, these results reveal a sharp transition between polynomial-space generation and exponential-space exact identification.
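As a sanity check on how the two polynomial-space guarantees fit together, the following is a short counting sketch, not the paper's argument. It reads the gap bound as $O(k^{2s-2})$ and assumes $k \ge 2$. The abstract does not say which strings are omitted; the sketch only observes that a hypothesis capturing every string of $K$ of length at least $2s-1$ can omit only shorter strings, and it counts all such strings over a $k$-letter alphabet.

```latex
% Number of strings of length at most 2s-2 over an alphabet of size k >= 2:
\[
  \sum_{i=0}^{2s-2} k^{i}
  \;=\; \frac{k^{2s-1}-1}{k-1}
  \;<\; \frac{k}{k-1}\,k^{2s-2}
  \;\le\; 2\,k^{2s-2}
  \;=\; O\!\left(k^{2s-2}\right).
\]
```

Under these assumptions, capturing every string of length at least $2s-1$ already implies a generation gap of at most $2k^{2s-2}$, which is consistent with the stated bound.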