Autolearn: Learn by Surprise, Commit by Proof
Abstract
We propose Autolearn, a framework that enables language models to learn from documents they read, with no external supervision. Passages that produce anomalously high per-token loss are flagged, verified through a self-generated Q&A chain, and trained on with conviction-proportional β2 adjustment. We introduce the perturbation gap (paraphrase-to-original perplexity ratio) as a metric that distinguishes memorization from understanding. The key mechanism is the training data format: Q&A-format training drives the perturbation gap below the pre-trained baseline (2.098 vs. 2.204, = -0.106, > 10σ), suppressing token-sequence memorization, while standard fine-tuning's best attempt remains within noise ( = -0.010, < 1σ). Across four models spanning Qwen3 and Phi-4 families, Autolearn is the only method that enters this regime. Stochastic evaluation reveals passage-specific knowledge acquisition: the probability of generating a correct novel fact rises from 6% to 54% after training (p < 10-4), and Q&A format outperforms standard fine-tuning on genuinely novel facts. The system is self-extinguishing: learned content reduces surprisal below threshold and is skipped on re-encounter.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.