Fine-Tuning Improves Information Conveyance in Language Models
Abstract
Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an entire generation rollout. To address this, we propose Canopy Entropy (CE), a measure that views language generation as a tree whose "canopy" is the space of all possible rollouts, so that CE naturally quantifies the effective size of the generation space. CE jointly captures uncertainty in both the output length N and the generated sequence Y_{1:N}; indeed, we show that it equals the total Shannon entropy H(N, Y_{1:N} | X), where X denotes the prompt. This formulation yields interpretable metrics, including a length-entropy correlation term ρ(N, r_N), where r_N is the entropy rate. This term quantifies information conveyance efficiency by indicating whether longer outputs are more or less informative per token. Empirically, across tasks and model families, we find that fine-tuned models consistently exhibit a stronger positive correlation ρ(N, r_N), even when total entropy decreases. Furthermore, after controlling for model family, task, prompt, and output-length effects, we find that fine-tuning nearly triples the strength of the correlation between entropy rate and semantic diversity, suggesting that aligned models convert token-level uncertainty into semantic diversity more efficiently. Overall, these results demonstrate that fine-tuning does not simply reduce uncertainty but fundamentally reorganizes it into more informative and semantically meaningful generations. Our code is available at https://github.com/WeiyiTian/canopy-entropy.
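To make the quantities named in the abstract concrete, the following is a minimal sketch of how they might be estimated from sampled rollouts for a single prompt. It is not the authors' implementation (see the linked repository for that). It assumes each rollout is given as a list of per-token natural-log probabilities under the model, including the end-of-sequence token, so that the length N is determined by the sampled sequence. Under that assumption, the mean negative log-likelihood over samples is a Monte Carlo estimate of H(N, Y_{1:N} | X). The per-rollout entropy rate is taken here to be the negative log-likelihood divided by N, which is an illustrative choice rather than the paper's definition. The function names `pearson` and `rollout_stats` are hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def rollout_stats(token_logprobs):
    """Estimate total entropy and length-entropy correlation from sampled rollouts.

    token_logprobs: list of rollouts for ONE prompt; each rollout is a list of
    per-token natural-log probabilities (assumed to include the EOS token).
    Returns (total_entropy_estimate, per_rollout_rates, rho).
    """
    lengths = [len(lp) for lp in token_logprobs]   # N for each rollout
    nll = [-sum(lp) for lp in token_logprobs]      # -log p(y | x) per rollout
    total_entropy = mean(nll)                      # Monte Carlo entropy estimate (nats)
    rates = [s / n for s, n in zip(nll, lengths)]  # assumed per-token rate r_N
    rho = pearson(lengths, rates)                  # sample analogue of rho(N, r_N)
    return total_entropy, rates, rho


# Toy input: three hypothetical rollouts of lengths 1, 2, and 3.
toy = [[-1.0], [-1.0, -2.0], [-1.0, -2.0, -3.0]]
total_entropy, rates, rho = rollout_stats(toy)
```

In this toy input the per-token rate grows with length, so the estimated correlation is positive, the pattern the abstract reports as stronger in fine-tuned models. With real data, many rollouts per prompt would be needed for a stable estimate.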