Epistemic Regret Minimization: Label-Free Causal Critique Beyond Outcome Reward

Abstract

Large language models can answer causal questions correctly for the wrong reasons. Current RL methods reward what a model concludes but ignore why, reinforcing correlational shortcuts, a failure we call Reward Entrenchment. We introduce Epistemic Regret Minimization (ERM), a framework that critiques the causal structure of a model's reasoning trace rather than its answer. Applying established causal principles, ERM flags unexamined confounders, conflation of correlation with intervention, and unchecked back-door paths in exposed reasoning traces. The framework admits label-free operation, requiring neither the true causal graph nor the correct answer, and in the experiments we distinguish three critique settings: favorable benchmark-derived critique, error-direction cues, and fully label-free judge-generated critique. Within a single episode, ERM detects and repairs causal reasoning errors; across episodes, it accumulates interventional evidence into a reward signal applicable where no answer key exists. Experiments on 1,360 scenarios across six frontier LLMs show that reasoning-heavy models (GPT-4 Turbo, GPT-5.2) resist outcome-only correction (25–31% recovery) yet respond to causal critique (78–91%), gaining +53–59 pp. Standard test-time methods (self-consistency, Best-of-N, Self-Refine) underperform outcome-only reprompting on causal tasks, while ERM reduces residual Rung Collapse from 55–70% to 4%. A separation theorem proves that outcome-only reward cannot close this gap; a controlled simulation confirms that epistemic feedback does, outperforming outcome-only baselines 38-fold.
