Flow Reasoning Models: Scaling Reasoning Through Iterative Self-Refinement

Abstract

Discrete flow models have recently shown promising performance on few-step text generation; however, when naively applied to structured reasoning tasks such as Sudoku and Zebra puzzles, they converge confidently to incorrect answers (solving only 36% of Sudoku puzzles). We introduce Flow Reasoning Models (FRMs), a training and test-time-scaling framework for structured reasoning with flow models. We make the observation that, despite their poor solve rate, flow models can act as their own verifiers. A correct answer is a stable fixed point of the denoising dynamics, returning to itself when re-noised and re-solved. This enables a test-time-scaling paradigm: propose many candidate solutions and keep those that are dynamically stable, which alone reaches high solve rates on Sudoku-Shah (~100\%) and Zebra (95.9\%). This even generalizes to harder out-of-distribution puzzles like Sudoku-Extreme (96.1\%), without ever training on that distribution. This pure search, however, wastes a great deal of computation generating incorrect candidate solutions. We therefore design a training recipe to improve the base model's efficiency. First, we train flow models with a self-conditioning channel and close it at inference, letting them refine their own past predictions. Second, we train models to avoid their own failed generations using direct preference optimization. These changes substantially improve the base model's efficiency, letting it reach 99.2\% on Sudoku in just 7 forward passes, over 8× fewer than the strongest matched masked-diffusion baseline we compare needs for the same accuracy. When combined with test-time scaling, this lets flow models solve hard out-of-distribution puzzles (e.g. Sudoku-Extreme) far more efficiently.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…