Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction

Abstract

Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation. Yet existing approaches typically parameterize the reverse rate matrix monolithically, through proxies such as concrete scores (SEDD) or clean-data predictions (MDLM, GIDD), rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. We propose Neural CTMC, which exploits the underlying Poisson structure of CTMC dynamics by separately parameterizing the reverse process through an exit rate (when to jump) and a jump distribution (where to jump) via two dedicated network heads. We show that the evidence lower bound (ELBO) reduces to a path-space KL divergence between the true and learned reverse processes, that this divergence factorizes into a Poisson KL for timing and a categorical KL for direction, and that it admits a tractable, gradient-equivalent, and consistent loss. Experimentally, scored by Gemma2-9B, our pure-uniform Neural CTMC achieves 16.36 generative perplexity on TinyStories (vs. GIDD 37.60 and MDLM 42.66). On OpenWebText, it attains the best perplexity at the same training-token budget across 16 to 128 sampling steps among the methods we compare (e.g., at 128 steps: Neural CTMC 183.6 vs. MDLM 210.5 and GIDD 249.8). To facilitate reproducibility, we release our pretrained weights at https://huggingface.co/Jiangxy1117/Neural-CTMC.
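
The timing/direction split described above can be illustrated with a small numerical sketch. It is not the paper's implementation or training loss. It only checks the standard identity that the per-state KL rate between two sets of CTMC jump rates equals a Poisson KL on the exit rates plus an exit-rate-weighted categorical KL on the jump distributions. The function names and the example rates are hypothetical, chosen for illustration.

```python
import math

def decompose(rates):
    """Split the off-diagonal rates out of one state into
    (exit rate, jump distribution). Hypothetical helper."""
    exit_rate = sum(rates)
    return exit_rate, [r / exit_rate for r in rates]

def rate_kl(q, q_hat):
    """Per-state KL rate between two CTMCs, written on the raw rates."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(q, q_hat))

def poisson_kl(lam, lam_hat):
    """KL( Poisson(lam) || Poisson(lam_hat) ): the timing term."""
    return lam * math.log(lam / lam_hat) - lam + lam_hat

def categorical_kl(p, p_hat):
    """KL between two categorical distributions: the direction term."""
    return sum(a * math.log(a / b) for a, b in zip(p, p_hat))

# Made-up rates out of a single state toward three other states.
q_true = [0.5, 1.0, 1.5]    # assumed "true" reverse rates
q_model = [0.8, 0.6, 1.0]   # assumed "learned" reverse rates

lam, p = decompose(q_true)
lam_hat, p_hat = decompose(q_model)

total = rate_kl(q_true, q_model)
split = poisson_kl(lam, lam_hat) + lam * categorical_kl(p, p_hat)
print(total, split)  # the two values agree up to floating-point error
```

In this sketch the direction term is weighted by the true exit rate, so a mismatch in where to jump matters only in proportion to how often a jump actually occurs.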
