Expectation-Maximization as a Spectrally Governed Relaxation Flow
Abstract
The expectation-maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence is governed by the spectrum of the linearized EM map. How these two levels fit into a single dynamical picture has remained less transparent. We make explicit the latent-variable operator that connects them. Along the EM trajectory, the likelihood increment admits a global energy decomposition in terms of posterior-relative entropy. Linearization at a nondegenerate maximizer θ reveals the local operator \[ G_\theta = I - DT(\theta), \] which coincides with both the missing-information ratio and the information-geometric Hessian of the observed likelihood. From this operator we derive two acceleration strategies. The G-Accelerator uses the spectral gap to obtain an optimal Nesterov-type momentum β* = (1-λ*)/(1+λ*). The Geo-Adaptive accelerator extends the geometric EM framework of Zhou, Alexander & Lange by replacing their fixed correction strength γ = 8 with the adaptive rule γ_k = 1/λ_k, where λ_k is estimated online from the parameter trajectory. Both methods are parameter-free; Geo-Adaptive achieves dramatic acceleration precisely when the spectral gap is smallest. Numerical experiments on Gaussian mixtures demonstrate that both accelerators consistently outperform standard EM and fixed-γ DCC-EM, with Geo-Adaptive attaining speedups exceeding 8× in the most challenging regimes.
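To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It runs plain EM on a toy two-component Gaussian mixture and then evaluates the two formulas quoted above, β = (1-λ)/(1+λ) and γ = 1/λ. The abstract does not say how λ_k is estimated, so the estimator used here is an assumption: λ is read as the spectral gap 1 - ρ, with the contraction rate ρ approximated by the ratio of successive EM step lengths. The model (equal weights, unit variances, unknown means), the data, and the names em_step, log_likelihood, rho_hat, and lambda_hat are all hypothetical.

```python
import numpy as np

def em_step(x, mu):
    """One EM update for a 1D two-component Gaussian mixture with
    equal weights and unit variances; only the means are estimated."""
    # E-step: posterior responsibilities (computed stably in log space).
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means.
    return (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

def log_likelihood(x, mu):
    """Observed-data log-likelihood of the same toy model."""
    comp = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)
    return np.log(0.5 * comp.sum(axis=1)).sum()

# Synthetic data (hypothetical): two overlapping clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.5, 1.0, 500), rng.normal(1.5, 1.0, 500)])

mu = np.array([-0.5, 0.5])
steps, loglik = [], [log_likelihood(x, mu)]
for _ in range(500):
    new_mu = em_step(x, mu)
    steps.append(np.linalg.norm(new_mu - mu))
    mu = new_mu
    loglik.append(log_likelihood(x, mu))
    if steps[-1] < 1e-6:  # stop well above floating-point noise
        break

# Assumed estimator: contraction rate from successive step lengths,
# spectral gap lambda = 1 - rho.
rho_hat = steps[-1] / steps[-2]
lambda_hat = 1.0 - rho_hat

beta = (1.0 - lambda_hat) / (1.0 + lambda_hat)  # momentum formula from the abstract
gamma = 1.0 / lambda_hat                        # adaptive correction strength

print(f"lambda ~ {lambda_hat:.3f}, beta ~ {beta:.3f}, gamma ~ {gamma:.3f}")
```

Under this reading, a small spectral gap pushes β toward 1 and makes γ large, which is consistent with the abstract's remark that the adaptive rule helps most when the gap is smallest. The recorded loglik values also give a quick check of the monotonicity property of plain EM.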