Understanding and Accelerating EM Algorithm's Convergence by Fair Competition Principle and Rate-Verisimilitude Function

Abstract

Why can the Expectation-Maximization (EM) algorithm for mixture models converge? Why can different initial parameters cause various convergence difficulties? The Q-L synchronization theory explains that the observed data log-likelihood L and the complete data log-likelihood Q are positively correlated; we can achieve maximum L by maximizing Q. According to this theory, the Deterministic Annealing EM (DAEM) algorithm's authors make great efforts to eliminate locally maximal Q for avoiding L's local convergence. However, this paper proves that in some cases, Q may and should decrease for L to increase; slow or local convergence exists only because of small samples and unfair competition. This paper uses marriage competition to explain different convergence difficulties and proposes the Fair Competition Principle (FCP) with an initialization map for improving initializations. It uses the rate-verisimilitude function, extended from the rate-distortion function, to explain the convergence of the EM and improved EM algorithms. This convergence proof adopts variational and iterative methods that Shannon et al. used for analyzing rate-distortion functions. The initialization map can vastly save both algorithms' running times for binary Gaussian mixtures. The FCP and the initialization map are useful for complicated mixtures but not sufficient; we need further studies for specific methods.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…