Information Geometry and Asymptotic Theory for SMML Estimators

Abstract

Strict minimum message length (SMML) is an information-theoretic coding principle that represents a continuous statistical model by a finite set of assertions and a partition of the sample space. We show that the SMML objective decomposes into assertion entropy and conditional cross-entropy, balancing the cost of identifying an assertion against the cost of encoding data under the assigned model. For any fixed partition, the optimal codepoint for each cell is the model distribution that minimises Kullback-Leibler divergence from the data distribution restricted to that cell. Using the local Fisher-Rao geometry of regular parametric models, we show that, under high-resolution regularity conditions, optimal SMML partitions are asymptotically the pullback, through the maximum likelihood estimator, of weighted Fisher-Rao Voronoi tessellations in parameter space, with assertion probabilities appearing as additive weights. For regular exponential families, SMML codepoints satisfy a moment-matching condition and admit an interpretation as KL/Bregman centroids, while exact SMML cells are pullbacks of convex polyhedra in sufficient-statistic space. Together, these results show that SMML induces a natural information-geometric quantisation linking entropy-based coding, KL projection, and divergence-based Voronoi geometry.
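
As a rough illustration of the decomposition and the moment-matching condition described above, the following minimal sketch works through a toy case that is not taken from the paper: a Binomial(N, p) model with a uniform prior on p, for which the marginal data distribution r(x) is uniform on {0, ..., N}. The partition, the names `log_pmf`, `codepoint` and `message_length`, and the choice N = 10 are all assumptions made for illustration. For a fixed partition, each codepoint is chosen by matching the mean of x within its cell, which for this exponential family should coincide with the KL-minimising choice.

```python
import math

N = 10  # number of Bernoulli trials (hypothetical toy setting)

# Marginal distribution of x under a uniform prior on p: uniform on 0..N.
r = [1.0 / (N + 1)] * (N + 1)


def log_pmf(x, p):
    """Log of the Binomial(N, p) pmf at x; assumes 0 < p < 1."""
    return (math.log(math.comb(N, x))
            + x * math.log(p)
            + (N - x) * math.log(1.0 - p))


def codepoint(cell):
    """Moment-matching codepoint for one cell of a fixed partition."""
    q = sum(r[x] for x in cell)                    # assertion probability
    return sum(r[x] * x for x in cell) / (N * q)   # cell mean of x, divided by N


def message_length(partition, codepoints):
    """Expected two-part message length in nats."""
    total = 0.0
    for cell, p in zip(partition, codepoints):
        q = sum(r[x] for x in cell)
        total += -q * math.log(q)                        # assertion-entropy term
        total += -sum(r[x] * log_pmf(x, p) for x in cell)  # cross-entropy term
    return total


# An arbitrary illustrative partition of the sample space (not an optimal one).
partition = [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
codepoints = [codepoint(cell) for cell in partition]
best = message_length(partition, codepoints)

# Moving every codepoint away from its moment-matched value should cost more.
shifted = message_length(partition, [p + 0.05 for p in codepoints])

print("codepoints:", codepoints)
print("message length (nats):", best, "vs shifted:", shifted)
```

The sketch only evaluates one fixed partition; searching over partitions, which is the hard part of SMML, is deliberately left out.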
