From Letters to Words and Back: Invertible Coding of Stationary Measures
Abstract
Motivated by problems of statistical language modeling, we consider probability measures on infinite sequences over two countable alphabets of a different cardinality, such as letters and words. We introduce an invertible mapping between such measures, called the normalized transport, that preserves both stationarity and ergodicity. The normalized transport applies so called self-avoiding codes that generalize comma-separated codes and specialize bijective stationary codes. The normalized transport is also connected to the usual measure transport via underlying asymptotically mean stationary measures. It preserves the ergodic decomposition. The normalized transport and self-avoiding codes arise in the problem of successive recurrence times. We show that successive recurrence times are ergodic for an ergodic measure, which strengthens a result by Chen Moy from 1959. We also relate the entropy rates of processes linked by the normalized transport.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.