A Global Characterization of f-Divergences Yielding PSD Mutual-Information Matrices

Abstract

Given $n$ random variables, when does the matrix of pairwise $f$-mutual informations define a PSD kernel over variables? For convex finite generators $f:(0,\infty)\to\mathbb{R}$ with $f(1)=0$ and finite boundary value $f(0)$, we give a closed characterization up to the linear transformation $f \mapsto f + c(t-1)$, which leaves every $f$-divergence and every $f$-mutual-information matrix unchanged. The matrix $M^{(f)}_{ij} := I_f(X_i; X_j)$ is PSD for every finite-alphabet family if and only if the normalized representative has a globally convergent expansion $f(t) = \sum_{m \ge 2} a_m (t-1)^m$, with $a_m \ge 0$, on all of $(0,\infty)$. Sufficiency follows from a replica embedding for monomial generators plus closure under nonnegative mixtures. Necessity first extracts the local Taylor cone at $1$ using biased three-point kernels $H_a$ and the Belton--Guillot--Khare--Putinar (BGKP) low-rank Hankel positivity-preserver theorem, and then bootstraps analyticity to the divergence. This is a kernel characterization problem, not a metric one: positive semidefiniteness of the variable-indexed matrix is distinct from Hilbertian properties of divergences between distributions. The result explains why Shannon MI and Jensen--Shannon fail, why $\chi^2$ succeeds, and why non-analytic divergences such as total variation and ReLU are excluded.
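As a numerical illustration of the $\chi^2$ case, the following minimal sketch builds the pairwise matrix for one random joint distribution and inspects its smallest eigenvalue. It assumes the standard definition $I_f(X;Y) = D_f(P_{XY} \,\|\, P_X \otimes P_Y) = \sum_{x,y} p(x)p(y)\, f\big(p(x,y)/(p(x)p(y))\big)$, which the abstract does not spell out, and takes the diagonal entry to be $I_f(X_i;X_i)$. The helper names (`f_mutual_information`, `pairwise_joint`, `f_mi_matrix`) and the test distribution are hypothetical, chosen only for this example. A single random instance cannot establish the theorem; it only shows what the claim looks like in practice.

```python
import numpy as np

def f_mutual_information(joint, f):
    """I_f(X;Y) = sum_{x,y} p(x)p(y) f(p(x,y)/(p(x)p(y))) for a 2-D joint pmf.

    Assumes f is finite at 0, so cells with p(x,y) = 0 contribute p(x)p(y) f(0).
    """
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    prod = px * py
    mask = prod > 0  # skip symbols outside the support of a marginal
    ratio = joint[mask] / prod[mask]
    return float(np.sum(prod[mask] * f(ratio)))

def pairwise_joint(p, i, j):
    """Joint pmf of (X_i, X_j) from an n-way pmf array p."""
    n = p.ndim
    if i == j:
        # (X_i, X_i) is concentrated on the diagonal.
        marg = p.sum(axis=tuple(k for k in range(n) if k != i))
        return np.diag(marg)
    pij = p.sum(axis=tuple(k for k in range(n) if k not in (i, j)))
    return pij if i < j else pij.T

def f_mi_matrix(p, f):
    """Matrix M[i, j] = I_f(X_i; X_j) over the n variables of the pmf array p."""
    n = p.ndim
    return np.array([[f_mutual_information(pairwise_joint(p, i, j), f)
                      for j in range(n)] for i in range(n)])

def chi2(t):
    """Generator f(t) = (t - 1)^2, i.e. a_2 = 1 and all other a_m = 0."""
    return (t - 1.0) ** 2

# Hypothetical test case: four variables with alphabet sizes 3, 2, 4, 2.
rng = np.random.default_rng(0)
p = rng.random((3, 2, 4, 2))
p /= p.sum()

M = f_mi_matrix(p, chi2)
min_eig = np.linalg.eigvalsh(M).min()
print("chi^2 mutual-information matrix:\n", M)
print("smallest eigenvalue:", min_eig)
```

For a full-support variable on $k$ symbols, the $\chi^2$ diagonal entry equals $k-1$, which gives a quick sanity check on the output. Swapping in a generator that is not finite at $0$ or not of the stated form would require handling the boundary value explicitly, which this sketch does not attempt.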
