Gaussian Universality for Diffusion Models

Abstract

We investigate Gaussian Universality for data distributions generated via diffusion models. By Gaussian Universality we mean that the test error of a generalized linear model f(W) trained for a classification task on the diffusion data matches the test error of f(W) trained on the Gaussian Mixture with matching means and covariances per class.In other words, the test error depends only on the first and second order statistics of the diffusion-generated data in the linear setting. As a corollary, the analysis of the test error for linear classifiers can be reduced to Gaussian data from diffusion-generated data. Analysing the performance of models trained on synthetic data is a pertinent problem due to the surge of methods such as sehwag2024stretchingdollardiffusiontraining. Moreover, we show that, for any 1- Lipschitz scalar function φ, φ(x) is close to E φ(x) with high probability for x sampled from the conditional diffusion model corresponding to each class. Finally, we note that current approaches for proving universality do not apply to diffusion-generated data as the covariance matrices of the data tend to have vanishing minimum singular values, contrary to the assumption made in the literature. This leaves extending previous mathematical universality results as an intriguing open question.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…