Statistical Model Criticism of Variational Auto-Encoders

Abstract

We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.
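As an illustration of the general idea of statistical model criticism described above, here is a minimal sketch of a predictive check. It is not the paper's procedure. It compares a statistic of the observed data against the distribution of that statistic over datasets sampled from a model. The function names (`predictive_check`, `toy_model_sampler`), the standard-deviation statistic, and the Gaussian toy data are all assumptions made for this example. For a VAE, the sampler would instead draw a latent from the prior and decode it.

```python
import numpy as np


def predictive_check(observed, sample_replicate, statistic,
                     num_replicates=1000, seed=0):
    """Compare a statistic of the observed data with its distribution
    under datasets replicated from the model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    t_obs = statistic(observed)
    # Statistic computed on each replicated dataset of the same size.
    t_rep = np.array([
        statistic(sample_replicate(rng, len(observed)))
        for _ in range(num_replicates)
    ])
    # Tail-area probability: fraction of replicates whose statistic
    # is at least as large as the observed one.
    p_value = float(np.mean(t_rep >= t_obs))
    return t_obs, t_rep, p_value


def toy_model_sampler(rng, n):
    # Stand-in for a trained model (assumption): a unit-variance Gaussian.
    return rng.normal(loc=0.0, scale=1.0, size=n)


# Toy "observed" data with a larger spread than the model assumes.
data_rng = np.random.default_rng(123)
observed = data_rng.normal(loc=0.0, scale=2.0, size=200)

t_obs, t_rep, p_value = predictive_check(observed, toy_model_sampler, np.std)
print(f"observed statistic: {t_obs:.3f}")
print(f"replicated mean:    {t_rep.mean():.3f}")
print(f"tail probability:   {p_value:.3f}")
```

A tail probability close to 0 or 1 suggests the model fails to reproduce that statistic of the data. Here the toy model underestimates the spread, so the check should flag it.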
