Learning with Importance Weighted Variational Inference

Abstract

Several variational bounds involving importance weighting ideas generalize the Evidence Lower BOund (ELBO) for marginal likelihood optimization, such as the Importance-weighted Auto-Encoder (IWAE), Variational Rényi (VR) and VR-IWAE bounds. Yet, it remains unclear how the joint choice of bound and gradient estimator impacts the behavior of the resulting variational inference (VI) algorithms. This paper provides a unified theoretical comparison of reparameterized (REP) and doubly-reparameterized (DREP) gradient estimators tied to the IWAE, VR and VR-IWAE bounds. Through asymptotic analyses of the Signal-to-Noise Ratio as the number of Monte Carlo samples N goes to infinity, we identify a bias-variance tradeoff in these gradient estimators and we formally justify the superiority of DREP over REP in importance-weighted VI. An additional asymptotic analysis for challenging regimes, where both N and the Kullback-Leibler divergence between the variational and posterior densities go to infinity, indicates that importance-weighted VI gradient estimators point in a well-founded direction even when the variational approximation deteriorates. Together, these complementary results characterize the optimization trajectory in importance-weighted VI from poor initialization to final convergence. Importantly, our proof techniques establish general theoretical tools for the study of ratios of sample means, whose scope extends beyond VI and which constitute an independent contribution to the field of Monte Carlo methods.
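
As a rough illustration of the bounds named above, the following minimal sketch computes a single Monte Carlo estimate of the VR-IWAE objective from N log importance weights. It assumes the commonly stated form of that objective, (1 / (1 - alpha)) * log((1/N) * sum_i w_i^(1 - alpha)) with alpha in [0, 1), which the abstract itself does not spell out. The function names `log_mean_exp` and `vr_iwae_estimate` are hypothetical and not taken from the paper.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def vr_iwae_estimate(log_weights, alpha=0.0):
    """One-sample-batch estimate of the VR-IWAE objective (assumed form).

    log_weights: log importance weights log p(x, z_i) - log q(z_i | x)
                 for N draws z_i from the variational distribution.
    alpha:       value in [0, 1); alpha = 0 gives the IWAE estimate.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Temper each weight by the exponent (1 - alpha), average, then rescale.
    scaled = [(1.0 - alpha) * lw for lw in log_weights]
    return log_mean_exp(scaled) / (1.0 - alpha)
```

Under this assumed form, a single sample (N = 1) returns the log weight itself, which is the usual single-sample ELBO estimate, for any admissible alpha. With alpha = 0 the function returns the log of the averaged weights, which is the IWAE estimate.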
