Three Costs of Amortizing Gaussian Process Inference with Neural Processes
Abstract
Neural processes amortize Gaussian process (GP) inference, replacing the exact $O(n^3)$ posterior with a learned $O(n)$ map from context sets to predictive distributions. For a class of latent neural processes (LNPs), we bound the Kullback-Leibler (KL) divergence between the GP and LNP predictives and decompose it into three interpretable sources: (i) label contamination, because the neural process uses label values to estimate a quantity that is label-independent in the exact GP; (ii) an information bottleneck, because the finite-dimensional representation cannot resolve the full context geometry; and (iii) amortization error, from a single encoder network shared across all contexts. The bottleneck truncation term decays in the representation dimension $d$ as $O(e^{-c d^{2/d_x}})$ for squared-exponential kernels on $\mathbb{R}^{d_x}$, where $c > 0$ is a kernel-dependent constant, and as $O(d^{-2\nu/d_x})$ for Matérn-$\nu$ kernels, directly linking architecture sizing to kernel smoothness and input dimension. The label contamination term is $O(1)$ in general, with only the observation-noise component decaying as $O(1/n)$, which identifies a persistent cost of routing uncertainty estimation through a label-dependent representation. These results characterize the costs of amortization within the analyzed class and yield two architectural recommendations: predict variance from context locations alone in the GP-amortization regime, and replace mean aggregation with second-order pooling to close the dominant amortization gap.
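The statement that the predictive variance is label-independent in the exact GP can be checked numerically. The sketch below is illustrative only and is not the paper's code: it assumes a zero-mean GP with a unit-variance squared-exponential kernel on scalar inputs and Gaussian observation noise, and the helper names (`se_kernel`, `cholesky`, `solve_spd`, `gp_posterior`) and all numeric values are hypothetical. It uses a plain Cholesky factorization, the usual source of the cubic cost of exact inference.

```python
import math

def se_kernel(x, y, lengthscale=1.0):
    # Squared-exponential kernel with unit signal variance (assumed for illustration).
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def cholesky(A):
    # Lower-triangular factor L with A = L L^T; this step costs O(n^3).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(L, b):
    # Solve (L L^T) x = b by forward then backward substitution.
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_star, noise_var=0.1):
    # Exact GP predictive mean and latent variance at a single test input.
    n = len(xs)
    K = [[se_kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    k_star = [se_kernel(x, x_star) for x in xs]
    alpha = solve_spd(L, ys)      # (K + noise_var * I)^{-1} y: depends on labels
    v = solve_spd(L, k_star)      # (K + noise_var * I)^{-1} k_*: locations only
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = se_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Same context locations, two unrelated label vectors (made-up values).
xs = [-1.0, 0.0, 1.5]
ys_a = [0.3, -0.2, 1.0]
ys_b = [5.0, 4.0, -3.0]
mean_a, var_a = gp_posterior(xs, ys_a, x_star=0.5)
mean_b, var_b = gp_posterior(xs, ys_b, x_star=0.5)
print("means:", mean_a, mean_b)
print("variances:", var_a, var_b)
```

In `gp_posterior` the labels `ys` enter only through `alpha`, so changing them moves the mean while leaving the variance untouched. A model that computes its uncertainty from a representation of both locations and labels has to learn to ignore the labels, which is the intuition behind the label contamination term.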