Replica Theory of Spherical Boltzmann Machine Ensembles
Abstract
Training in machine learning generally consists of finding a single model whose parameters minimize a data-dependent loss. Yet empirical work shows that ensemble learning, an approach in which multiple models are sampled, can improve performance. Here, we provide an analytical framework to understand these observations in the case of Boltzmann machines, exploiting a duality between ensemble learning and large deviations of the free energy in spin-glass models. Replica calculations allow us to fully solve the case of spherical Boltzmann machine ensembles and to clarify when ensemble learning improves over standard loss minimization, in particular for nearly finite-dimensional data. Our framework can also be applied to complex data distributions, in agreement with numerical simulations on deep networks.