Characterizing Learning Dynamics under Relative Reparameterization of Singular Models

Abstract

A common way to analyze the learning of statistical models is to consider operations in the model's parameter space. This becomes challenging, however, when there is no one-to-one mapping between the parameter space and the underlying statistical model space. Such "singular models" occur frequently and exhibit a characteristic decrease in the convergence speed of learning trajectories due to attractor behaviors. In this work, we consider a relative reparameterization technique of the parameter space, which yields a general method for extracting regular sub-models from singular models. Using Gaussian Mixture Models and Neural Networks as examples, we theoretically and numerically analyze the convergence rate of Gradient Descent under both parameterizations. By analyzing second-order methods and explicit properties of the Fisher Information Matrix, we distinguish between differences in convergence behavior arising from algorithmic aspects and those arising from intrinsic information-geometric aspects.
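
As an illustration of what a singular model looks like in practice, the following minimal sketch uses a one-unit network f(x; a, b) = a * tanh(b * x) with unit-variance Gaussian noise. This toy model, the input grid, and the function names are assumptions chosen for illustration and are not taken from the paper. At a = 0 every value of b gives the same function, so the parameter-to-model map is not one-to-one there and the Fisher Information Matrix loses rank.

```python
import numpy as np

def model_gradient(a, b, x):
    """Gradient of f(x; a, b) = a * tanh(b * x) with respect to (a, b)."""
    t = np.tanh(b * x)
    # df/da = tanh(b x), df/db = a * x * (1 - tanh(b x)^2)
    return np.stack([t, a * x * (1.0 - t**2)], axis=-1)  # shape (n, 2)

def fisher_information(a, b, x):
    """Empirical FIM for y = f(x; a, b) + unit-variance Gaussian noise."""
    g = model_gradient(a, b, x)
    return g.T @ g / len(x)

# Hypothetical fixed input grid (an assumption of this sketch).
x = np.linspace(-2.0, 2.0, 201)

fim_regular = fisher_information(1.0, 1.0, x)   # a != 0 and b != 0
fim_singular = fisher_information(0.0, 1.0, x)  # a = 0: b is unidentifiable

rank_regular = np.linalg.matrix_rank(fim_regular)
rank_singular = np.linalg.matrix_rank(fim_singular)
print(rank_regular, rank_singular)
```

In this toy setting the matrix has full rank at the regular point and is rank-deficient at a = 0, which is the kind of degeneracy the abstract associates with slowed learning trajectories.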
