Alpha-divergence loss function for neural density ratio estimation
Abstract
Density ratio estimation (DRE) is a fundamental machine learning technique for capturing relationships between two probability distributions. State-of-the-art DRE methods estimate the density ratio using neural networks trained with loss functions derived from variational representations of f-divergences. However, existing methods face optimization challenges, such as overfitting due to lower-unbounded loss functions, biased mini-batch gradients, vanishing training loss gradients, and high sample requirements for Kullback--Leibler (KL) divergence loss functions. To address these issues, we focus on α-divergence, which provides a suitable variational representation of f-divergence. Subsequently, a novel loss function for DRE, the α-divergence loss function (α-Div), is derived. α-Div is concise but offers stable and effective optimization for DRE. The boundedness of α-divergence provides the potential for successful DRE with data exhibiting high KL-divergence. Our numerical experiments demonstrate the effectiveness of α-Div in optimization. However, the experiments also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of RMSE for DRE. This indicates that the accuracy of DRE is primarily determined by the amount of KL-divergence in the data and is less dependent on α-divergence.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.