Advances in Bayesian random partition models: A comprehensive review
Abstract
Clustering is a crucial task in various domains of knowledge, including medicine, epidemiology, genomics, environmental science, economics, and visual sciences. Methodologies for inferring the number of clusters have often been shown to be inconsistent, and incorporating a dependence structure among clusters introduces additional challenges in the estimation process. In a Bayesian framework, clustering is performed by treating the unknown partition as a random object and defining a prior distribution for it. This prior distribution can be induced by models assumed for the observations or defined directly on the partition itself. However, recent findings have revealed difficulties in consistently estimating the number of clusters and, consequently, the partition. Furthermore, summarizing the posterior distribution of the partition remains an open problem due to the high dimensionality of the partition space. This study reviews Bayesian approaches for random partition models, highlights the advantages and disadvantages of each method, and suggests potential avenues for future research.