A Bayesian Semiparametric Mixture Model for Clustering Zero-Inflated Microbiome Data
Abstract
Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data which can bias inference. Designed for zero-inflated multivariate compositional count data collected in microbiome research, we propose a novel Bayesian semiparametric mixture modeling framework that simultaneously learns the number of clusters in the data while performing cluster allocation. In simulation, we demonstrate the clustering performance of our method compared to distance- and model-based alternatives and the importance of accommodating zero-inflation when present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.