Extended feature allocation models

Abstract

Feature allocation models are Bayesian nonparametric tools tailored to data in which each observation can simultaneously exhibit multiple characteristics, or features. A fundamental limitation of standard formulations is that feature labels are assumed to be independent and identically distributed, and therefore play no role in posterior inference. The present paper introduces a unified Bayesian framework for extended feature allocation models, in which feature labels and proportions are modeled jointly, thereby enabling the simultaneous discovery of features and learning of dependencies among their labels. Building on point process theory, we develop a full Bayesian analysis of these models. Within this general setting, we also characterize previously proposed priors as those leading to poor predictive distributions, which cannot capture label dependencies and are insensitive to the observed frequency spectrum. Our methodology is designed to move beyond such standard formulations by leveraging the information carried by feature labels. We demonstrate the usefulness of our approach by introducing: (i) a Cox process prior that clusters genomic variant embeddings while predicting new variants and new variant clusters; (ii) a determinantal point process prior for repeated forest surveys, where prediction concerns both the number and the locations of unobserved trees.
