Sufficient digits and density estimation: A Bayesian nonparametric approach using generalized finite Pólya trees

Abstract

This paper proposes a novel approach for statistical modelling of a continuous random variable X on [0, 1), based on its digit representation X = .X_1X_2…. In general, X can be coupled with a latent random variable N so that (X_1,…,X_N) becomes a sufficient statistic and .X_{N+1}X_{N+2}… is uniformly distributed. In line with this fact, and focusing on binary digits for simplicity, we propose a family of generalized finite Pólya trees that induces a random density for a sample, which becomes a flexible tool for density estimation. Here, the digit system may be random and learned from the data. We provide a detailed Bayesian analysis, including closed-form expressions for the posterior distribution. We analyse the frequentist properties as the sample size increases, and provide sufficient conditions for consistency of the posterior distributions of the random density and N. We consider an extension to data spanning multiple orders of magnitude, and propose a prior distribution that encodes the so-called extended Newcomb-Benford law. Such a model shows promising results for density estimation of human-activity data. Our methodology is illustrated on several synthetic and real datasets.
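
As a small illustration of the digit representation the abstract refers to, the sketch below extracts the first n binary digits of a number in [0, 1) and returns the rescaled remainder, i.e. the number whose digits are those after position n. It only shows the deterministic split of X at a fixed position. It does not implement the paper's latent variable N, the generalized finite Pólya tree, or any inference step. The function names `binary_digits` and `split_at` are hypothetical and chosen for this example.

```python
def binary_digits(x, n):
    """Return the first n binary digits of x, where 0 <= x < 1."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    digits = []
    for _ in range(n):
        x *= 2.0          # shift the next binary digit left of the point
        d = int(x)        # that digit is 0 or 1
        digits.append(d)
        x -= d            # keep only the fractional part
    return digits


def split_at(x, n):
    """Split x into its first n binary digits and the rescaled remainder.

    The remainder is the number in [0, 1) whose binary digits are the
    digits of x from position n + 1 onwards.
    """
    digits = binary_digits(x, n)
    prefix = sum(d / 2 ** (i + 1) for i, d in enumerate(digits))
    remainder = (x - prefix) * 2 ** n
    return digits, remainder


if __name__ == "__main__":
    # 0.8125 is .1101 in binary, so the first two digits are 1, 1
    # and the remainder is .01 in binary, i.e. 0.25.
    print(split_at(0.8125, 2))
```

In the paper's setting the split position is random rather than fixed; the example fixes n only to make the two pieces of the representation concrete.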
