Nearest Neighbor distributions: new statistical measures for cosmological clustering
Abstract
The use of summary statistics beyond the two-point correlation function to analyze non-Gaussian clustering on small scales is an active field of research in cosmology. In this paper, we explore a new set of summary statistics: the k-Nearest Neighbor Cumulative Distribution Functions (kNN-CDF). The kNN-CDF is the empirical cumulative distribution function of distances from a set of volume-filling, Poisson-distributed random points to their k-th nearest data points, and it is sensitive to all connected N-point correlations in the data. The kNN-CDF can be used to measure counts-in-cells, void probability distributions, and higher N-point correlation functions, all within the same formalism, which exploits fast searches with spatial tree data structures. We demonstrate how it can be computed efficiently for various data sets, both discrete point sets and, in a generalized form, continuous fields. We use data from a large suite of N-body simulations to compare the sensitivity of this new statistic to various cosmological parameters with that of the two-point correlation function over the same range of scales. We demonstrate that the kNN-CDF improves the constraints on the cosmological parameters by more than a factor of 2 when applied to the clustering of dark matter on scales between 10 h⁻¹ Mpc and 40 h⁻¹ Mpc. We also show that the relative improvement is even greater when the statistic is applied on the same scales to the clustering of halos in the simulations at a fixed number density, in both real space and redshift space. Since the kNN-CDF is sensitive to all higher-order connected correlation functions in the data, the gains over traditional two-point analyses are expected to grow as progressively smaller scales are included in the analysis of cosmological data.
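The measurement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `knn_cdf` and its parameters are hypothetical, the data are assumed to be discrete points in a periodic cubic box, the query points are a fixed number of uniform random positions (a simple stand-in for a Poisson-distributed, volume-filling set), and SciPy's `cKDTree` is used as one possible spatial tree.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_cdf(data, k_max, box_size, n_query, radii, seed=0):
    """Empirical kNN-CDFs for k = 1..k_max (hypothetical helper, a sketch only).

    data     : (N, d) array of data positions, assumed to lie in [0, box_size)
    k_max    : largest neighbor order to measure
    box_size : side length of the assumed periodic cubic box
    n_query  : number of volume-filling random query points
    radii    : 1D array of distances at which the CDFs are evaluated
    Returns an array of shape (k_max, len(radii)).
    """
    data = np.asarray(data, dtype=float)
    radii = np.asarray(radii, dtype=float)
    rng = np.random.default_rng(seed)

    # Volume-filling random query points (assumption: uniform in the box).
    queries = rng.uniform(0.0, box_size, size=(n_query, data.shape[1]))

    # Spatial tree on the data; boxsize enables periodic wrapping.
    tree = cKDTree(data, boxsize=box_size)

    # Distances from every query point to its 1st..k_max-th nearest data point.
    dists, _ = tree.query(queries, k=k_max)
    dists = np.reshape(dists, (n_query, k_max))  # k_max = 1 returns a 1D array

    cdfs = np.empty((k_max, radii.size))
    for j in range(k_max):
        sorted_d = np.sort(dists[:, j])
        # Fraction of query points whose (j+1)-th neighbor lies within each radius.
        cdfs[j] = np.searchsorted(sorted_d, radii, side="right") / n_query
    return cdfs
```

In this sketch, row `k - 1` of the returned array is the fraction of query points that have at least `k` data points within each radius, so one minus the first row is an estimate of the void probability at that radius.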