Approximating photo-z PDFs for large surveys
Abstract
Modern galaxy surveys produce redshift probability density functions (PDFs) in addition to traditional photometric redshift (photo-z) point estimates. However, the storage of photo-z PDFs may present a challenge with increasingly large catalogs, as we face a trade-off between the accuracy of subsequent science measurements and the limitation of finite storage resources. This paper presents qp, a Python package for manipulating parametrizations of 1-dimensional PDFs, as suitable for photo-z PDF compression. We use qp to investigate the performance of three simple PDF storage formats (quantiles, samples, and step functions) as a function of the number of stored parameters on two realistic mock datasets, representative of upcoming surveys with different data qualities. We propose some best practices for choosing a photo-z PDF approximation scheme and demonstrate the approach on a science case using performance metrics on both ensembles of individual photo-z PDFs and an estimator of the overall redshift distribution function. We show that both the properties of the set of PDFs we wish to approximate and the chosen fidelity metric(s) affect the optimal parametrization. Additionally, we find that quantiles and samples outperform step functions, and we encourage further consideration of these formats for PDF approximation.
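To make the three storage formats concrete, the following is a minimal sketch, using only the Python standard library, of how a single 1-dimensional PDF might be reduced to N stored parameters as quantiles, samples, or step-function heights. It does not use or reproduce the qp API. The toy bimodal PDF, the redshift range, the grid size, and the names `toy_pdf`, `inverse_cdf`, `cdf_at`, `to_quantiles`, `to_samples`, and `to_steps` are assumptions introduced here for illustration only.

```python
import bisect
import math
import random

Z_MIN, Z_MAX, N_GRID = 0.0, 2.0, 2001  # assumed redshift range and grid size


def toy_pdf(z):
    """Hypothetical bimodal photo-z PDF: a mixture of two Gaussians."""
    def gauss(x, mu, sigma):
        norm = sigma * math.sqrt(2.0 * math.pi)
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm
    return 0.7 * gauss(z, 0.5, 0.05) + 0.3 * gauss(z, 1.2, 0.10)


# Evaluate the PDF on a fine grid and build a normalized CDF (trapezoid rule).
dz = (Z_MAX - Z_MIN) / (N_GRID - 1)
grid = [Z_MIN + i * dz for i in range(N_GRID)]
pdf = [toy_pdf(z) for z in grid]
cdf = [0.0]
for i in range(1, N_GRID):
    cdf.append(cdf[-1] + 0.5 * (pdf[i - 1] + pdf[i]) * dz)
cdf = [c / cdf[-1] for c in cdf]


def inverse_cdf(q):
    """Redshift at which the gridded CDF reaches level q (linear interpolation)."""
    i = min(max(bisect.bisect_left(cdf, q), 1), N_GRID - 1)
    lo, hi = cdf[i - 1], cdf[i]
    frac = 0.0 if hi == lo else (q - lo) / (hi - lo)
    return grid[i - 1] + frac * dz


def cdf_at(z):
    """Gridded CDF evaluated at redshift z (linear interpolation)."""
    x = (z - Z_MIN) / dz
    i = min(int(x), N_GRID - 2)
    return cdf[i] + (x - i) * (cdf[i + 1] - cdf[i])


def to_quantiles(n):
    """Store the redshifts at n evenly spaced CDF levels."""
    return [inverse_cdf((k + 0.5) / n) for k in range(n)]


def to_samples(n, seed=0):
    """Store n random draws from the PDF (inverse-transform sampling)."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random()) for _ in range(n)]


def to_steps(n):
    """Store the mean PDF height in each of n equal-width redshift bins."""
    width = (Z_MAX - Z_MIN) / n
    edges = [Z_MIN + k * width for k in range(n + 1)]
    return [(cdf_at(edges[k + 1]) - cdf_at(edges[k])) / width for k in range(n)]


if __name__ == "__main__":
    n_params = 10
    print("quantiles:", [round(z, 3) for z in to_quantiles(n_params)])
    print("samples:  ", [round(z, 3) for z in to_samples(n_params)])
    print("steps:    ", [round(h, 3) for h in to_steps(n_params)])
```

In this sketch, the quantile and sample formats place their stored values where the probability mass lies, while the step function spends parameters on equal-width bins regardless of where the mass is. This illustrates only one possible reason the formats can differ in fidelity at a fixed number of parameters; it is not the paper's analysis.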