Generalized massive optimal data compression

Benjamin Wandelt

doi:10.1093/mnrasl/sly029

Generalized massive optimal data compression

Abstract

Data compression has become one of the cornerstones of modern astronomical data analysis, with the vast majority of analyses compressing large raw datasets down to a manageable number of informative summaries. In this paper we provide a general procedure for optimally compressing N data down to n summary statistics, where n is equal to the number of parameters of interest. We show that compression to the score function -- the gradient of the log-likelihood with respect to the parameters -- yields n compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Lo\'eve compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general non-Gaussian case as long as mild regularity conditions are satisfied, producing optimal non-linear summary statistics when appropriate. As a worked example, we derive explicitly the n optimal compressed statistics for Gaussian data in the general case where both the mean and covariance depend on the parameters.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…