Human vs. machine -- 1:3. Joint analysis of classical and ML-based summary statistics of the Lyman-α forest
Abstract
To compress Lyman-α forest (LyαF) datasets and make them easier to interpret, summary statistics such as the power spectrum are commonly used. However, such summaries unavoidably lose some information, weakening the constraining power on the parameters of interest. Recently, machine learning (ML)-based summaries have been proposed as an alternative to human-defined statistical measures. This raises a question: can ML-based summaries capture all of the information contained in traditional statistics, and vice versa? In this study, we apply three human-defined techniques and one ML-based approach to summarize mock LyαF data from hydrodynamical simulations and infer two thermal parameters of the intergalactic medium, assuming a power-law temperature-density relation. We introduce a metric that measures the improvement in the figure of merit when two summaries are combined. Using this metric, we demonstrate that the ML-based summary not only contains almost all of the information in the human-defined statistics, but also provides significantly stronger constraints, by a ratio of better than 1:3 in terms of the posterior volume of the temperature-density relation parameters.
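The two thermal parameters are conventionally the amplitude T0 and slope γ of the power-law relation T = T0 Δ^(γ-1), where Δ is the gas density in units of the mean. The abstract does not define its figure-of-merit or combination metric. The sketch below therefore assumes the common Gaussian-posterior convention, FoM = 1/sqrt(det C) for a parameter covariance C, so that the posterior area scales as sqrt(det C). The helper names (`temperature`, `figure_of_merit`, `posterior_volume_ratio`, `combination_gain`) and the example covariances are hypothetical illustrations, not the paper's method or data.

```python
import numpy as np


def temperature(delta, T0, gamma):
    """Power-law temperature-density relation: T = T0 * Delta**(gamma - 1)."""
    return T0 * np.power(delta, gamma - 1.0)


def figure_of_merit(cov):
    """Assumed FoM for a Gaussian posterior: 1 / sqrt(det C).

    For two parameters (T0, gamma), sqrt(det C) is proportional to the
    area of any fixed-probability credible ellipse, so a larger FoM
    means a tighter constraint.
    """
    return 1.0 / np.sqrt(np.linalg.det(np.asarray(cov, dtype=float)))


def posterior_volume_ratio(cov_a, cov_b):
    """Ratio of posterior volumes (areas) of summary a to summary b.

    Values below 1 mean summary a constrains the parameters more tightly.
    """
    return figure_of_merit(cov_b) / figure_of_merit(cov_a)


def combination_gain(cov_single, cov_joint):
    """Hypothetical combination metric: FoM of a joint analysis of two
    summaries relative to the FoM of one summary alone.

    A value close to 1 suggests the second summary adds little information
    beyond the first.
    """
    return figure_of_merit(cov_joint) / figure_of_merit(cov_single)


if __name__ == "__main__":
    # Illustrative numbers only; NOT results from the paper.
    cov_power_spectrum = np.array([[4.0e6, 0.0], [0.0, 0.04]])  # (T0 [K^2], gamma)
    cov_ml = np.array([[1.0e6, 0.0], [0.0, 0.02]])
    cov_joint = np.array([[0.95e6, 0.0], [0.0, 0.02]])

    print("T(Delta=2):", temperature(2.0, 1.0e4, 1.5))
    print("volume ratio ML : power spectrum =",
          posterior_volume_ratio(cov_ml, cov_power_spectrum))
    print("gain from adding power spectrum to ML =",
          combination_gain(cov_ml, cov_joint))
```

Under this convention, a posterior-volume ratio below 1/3 between two summaries corresponds to the second summary having more than three times the FoM of the first. A combination gain near 1 is what one would expect if one summary already contains almost all of the other's information.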