Comparison of symbolic regression algorithms in star/galaxy/quasar separation

Abstract

This work investigates symbolic regression (SR) as an interpretable alternative to black-box machine learning for classifying stars, galaxies, and quasars in the Sloan Digital Sky Survey Data Release 17 (SDSS DR17). We conduct a systematic comparative study of four state-of-the-art SR frameworks: PySR, Exhaustive Symbolic Regression (ESR) with MDL-based selection, Physical Symbolic Optimization (PhySO) using deep reinforcement learning, and Multi-View Symbolic Regression (MvSR). We derive compact analytic functions (complexity ≤ 10) that map spectroscopic redshift (z) to continuous classification scores on a representative training subset, then evaluate them in an 80,000-sample 5-fold cross-validation threshold-optimization phase and on a 10,000-sample unseen hold-out test set. Our results demonstrate that these low-complexity expressions achieve high predictive reliability, with MvSR reaching a cross-validation Cohen's Kappa of 0.8956 (0.8876 on the hold-out set) and PhySO achieving exceptional parametric stability (σ < 0.002). We note, however, that the equations returned by symbolic regression are purely empirical, and no physical significance should be ascribed to them.
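
The evaluation step described above (turning a continuous score into class labels through thresholds tuned by 5-fold cross-validation and scored with Cohen's Kappa) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's method: the `score` expression is a hypothetical stand-in for an SR-derived function, the redshift data are synthetic, the two-threshold decision rule and the grid search are one plausible way to do threshold optimization, and reading "stability" as the spread of fitted thresholds across folds is a guess.

```python
import random
import statistics


def cohen_kappa(y_true, y_pred, n_classes=3):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), from the confusion matrix."""
    n = len(y_true)
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    p_observed = sum(confusion[k][k] for k in range(n_classes)) / n
    p_expected = sum(
        sum(confusion[k]) * sum(row[k] for row in confusion)
        for k in range(n_classes)
    ) / (n * n)
    if p_expected == 1.0:
        return 0.0  # degenerate single-class case; zero is this sketch's choice
    return (p_observed - p_expected) / (1.0 - p_expected)


def score(z):
    # Hypothetical low-complexity expression; NOT an equation from the paper.
    return z / (1.0 + z)


def classify(s, t_low, t_high):
    # Assumed decision rule: 0 = star, 1 = galaxy, 2 = quasar.
    if s < t_low:
        return 0
    if s < t_high:
        return 1
    return 2


def make_synthetic(n, rng):
    # Toy redshift distributions invented for this example (not SDSS data).
    params = {0: (0.0, 0.001), 1: (0.4, 0.2), 2: (1.8, 0.7)}
    data = []
    for _ in range(n):
        label = rng.randrange(3)
        mean, std = params[label]
        data.append((abs(rng.gauss(mean, std)), label))
    return data


def best_thresholds(train, grid):
    # Grid search for the threshold pair that maximizes kappa on the training folds.
    scores = [score(z) for z, _ in train]
    labels = [y for _, y in train]
    best_kappa, best_pair = -2.0, None
    for i, t_low in enumerate(grid):
        for t_high in grid[i + 1:]:
            preds = [classify(s, t_low, t_high) for s in scores]
            kappa = cohen_kappa(labels, preds)
            if kappa > best_kappa:
                best_kappa, best_pair = kappa, (t_low, t_high)
    return best_pair


def cross_validate(data, n_folds=5):
    grid = [i / 25 for i in range(1, 25)]
    folds = [data[i::n_folds] for i in range(n_folds)]
    results = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        t_low, t_high = best_thresholds(train, grid)
        preds = [classify(score(z), t_low, t_high) for z, _ in held_out]
        kappa = cohen_kappa([y for _, y in held_out], preds)
        results.append((t_low, t_high, kappa))
    return results


rng = random.Random(0)
data = make_synthetic(600, rng)
results = cross_validate(data)
mean_kappa = statistics.mean(k for _, _, k in results)
# Spread of the upper threshold across folds: an assumed proxy for stability.
threshold_std = statistics.pstdev(t_high for _, t_high, _ in results)
print(f"mean kappa = {mean_kappa:.4f}, threshold std = {threshold_std:.4f}")
```

Thresholds are fitted only on the training folds and scored on the held-out fold, so the reported Kappa is not inflated by tuning on the data it is measured on. A final hold-out set, as in the abstract, would be scored once with thresholds fixed after cross-validation.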
