OJALÁ: Optimizing J-PAS Astronomy for Large-scale Analysis. A foundation model for the SED of galaxies, QSOs and stars
Abstract
The advent of large-scale surveys requires efficient ML techniques to exploit the information of massive datasets. We present OJALA, a transformer-based autoregressive foundation model designed to simultaneously classify astronomical objects and infer their physical parameters using 54 narrow bands from J-PAS, combined with broad bands from the DESI Legacy Imaging Surveys and WISE. The model is trained on 20 million synthetic SEDs generated from DESI DR1 spectra. We validate OJALA using a cross-matched sample of 121,000 objects between J-PAS and DESI. The model achieves a weighted F1-score of approximately 0.9 for spectral classification (stars, galaxies, and QSOs) at i < 21. For galaxies, we recover photo-z with a precision of σ_NMAD < 0.01, while for QSOs, the precision improves significantly at z > 1.5, reaching σ_NMAD ≈ 0.006 at z ≈ 3.5. We demonstrate robust estimation of physical properties for galaxies, recovering stellar masses and SFR with a scatter of approximately 0.11 dex and 0.22 dex, respectively. Furthermore, the model accurately predicts EWs for major optical emission lines, allowing for the derivation of extinction-corrected Hα luminosities with a scatter of 0.29 dex. OJALA successfully reproduces the BPT and WHAN diagnostic diagrams, classifying SF, AGN, and passive galaxies with F1-scores typically ranging from 70% to 90% depending on the diagnostic class. For stars, the model reliably infers effective temperature and metallicity, though surface gravity remains challenging. Finally, we show the modularity of the architecture by fine-tuning the pre-trained embeddings to predict BH masses, a property not included in the primary training, recovering spectroscopic virial estimates with a precision of approximately 0.5 dex. We release the code, model weights, and a comprehensive VAC for the J-PAS EDR.