COSMOS2025: Machine Learning Classification of Early- and Late-type Galaxies at 0 < z < 3

Abstract

We present a fast, interpretable machine learning framework to classify early- and late-type galaxies in the COSMOS2025 catalog at 0 < z < 3, without relying on image-based training labels or computationally expensive structural fitting. Using the Santa Cruz Semi-Analytic Model, we generate a training set with secure morphological labels defined by bulge-to-total mass ratio and specific star formation rate. We bridge the simulation-to-observation domain gap by injecting realistic photometric noise derived from COSMOS2025. A CatBoostClassifier trained on 66 broadband colors achieves excellent performance in the simulated domain, recovering late-types with 98% precision/recall and early-types with 91% precision and 88% recall. Applied to 44,132 COSMOS2025 galaxies, the model reveals a striking bimodality: only about 6% of galaxies receive intermediate probabilities (0.3 < P(Early type) < 0.7), nearly identical to the fraction observed in the simulation. This demonstrates that broadband colors are a decisive morphological discriminant, with the remaining 94% classified at high confidence. Validation against independent bulge+disk decompositions yields 70% overall accuracy, with late-types identified at 78% purity and 74% completeness. The most important color feature, F277W-F444W, reflects the expected optical/NIR contrast between old and young stellar populations. The full pipeline completes in under 30 minutes on standard hardware, demonstrating that simulation-trained color-based classifiers offer a scalable, physically interpretable route to approximate morphology for large next-generation surveys.
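The feature-building and bookkeeping steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how the 66 colors are constructed or how the noise is injected. The sketch assumes the colors are all unique pairs among 12 bands (12 choose 2 = 66), that noise is Gaussian and added in flux space, and that fluxes are in microjansky (AB zeropoint 23.9). The function names `pairwise_colors`, `add_photometric_noise`, and `intermediate_fraction` are hypothetical. Only the 0.3 and 0.7 probability thresholds come from the abstract.

```python
import itertools
import math
import random


def pairwise_colors(mags):
    """Return every unique magnitude difference m_i - m_j with i < j.

    Assumption: the 66 colors are all pairs among 12 bands.
    """
    pairs = itertools.combinations(range(len(mags)), 2)
    return [mags[i] - mags[j] for i, j in pairs]


def add_photometric_noise(mags, flux_errs, rng, zeropoint=23.9):
    """Perturb AB magnitudes with Gaussian noise applied in flux space.

    Assumption: flux_errs are 1-sigma flux uncertainties in microjansky,
    e.g. drawn from an observed catalog at similar brightness.
    """
    noisy = []
    for mag, sigma in zip(mags, flux_errs):
        flux = 10 ** (-0.4 * (mag - zeropoint))
        flux_noisy = flux + rng.gauss(0.0, sigma)
        # Floor at the 1-sigma level so the logarithm stays defined.
        flux_noisy = max(flux_noisy, sigma)
        noisy.append(zeropoint - 2.5 * math.log10(flux_noisy))
    return noisy


def intermediate_fraction(probs, lo=0.3, hi=0.7):
    """Fraction of objects with lo < P(early type) < hi."""
    n_mid = sum(1 for p in probs if lo < p < hi)
    return n_mid / len(probs)


if __name__ == "__main__":
    rng = random.Random(0)
    mags = [24.0 + 0.1 * k for k in range(12)]  # made-up magnitudes
    noisy_mags = add_photometric_noise(mags, [0.01] * 12, rng)
    colors = pairwise_colors(noisy_mags)
    print(len(colors))  # number of color features for 12 bands

    # Made-up classifier outputs, for illustration only.
    probs = [0.02, 0.05, 0.5, 0.97, 0.99, 0.01, 0.9, 0.1, 0.65, 0.98]
    print(intermediate_fraction(probs))
```

In a full workflow, the noisy colors of the simulated galaxies would be the feature matrix passed to the classifier, and `intermediate_fraction` would be evaluated on the predicted early-type probabilities for both the simulated and the observed samples to compare the two domains.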
