TransientViT: A novel CNN - Vision Transformer hybrid real/bogus transient classifier for the Kilodegree Automatic Transient Survey
Abstract
Detecting and analyzing transient astronomical sources is essential to understanding their time evolution. Traditional pipelines identify transient sources from difference (D) images derived by subtracting prior-observed reference images (R) from new science images (N), a process that involves extensive manual inspection. In this study, we present TransientViT, a hybrid convolutional neural network (CNN) - vision transformer (ViT) model that differentiates between transients and image artifacts for the Kilodegree Automatic Transient Survey (KATS). TransientViT uses CNNs to reduce the image resolution and a hierarchical attention mechanism to model features globally. We propose a novel KATS-T 200K dataset that combines the difference images with both long- and short-term images, providing a temporally continuous, multidimensional dataset. With this dataset as input, TransientViT outperformed other transformer- and CNN-based models, with an overall area under the curve (AUC) of 0.97 and an accuracy of 99.44%. Ablation studies demonstrated the impact of different input channels, multi-input fusion methods, and cross-inference strategies on model performance. As a final step, a voting-based ensemble that combines the inference results of three NRD images further improved the model's prediction reliability and robustness. This hybrid model will act as a crucial reference for future studies on real/bogus transient classification.
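The two operations the abstract names, image differencing (D = N - R) and a voting-based ensemble over three inference results, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the function names `difference_image` and `majority_vote`, the 0.5 decision threshold, and the use of a simple majority rule (the abstract does not state which voting rule is used). Real survey pipelines also typically align the images and match their point-spread functions and flux scales before subtracting, which this sketch omits.

```python
from typing import List, Sequence

Image = List[List[float]]  # hypothetical stand-in for a 2-D pixel array


def difference_image(science: Image, reference: Image) -> Image:
    """Pixel-wise D = N - R, assuming aligned images of identical shape."""
    return [
        [n - r for n, r in zip(n_row, r_row)]
        for n_row, r_row in zip(science, reference)
    ]


def majority_vote(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True ("real") when more than half of the scores reach the threshold.

    Assumption: each score is a classifier's probability that the candidate is real.
    """
    votes = sum(1 for s in scores if s >= threshold)
    return votes * 2 > len(scores)


# Toy 2x2 images: one pixel brightens between reference and science exposures.
science = [[5.0, 3.0], [2.0, 2.0]]
reference = [[1.0, 3.0], [2.0, 0.5]]
diff = difference_image(science, reference)

# Three hypothetical per-image scores for the same candidate.
is_real = majority_vote([0.9, 0.8, 0.2])
```

With three voters, a single dissenting score cannot flip the outcome, which is one plausible reason an ensemble of this kind would improve robustness.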