Improving DNS Exfiltration Detection via Transformer Pretraining
Abstract
We study whether in-domain pretraining of Bidirectional Encoder Representations from Transformer (BERT) model improves subdomain-level detection of exfiltration at low false positive rates. While previous work mostly examines fine-tuned generic Transformers, it does not aim to isolate the effect of pretraining on the downstream task of classification. To address this gap, we develop a controlled pipeline where we freeze operating points on validation and transfer them to the test set, thus enabling clean ablations across different label and pretraining budgets. Our results show significant improvements in the left tail of the Receiver Operating Characteristic (ROC) curve, especially against randomly initialized baseline. Additionally, within pretrained model variants, increasing the number of pretraining steps helps the most when more labeled data are available for fine-tuning.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.