Syntax- and Compilation-Preserving Evasion of LLM Vulnerability Detectors
Abstract
LLM-based vulnerability detectors are increasingly deployed in CI/CD security gating, yet their resilience to evasion under syntax- and compilation-preserving edits remains poorly understood. We evaluate five attack variants spanning four carrier families of behavior-preserving code transformations on a unified C/C++ benchmark (N=5000) and introduce Complete Resistance (CR), measuring the fraction of correctly detected vulnerabilities that withstand all attack variants. Our findings reveal a significant robustness gap: models achieving 70\%+ clean recall exhibit CR as low as 0.12\%, meaning over 87\% of detected vulnerabilities can be evaded by at least one syntax-preserving edit. Universal adversarial strings optimized on a 14B surrogate transfer effectively to black-box APIs including GPT-4o, while on-target optimization further amplifies evasion (up to 92.5\% ASR). These results indicate that clean benchmark accuracy alone is insufficient as a security guarantee for deployed vulnerability detectors.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.