Between Zeros and Ones: Behavioral Characterization Beyond Binary Labeling Across Public ICS Datasets
Abstract
Intrusion detection in Industrial Control Systems (ICS) is typically evaluated on a small set of public benchmarks using binary "normal" versus "attack" labels, a practice that can mask the behavioral diversity of cyber-physical attacks. To address this limitation, we propose a behavioral characterization framework that maps raw multivariate process traces into five interpretable physical primitives: drift, spike, oscillation, repetition, and switching. We apply the framework to three widely used ICS benchmarks (SWaT, WADI, and HAI) and show that attack windows exhibit clear behavioral shifts relative to normal operation, while the three datasets occupy largely distinct regions of the behavioral space, revealing both cross-dataset bias and intra-dataset diversity. In particular, WADI is dominated by repetition, HAI emphasizes sustained drift and oscillation, and SWaT is characterized by stealthier frozen-telemetry behavior. To examine the evaluation implications, we use an indicative Random Forest baseline and show that aggregate binary metrics can limit visibility into performance across different behavioral proxies. For example, on SWaT, macro F1 drops from 85.44% under binary evaluation to 37.84% under behavior-proxy multiclass prediction, with similar degradations observed on WADI and HAI. Based on these findings, we argue for complementing conventional binary benchmarking with behavior-stratified evaluation, both to expose blind spots that aggregate scores leave hidden and to better support targeted incident response.
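The gap between binary and behavior-stratified macro F1 can be illustrated with a minimal sketch. It is not the authors' pipeline or their Random Forest baseline: the labels and predictions below are invented toy data, and `macro_f1` and `to_binary` are hypothetical helper names. The sketch only shows how a detector that separates attack from normal perfectly can still score poorly once each attack window must be assigned to the correct behavior class.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def to_binary(labels):
    """Collapse every non-normal behavior label into a single 'attack' class."""
    return ["normal" if lab == "normal" else "attack" for lab in labels]


# Toy data (assumption, not taken from the paper): 6 normal windows and
# 2 windows each of three behavior classes.
y_true = ["normal"] * 6 + ["drift"] * 2 + ["spike"] * 2 + ["repetition"] * 2
# A hypothetical detector that flags every attack window but calls all of
# them "drift".
y_pred = ["normal"] * 6 + ["drift"] * 6

binary_score = macro_f1(to_binary(y_true), to_binary(y_pred))
multiclass_score = macro_f1(y_true, y_pred)

print(f"binary macro F1:     {binary_score:.3f}")      # 1.000
print(f"multiclass macro F1: {multiclass_score:.3f}")  # 0.375
```

In this toy case the binary score is perfect, while the multiclass score is pulled down by the spike and repetition classes, which are never predicted correctly. The per-class F1 values are the kind of behavior-level detail that an aggregate binary score does not show.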