ENFOR-SA: End-to-end Cross-layer Transient Fault Injector for Efficient and Accurate DNN Reliability Assessment on Systolic Arrays
Abstract
Recent advances in deep learning have produced highly accurate but increasingly large and complex DNNs, making traditional fault-injection techniques impractical. Accurate fault analysis requires RTL-accurate hardware models. However, this significantly slows evaluation compared with software-only approaches, particularly when combined with expensive HDL instrumentation. In this work, we show that such high-overhead methods are unnecessary for systolic array (SA) architectures and propose ENFOR-SA, an end-to-end framework for DNN transient fault analysis on SAs. Our two-step approach employs cross-layer simulation and uses RTL SA components only during fault injection, with the rest executed at the software level. Experiments on CNNs and Vision Transformers demonstrate that ENFOR-SA achieves RTL-accurate fault injection with only 6% average slowdown compared to software-based injection, while delivering at least two orders of magnitude speedup (average 569×) over full-SoC RTL simulation and a 2.03× improvement over a state-of-the-art cross-layer RTL injection tool. ENFOR-SA code is publicly available at https://github.com/rafaabt/ENFOR-SA.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.