NetForge: A Programmable Substrate for Bottleneck-Centric Network Data Generation
Abstract
The behavior of Internet applications is shaped by congestion dynamics at bottleneck links, yet data capturing application behavior across diverse bottleneck regimes remains scarce. Bridging this gap requires a data-generation substrate that provides controllability, composability, fidelity, and replicability at the same time, a combination that existing approaches struggle to achieve. This paper introduces NetForge, a programmable substrate for bottleneck-centric data generation guided by progressive disaggregation: NetForge (i) decouples bottleneck intent from execution, (ii) separates static bottleneck attributes from dynamic congestion pressure, and (iii) disaggregates observed demand dynamics from their original trace context via Cross-Traffic Profiles (CTPs). CTPs transform passive packet traces into reusable, composable pressure signals that can be selected and transformed to specify dynamic bottleneck behavior. Our evaluation shows that NetForge satisfies the four requirements and, in an adaptive bitrate (ABR) case study, generates data that remains realistic, expands coverage into underrepresented regimes, and in turn improves model performance by up to 47%, as measured by the reduction in the Fugu model's transmission-time prediction error. Together, these results establish NetForge as a practical substrate for studying Internet application behavior across diverse bottleneck regimes.