Protocol-Aware Tokenization and Architecture Co-Design for Wireless Packet Foundation Models
Abstract
What matters more when building foundation models for wireless packet traces: the tokenizer, the architecture, or both? To answer this question, we build on PLUME [Anonymous, 2026], which introduced protocol-aware tokenization for 802.11 traces. We scale model depth and transfer the same tokenizer to a fundamentally different architecture family. A deeper GPT (PLUME-DEEP, 24 layers) reaches 98.2% top-1 accuracy, gaining 32 points over the original 12-layer design, while a Mamba-2 state-space variant (PLUME-MAMBA) achieves 96.1% with 1.7x higher throughput and 2x longer context. The key insight emerges from a controlled 2x2 comparison across tokenizers and architectures: changing the tokenizer swings accuracy by 32 points, whereas changing the architecture moves it by only 2. Protocol-aware tokenization is therefore the primary performance lever, and the backbone becomes a deployment knob that trades accuracy for speed.