TinyTorch: Building Machine Learning Systems from First Principles
Abstract
Machine learning education faces a fundamental gap: students learn algorithms without understanding the systems that execute them. They study gradient descent without measuring memory, attention mechanisms without analyzing O(N2) scaling, optimizer theory without knowing why Adam requires 3x the memory of SGD. This "algorithm-systems divide" produces practitioners who can train models but cannot debug memory failures, optimize inference latency, or reason about deployment trade-offs--the very skills industry demands as "ML systems engineering." We present TinyTorch, a 20-module curriculum that closes this gap through "implementation-based systems pedagogy": students construct PyTorch's core components (tensors, autograd, optimizers, CNNs, transformers) in pure Python, building a complete framework where every operation they invoke is code they wrote. The design employs three patterns: "progressive disclosure" of complexity, "systems-first integration" of profiling from the first module, and "build-to-validate milestones" recreating 67 years of ML breakthroughs--from Perceptron (1958) through Transformers (2017) to MLPerf-style benchmarking. Requiring only 4GB RAM and no GPU, TinyTorch demonstrates that deep ML systems understanding is achievable without specialized hardware. The curriculum is available open-source at mlsysbook.ai/tinytorch.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.