SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data
Abstract
We present SURGE, a streaming GPU encoding system deployed in production to generate embeddings for over 800 million texts across 40,000 logical partitions. Production embedding pipelines face a tension between logical data partitioning and efficient GPU utilization: processing each partition independently incurs P inter-process communication (IPC) calls whose overhead limits throughput for compute-light models. Our contributions are analytical: (i) a cost model (Theorem 1) predicting throughput within 2% across three encoders spanning a 15× parameter range; (ii) a memory-safety bound (Lemma 3) enabling a streaming two-threshold policy with peak memory O(B + n) rather than O(N); and (iii) a φ/CV decision framework characterizing when the pattern applies beyond our workload. The naive fix of batching at fixed size requires O(N) peak memory (32.7 GB at 10M texts; infeasible beyond ~60M on 192 GB nodes), produces no output until all encoding completes, and offers no fault tolerance. SURGE achieves the same throughput with O(B + n) bounded memory (2.6 GB), 68× faster time-to-first-output, and crash recovery at SuperBatch granularity. On 10M texts with 4 NVIDIA L4 GPUs, SURGE delivers 26,413 texts/s -- matching fixed-batch throughput while using 12.6× less memory. We validate on bge-base (109M, d=768, error 1.3%) and across log-normal σ in 1.0, 1.72, 2.5 (speedup invariant within 3%), and compare against a partition-batched baseline (PB-PBP-LB), against which SURGE retains a 7% throughput edge and 2.5× faster TTFO. Complementary engineering -- zero-copy Arrow serialization (22-25× speedup) and async I/O pipelining (up to 93% benefit) -- realizes the design but is not the contribution.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.