Understanding GPU Resource Interference One Level Deeper

Abstract

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating applications is known to improve GPU utilization, but is not common practice as it becomes difficult to provide predictable performance due to workload interference. Providing predictable performance guarantees requires a deep understanding of how applications contend for shared GPU resources such as block schedulers, compute units, L1/L2 caches, and memory bandwidth. We study the key types of GPU resource interference and develop a methodology to quantify the sensitivity of a workload to each type. We discuss how this methodology can serve as the foundation for GPU schedulers that enforce strict performance guarantees and how application developers can design GPU kernels with colocation in mind to improve efficiency.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…