PUL: Pre-load in Software for Caches Wouldn't Always Play Along
Abstract
Memory latency and bandwidth are major factors limiting system performance and scalability. Modern CPUs aim to hide latency by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based prefetching exhibits higher efficiency, and that efficiency improves with newer CPU generations. In this paper, we investigate software-based, post-Moore systems that offload operations to intelligent memories. We show that software-based prefetching has even higher potential in near-data processing settings, where it maximizes compute utilization through compute/IO interleaving.
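To make the idea of software-based prefetching concrete, the following is a minimal sketch in C and not the paper's own implementation. It assumes a GCC- or Clang-compatible compiler, which provides the `__builtin_prefetch` builtin. The function name `gather_sum` and the constant `PREFETCH_DISTANCE` are hypothetical, chosen only for illustration. The loop requests data a few iterations ahead so that the memory access can overlap with the computation on the current element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance: how many iterations ahead to request data.
   A suitable value depends on memory latency and per-element work. */
#define PREFETCH_DISTANCE 8

/* Sum values gathered through an index array. This irregular access
   pattern is the kind that is hard to predict from addresses alone. */
uint64_t gather_sum(const uint64_t *values, const uint32_t *indices, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: address, rw (0 = read), locality hint (0..3).
               It is only a hint and does not change program results. */
            __builtin_prefetch(&values[indices[i + PREFETCH_DISTANCE]], 0, 1);
        }
        /* Work on the current element while the later one is in flight. */
        sum += values[indices[i]];
    }
    return sum;
}
```

The prefetch is issued before the work on the current element, so the fetch and the computation can proceed in parallel. Because the builtin is only a hint, the function returns the same sum with or without it, and any benefit has to be measured on the target machine.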