CXLMemUring: A Hardware Software Co-design Paradigm for Asynchronous and Flexible Parallel CXL Memory Pool Access

Abstract

CXL-attached memory lets servers add memory capacity while keeping the standard load/store programming model. The main drawback is latency. CXL memory accesses are too slow for normal CPU mechanisms to hide reliably, especially when each access depends on the result of a previous one. At the same time, they are too short for traditional software techniques, such as context switches or interrupt-based asynchrony, to pay off when applied one load at a time. On a real Granite Rapids CXL platform, we find that placing GAPBS graph workloads in CXL memory slows execution by 2.44x on average compared with local DRAM. A state-of-the-art software prefetcher still leaves a 2.21x slowdown. This paper presents CXLMemUring, a hardware/software co-designed approach for hiding CXL latency using larger units of asynchronous work. The system creates regions: parts of the original program that include one or more CXL-resident memory operations together with the nearby address-generation and memory-orchestration logic needed to run them. The host CPU launches these regions asynchronously on a near-memory accelerator built on a commodity CXL Type 2 FPGA, then continues useful work while the device runs ahead and prepares future memory accesses. A compiler identifies candidate regions from unmodified source programs, and an online JIT refines region boundaries and execution parameters based on workload behavior. We implement the system as a prototype compiler, runtime, and Vortex-based CXL-side accelerator. Across GAPBS, MCF, Spatter, and NAS Parallel Benchmark workloads, it improves end-to-end performance by 1.45x to 1.75x, with a 1.59x geometric mean speedup, compared with native CXL-memory execution.
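
The launch-and-continue pattern described in the abstract can be illustrated with a small host-side analogy. This is only a sketch under stated assumptions, not the paper's compiler, runtime, or accelerator interface: a Python worker thread stands in for the near-memory device, and the names `chase_region`, `host_side_work`, and `run` are hypothetical. The region bundles a dependent pointer chase (each access needs the result of the previous one) with its address-generation logic, and the host submits it asynchronously, keeps working, and collects the result later.

```python
from concurrent.futures import ThreadPoolExecutor


def chase_region(next_index, start, steps):
    """Hypothetical offloaded region: a dependent pointer chase.

    Each iteration needs the value loaded by the previous one, which is
    the access pattern the abstract says CPUs cannot hide reliably.
    """
    node = start
    visited = []
    for _ in range(steps):
        visited.append(node)
        node = next_index[node]  # address of the next access depends on this load
    return visited


def host_side_work(values):
    """Stand-in for useful host work that overlaps with the region."""
    return sum(v * v for v in values)


def run(next_index, start, steps, values):
    # Assumption: a single worker thread plays the role of the device.
    with ThreadPoolExecutor(max_workers=1) as device:
        pending = device.submit(chase_region, next_index, start, steps)  # asynchronous launch
        local = host_side_work(values)  # host continues without waiting
        visited = pending.result()      # synchronize only when the data is needed
    return visited, local


# Four-node cycle: 0 -> 3 -> 2 -> 1 -> 0
next_index = [3, 0, 1, 2]
visited, local = run(next_index, 0, 4, [1, 2, 3])
```

The point of the sketch is the granularity: the unit handed off is a whole region containing several dependent accesses, so the cost of the hand-off is amortized rather than paid per load.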
