Performance Analysis of Digital Processing-in-Memory through a Case Study on Convolutional-Neural-Network Acceleration

Abstract

Processing-in-Memory (PIM) architectures are evolving to minimize data movement by using the same physical devices for both memory and logic. While analog PIM harnesses crossbar arrays for efficient approximate matrix-vector multiplication, digital PIM architectures support massively parallel bitwise operations for more general workloads. Recent works have extended digital PIM towards the full-precision acceleration of convolutional neural networks (CNNs), yet the literature still lacks a comprehensive comparison with GPUs, a comparison that may reveal the limitations of digital PIM. This paper aims to fill that gap by examining CNN acceleration through an updated quantitative comparison with GPUs. Our approach begins with a theoretical investigation of various PIM architectures, shedding light on their performance characteristics and constraints. Subsequently, through a series of benchmarks ranging from memory-bound vectored arithmetic to CNN acceleration, we provide insights into digital PIM performance that may guide the acceleration of future applications.
