The Recurrent Processing Unit: Hardware for High Speed Machine Learning

Abstract

Machine learning applications are computationally demanding and power intensive. Hardware acceleration of these software tools is a natural step being explored using various technologies. A recurrent processing unit (RPU) is fast and power-efficient hardware for machine learning under development at the University of Maryland. It is comprised of a recurrent neural network and a trainable output vector as a hardware implementation of a reservoir computer. The reservoir is currently realized on both Xilinx 7-series and Ultrascale+ ZYNQ SoCs using an autonomous Boolean network for processing and a Python-based software API. The RPU is capable of classifying up to 40M MNIST images per second with the reservoir consuming under 261mW of power. Using an array of 2048 unclocked gates with roughly 100pS transition times, we achieve about 20 TOPS and 75 TOPS/W.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…