Exploiting Network Loss for Distributed Approximate Computing with NetApprox

Abstract

Many data center applications, such as machine learning and big data analytics, can complete their analysis without processing the complete set of data. Extensive approximate-aware optimizations have been proposed at the hardware, programming language, and application levels; to date, however, approximate computing optimizations have ignored the network layer. We propose NetApprox, which, to the best of our knowledge, is the first approximate-aware network layer, comprising a transport-layer protocol, network resource allocation schemes, and scheduling/priority-assignment policies. Building on the observation that approximate applications can tolerate loss, NetApprox rests on two main insights: send approximate traffic aggressively (which improves the performance of approximate applications), and minimize the network resources allocated to approximate traffic (which limits the impact of aggressive approximate traffic while freeing up resources that, in turn, improve the performance of non-approximate applications). We ported Flink, Kafka, Spark, and PyTorch to NetApprox and evaluated NetApprox using both large-scale simulation and a real implementation. Our evaluation results show that NetApprox improves job completion times by up to 80% compared to network-oblivious approximation solutions, and improves the performance of co-running non-approximate workloads by 79%.
