Design Insights into Partition Placement and Routing for DNN Inference in Multi-Hop Edge Networks

Abstract

Partitioned DNN inference is a promising approach for latency-sensitive intelligent services in edge networks, since it allows different parts of a model to be executed across end devices, edge servers, and the cloud. However, in a multi-hop edge network, partition placement and inference traffic routing are inherently coupled: raw inputs, intermediate features, and final outputs may have very different sizes, while candidate nodes also differ in computation capability. In addition, both communication and computation delays can become congestion-dependent under load. In this paper, we study joint partition placement and routing for fixed-partition DNN inference over heterogeneous multi-hop edge networks. We consider a small number of DNN partitions, each placed at exactly one node without replication, and formulate a congestion-aware mixed discrete–continuous optimization problem that captures both routing and execution costs. To solve it, we develop a practical alternating framework that couples partition placement with congestion-aware forwarding updates. Through numerical evaluation on hierarchical, regular, synthetic irregular, and real backbone-inspired topologies, we show that split flexibility is particularly important in IoT–edge–cloud settings, while congestion-aware refinement becomes increasingly beneficial as the offered load grows. We further illustrate how the preferred operating point depends on the communication–computation tradeoff.
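
To make the coupling between placement, data sizes, and congestion concrete, the following is a minimal toy sketch and not the paper's formulation or algorithm. It assumes a hypothetical three-node chain (device, edge, cloud), two partitions, an M/M/1-style delay as a stand-in for the congestion-dependent costs, undirected links with shared capacity, and exhaustive search in place of the alternating framework. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical 3-node chain: device (0) - edge (1) - cloud (2).
LINK_CAPACITY = {(0, 1): 10.0, (1, 2): 50.0}   # data units per second (assumed)
NODE_CAPACITY = [2.0, 20.0, 100.0]             # work units per second (assumed)

# Two partitions; DATA_SIZE[k] is the data entering stage k, the last entry
# is the final output that travels back to the destination.
DATA_SIZE = [4.0, 1.0, 0.1]     # raw input, intermediate feature, result (assumed)
WORKLOAD = [1.0, 3.0]           # computation per request for each partition (assumed)
SOURCE = DEST = 0               # requests start and results return at the device


def queue_delay(load, capacity):
    """M/M/1-style congestion cost: unbounded as load approaches capacity."""
    return float("inf") if load >= capacity else load / (capacity - load)


def chain_links(a, b):
    """Links traversed between nodes a and b (routing is trivial on a chain)."""
    lo, hi = min(a, b), max(a, b)
    return [(i, i + 1) for i in range(lo, hi)]


def total_cost(placement, rate):
    """Aggregate communication plus computation cost of one placement."""
    link_load = {link: 0.0 for link in LINK_CAPACITY}
    node_load = [0.0] * len(NODE_CAPACITY)
    stops = [SOURCE, *placement, DEST]
    # Segment k carries DATA_SIZE[k]: input, then features, then the result.
    for k, (a, b) in enumerate(zip(stops, stops[1:])):
        for link in chain_links(a, b):
            link_load[link] += rate * DATA_SIZE[k]
    # Each partition adds its workload to exactly one node (no replication).
    for k, node in enumerate(placement):
        node_load[node] += rate * WORKLOAD[k]
    comm = sum(queue_delay(link_load[l], LINK_CAPACITY[l]) for l in LINK_CAPACITY)
    comp = sum(queue_delay(node_load[n], NODE_CAPACITY[n])
               for n in range(len(NODE_CAPACITY)))
    return comm + comp


def best_placement(rate):
    """Exhaustive search over placements; only viable for tiny instances."""
    candidates = product(range(len(NODE_CAPACITY)), repeat=len(WORKLOAD))
    return min(candidates, key=lambda p: total_cost(p, rate))
```

Calling `best_placement(rate)` and `total_cost(placement, rate)` for several request rates gives a rough feel for how offered load interacts with the communication and computation terms. On a general multi-hop topology the routing would itself be a decision variable, which this chain example deliberately sidesteps.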
