Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets
Abstract
In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each node may upload at most $B$ bits per query in a public probe set. In federated probe-logit distillation (FPLD), each node transmits a scalar-quantized logit vector on the probe set, and an aggregator distills a global parametric student. Prior work (Dubey and Huo, 2026) establishes a high-probability KL rate $O(d/(Kn) + \rho_V V/m + K^{-1} \cdot 2^{-2B/V})$ plus optimization slack, with the bandwidth term in its trace-sharpened form. Whether this bandwidth-term rate is tight, and how the upper bound generalizes to heterogeneous per-node bandwidths, are left open. We close both gaps. First, the dithered FPLD construction has a matching single-round lower bound $\Omega(K^{-1} \cdot 2^{-2B/V})$ under non-degeneracy, pinning the bandwidth-axis rate at $\Theta(K^{-1} \cdot 2^{-2B/V})$. $T$-round sequential refinement with nested/scaled residual quantizers achieves $O(K^{-1} \cdot 2^{-2TB/V})$; vanilla FPLD's $T$-independent bandwidth term is suboptimal for every $T > 1$. Second, we establish a heterogeneous-bandwidth upper bound for per-node budgets $B_i$, paired with a closed-form optimal allocation $B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / w_g)$, a log-tilted water-filling rule that is the per-node analogue of reverse water-filling for distortion-rate optimization. A plug-in adaptive variant estimates the weights from a short warm-up phase and attains $1 + O((K/\delta)/(m T_0))$ relative suboptimality. Synthetic n-gram simulations confirm that empirical KL is bracketed by the upper and lower bounds and that the optimal allocation strictly dominates uniform and inverse-weighted baselines under heterogeneous clipping.
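The closed-form allocation can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: that $w_g$ is the geometric mean of the per-node weights $w_i$ (which makes the budgets sum to $B_{\mathrm{tot}}$), and that the quantity being minimized is the surrogate $\sum_i w_i \, 2^{-2B_i/V}$. The function names and numeric values are hypothetical, chosen only for illustration.

```python
import math


def optimal_allocation(weights, b_total, vocab_size):
    """Log-tilted water-filling: B_i = B_tot/K + (V/2) * log2(w_i / w_g).

    Assumption: w_g is the geometric mean of the weights, so the
    log-tilt terms cancel and the budgets sum to b_total.
    """
    k = len(weights)
    log2_wg = sum(math.log2(w) for w in weights) / k
    return [
        b_total / k + (vocab_size / 2) * (math.log2(w) - log2_wg)
        for w in weights
    ]


def bandwidth_surrogate(weights, budgets, vocab_size):
    """Assumed surrogate objective: sum_i w_i * 2^(-2 * B_i / V)."""
    return sum(
        w * 2 ** (-2 * b / vocab_size) for w, b in zip(weights, budgets)
    )


# Hypothetical example: three nodes, vocabulary size 8, 48 bits in total.
weights = [1.0, 2.0, 4.0]
V = 8
B_tot = 48.0

alloc = optimal_allocation(weights, B_tot, V)
uniform = [B_tot / len(weights)] * len(weights)

opt_value = bandwidth_surrogate(weights, alloc, V)
uni_value = bandwidth_surrogate(weights, uniform, V)

print("allocation:", alloc)
print("surrogate (optimal):", opt_value)
print("surrogate (uniform):", uni_value)
```

Under these assumptions, nodes with larger weights receive more bits, and the surrogate value is lower than under a uniform split. The sketch does not enforce non-negative budgets, so very uneven weights can produce a negative $B_i$; it would then need a clipping step, as in classical reverse water-filling.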