A Gap Between Decision Trees and Neural Networks
Abstract
We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation (RTV) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator 1_A has infinite RTV. Moreover, two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, also have infinite RTV in dimensions d > 1, while Gaussian convolution yields finite RTV but with an explicit exponential dependence on d. We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to 1_A). For classification, we construct a smooth barrier score S_A with finite RTV whose fixed threshold τ = 1 exactly recovers the box. Under a mild tube-mass condition near ∂A, we prove an L^1(P) calibration bound that decays polynomially in a sharpness parameter, along with an explicit RTV upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy-complexity tradeoff and how threshold selection shifts where training lands along it.
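The distinction between classification after thresholding and score learning can be illustrated numerically. The sketch below is not the paper's barrier score S_A, whose construction the abstract does not give. It assumes a simple product-of-ramps surrogate for a single box, a uniform distribution P on the unit square, and hypothetical names (`box_indicator`, `ramp_score`, `l1_error`, `eps`). It shows only that a continuous score can recover the box exactly at threshold 1 while its L^1(P) distance to the hard indicator depends on a sharpness parameter. By the abstract's own result, this ramp surrogate has infinite RTV for d > 1, so it illustrates the two goals rather than the finite-RTV construction.

```python
import numpy as np

def box_indicator(x, lo, hi):
    """Hard indicator 1_A of the axis-aligned box A = prod_j [lo_j, hi_j]."""
    x = np.atleast_2d(x)
    return np.all((x >= lo) & (x <= hi), axis=1).astype(float)

def ramp(z, eps):
    """Piecewise-linear ramp: 0 for z <= -eps, 1 for z >= 0, linear in between."""
    return np.clip(1.0 + z / eps, 0.0, 1.0)

def ramp_score(x, lo, hi, eps=0.1):
    """Assumed split-wise surrogate: one ramp per face of the box, multiplied."""
    x = np.atleast_2d(x)
    lower = ramp(x - lo, eps)  # equals 1 where x_j >= lo_j
    upper = ramp(hi - x, eps)  # equals 1 where x_j <= hi_j
    return np.prod(lower * upper, axis=1)

# Assumed setup: P is uniform on [0, 1]^2 and A = [0.3, 0.7]^2.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20000, 2))
lo = np.array([0.3, 0.3])
hi = np.array([0.7, 0.7])

def l1_error(eps):
    """Monte Carlo estimate of the L^1(P) distance between score and indicator."""
    return np.mean(np.abs(ramp_score(X, lo, hi, eps) - box_indicator(X, lo, hi)))

# Classification: thresholding the score at 1 gives back the box on the sample.
pred = (ramp_score(X, lo, hi, eps=0.1) >= 1.0).astype(float)
print("threshold recovers box:", np.array_equal(pred, box_indicator(X, lo, hi)))

# Score learning: the L^1(P) gap shrinks as the ramp gets sharper.
for eps in (0.2, 0.1, 0.05):
    print(f"eps={eps}: L1 error ~ {l1_error(eps):.4f}")
```

In this toy setting the thresholded set is the same for every `eps`, while the score error is concentrated in a tube of width `eps` around the boundary of the box, which is the region a tube-mass condition would control.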