Optimal Convergence Rates of Deep Neural Network Classifiers

Abstract

In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$\beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is $\left(\frac{1}{n}\right)^{\frac{\beta\cdot(1\wedge\beta)^q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\cdot\beta\cdot(1\wedge\beta)^q}}$, which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The generalized approach is of independent interest.
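To get a feel for the stated rate, the exponent of 1/n can be evaluated numerically. The sketch below is illustrative only: it assumes the exponent reads β·(1∧β)^q / (d_*/(s+1) + (1 + 1/(s+1))·β·(1∧β)^q), with 1∧β meaning min(1, β), and the function and parameter names (`rate_exponent`, `d_star`) are hypothetical, not taken from the paper.

```python
def rate_exponent(beta: float, q: int, d_star: int, s: float) -> float:
    """Exponent of 1/n in the excess 0-1 risk rate (assumed reading of the abstract).

    beta   : Hölder smoothness of the component functions (beta > 0)
    q      : the composition consists of q + 1 functions (q >= 0)
    d_star : number of input variables each Hölder component depends on
    s      : Tsybakov noise exponent in [0, inf]; pass float("inf") for s = infinity
    """
    # Effective smoothness of the composition: beta * (1 ∧ beta)^q
    eff = beta * min(1.0, beta) ** q
    # For s = inf, both d_star / (s + 1) and 1 / (s + 1) evaluate to 0.0
    return eff / (d_star / (s + 1) + (1 + 1 / (s + 1)) * eff)


if __name__ == "__main__":
    # The input dimension d never appears: only beta, q, d_star and s matter.
    for beta, q, d_star, s in [(2.0, 1, 2, 1.0), (1.0, 0, 1, 0.0), (0.5, 2, 3, 1.0)]:
        print(beta, q, d_star, s, rate_exponent(beta, q, d_star, s))
```

Under this reading, β = 2, q = 1, d_* = 2, s = 1 gives an exponent of 1/2, and letting s grow to infinity drives the exponent to 1 regardless of d_*. When β < 1, each extra layer of composition multiplies the effective smoothness by β and so slows the rate.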
