What Functions Does XGBoost Learn?

Abstract

This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d,s}_{\infty\text{-ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d,s}_{\infty\text{-XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d,s}_{\infty\text{-ST}}$ with penalty $V^{d,s}_{\infty\text{-XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d,s}_{\infty\text{-ST}}$ and $V^{d,s}_{\infty\text{-XGB}}(\cdot)$ in terms of Hardy-Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d,s}_{\infty\text{-ST}} : V^{d,s}_{\infty\text{-XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s,d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
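To make the finite-dimensional starting point concrete, the following is a minimal Python sketch of a penalized least squares objective over a finite ensemble of bounded-depth regression trees, with an L1 penalty on the leaf weights. It is an illustration only: the abstract does not state the exact objective or the definition of the complexity measure, so the classes `Leaf` and `Split`, the helper names, and the choice to keep only the L1 term (weighted by a hypothetical `alpha`) are assumptions made here, not the paper's construction.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    weight: float  # constant prediction on this leaf's region


@dataclass
class Split:
    feature: int      # index of the coordinate being split
    threshold: float  # go left if x[feature] < threshold, else right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict_tree(node: Node, x: Sequence[float]) -> float:
    """Follow splits down to a leaf and return its weight."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.weight


def leaf_weights(node: Node) -> List[float]:
    """Collect all leaf weights of one tree."""
    if isinstance(node, Leaf):
        return [node.weight]
    return leaf_weights(node.left) + leaf_weights(node.right)


def l1_penalty(trees: Sequence[Node]) -> float:
    """Sum of absolute leaf weights over the whole ensemble (assumed penalty form)."""
    return sum(abs(w) for tree in trees for w in leaf_weights(tree))


def penalized_objective(trees, X, y, alpha: float) -> float:
    """Squared-error loss of the summed trees plus alpha times the L1 penalty."""
    squared_error = sum(
        (yi - sum(predict_tree(tree, xi) for tree in trees)) ** 2
        for xi, yi in zip(X, y)
    )
    return squared_error + alpha * l1_penalty(trees)


# Two depth-1 trees (stumps) on d = 2 features; values are made up for illustration.
trees = [
    Split(0, 0.5, Leaf(-1.0), Leaf(2.0)),
    Split(1, 0.5, Leaf(0.5), Leaf(-0.5)),
]
X = [(0.2, 0.7), (0.8, 0.1)]
y = [0.0, 2.0]

print(l1_penalty(trees))
print(penalized_objective(trees, X, y, alpha=0.1))
```

In this toy setup the penalty depends only on the leaf weights, not on where the splits fall. The abstract's complexity measure is described as a generalization of such a penalty to an infinite-dimensional class; how that generalization is defined is not recoverable from the abstract alone.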
