Variable Importance in Generalized Linear Models -- A Unifying View Using Shapley Values
Abstract
Variable importance in regression analyses is of considerable interest in a variety of fields. There is no unique method for assessing variable importance. However, a substantial share of the available literature employs Shapley values, either explicitly or implicitly, to decompose a suitable goodness-of-fit measure, in the linear regression model typically the classical R2. Beyond linear regression, there is no generally accepted goodness-of-fit measure, only a variety of pseudo-R2s. We formulate and discuss the desirable properties of goodness-of-fit measures that enable Shapley values to be interpreted in terms of relative, and even absolute, importance. We suggest to use a pseudo-R2 based on the Kullback-Leibler divergence, the Kullback-Leibler R2, which has a convenient form for generalized linear models and permits to unify and extend previous work on variable importance for linear and nonlinear models. Several examples are presented, using data from public health and insurance.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.