Interpretability-by-Design with Accurate Locally Additive Models and Conditional Feature Effects
Abstract
Generalized additive models (GAMs) offer interpretability through independent univariate feature effects but underfit when the data contain interactions. GA2Ms add selected pairwise interactions, which improves accuracy but sacrifices interpretability and limits model auditing. We propose Conditionally Additive Local Models (CALMs), a new model class that balances the interpretability of GAMs with the accuracy of GA2Ms. CALMs allow multiple univariate shape functions per feature, each active in a different region of the input space. These regions are defined independently for each feature as simple logical conditions (thresholds) on the features it interacts with. As a result, effects remain locally additive while varying across subregions to capture interactions. We further propose a principled distillation-based training pipeline that identifies homogeneous regions with limited interactions and fits interpretable shape functions via region-aware backfitting. Experiments on diverse classification and regression tasks show that CALMs consistently outperform GAMs and achieve accuracy comparable to that of GA2Ms. Overall, CALMs offer a compelling trade-off between predictive accuracy and interpretability.
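To make the model form concrete, here is a minimal Python sketch built on several assumptions the abstract does not state. Each region is a single threshold test on one other feature (the abstract allows more general threshold conditions). Shape functions are piecewise-constant over hand-chosen bins. The regions are supplied by hand rather than discovered by the distillation pipeline. `CALMSketch` and its parameters are hypothetical names, not the authors' code. A prediction sums one univariate effect per feature, taken from whichever region is active for that feature, and `fit` is a plain backfitting loop in which each shape function is refit only on the points inside its own region.

```python
from bisect import bisect_right
from statistics import fmean


class CALMSketch:
    """Hypothetical locally additive model (illustration only, not the paper's code).

    regions[j] lists the conditions for feature j. Each condition is None
    (always active) or (k, t, side): active when x[k] <= t (side == "le")
    or x[k] > t (side == "gt"). The conditions of one feature are assumed to
    partition the input space, so exactly one region is active per feature.
    """

    def __init__(self, regions, edges):
        self.regions = regions  # per-feature list of region conditions
        self.edges = edges      # per-feature sorted bin edges for its shape functions
        self.intercept = 0.0
        # One piecewise-constant shape function (a list of bin values) per (feature, region).
        self.shapes = [[[0.0] * (len(edges[j]) + 1) for _ in regions[j]]
                       for j in range(len(regions))]

    @staticmethod
    def _holds(cond, x):
        if cond is None:
            return True
        k, t, side = cond
        return x[k] <= t if side == "le" else x[k] > t

    def _region(self, j, x):
        # Index of the active region for feature j at point x.
        return next(r for r, c in enumerate(self.regions[j]) if self._holds(c, x))

    def _bin(self, j, v):
        return bisect_right(self.edges[j], v)

    def predict(self, x):
        # Locally additive: intercept plus one univariate effect per feature,
        # where the shape function used depends on the active region.
        return self.intercept + sum(
            self.shapes[j][self._region(j, x)][self._bin(j, x[j])]
            for j in range(len(x)))

    def fit(self, X, y, n_iter=20):
        # Region-aware backfitting (simplified): cycle over (feature, region)
        # pairs and refit each shape function on the partial residuals of the
        # points that fall inside that region. No centering is applied.
        self.intercept = fmean(y)
        for _ in range(n_iter):
            for j in range(len(self.regions)):
                for r, cond in enumerate(self.regions[j]):
                    sums, counts = {}, {}
                    for x, t in zip(X, y):
                        if not self._holds(cond, x):
                            continue
                        b = self._bin(j, x[j])
                        # Residual with feature j's own current effect added back.
                        partial = t - self.predict(x) + self.shapes[j][r][b]
                        sums[b] = sums.get(b, 0.0) + partial
                        counts[b] = counts.get(b, 0) + 1
                    for b in sums:
                        self.shapes[j][r][b] = sums[b] / counts[b]
        return self


# Toy data with an interaction: y = x0 when x1 == 0 and y = -x0 when x1 == 1.
X = [(a, b) for a in range(3) for b in range(2)]
y = [a if b == 0 else -a for a, b in X]
edges = [[0.5, 1.5], [0.5]]
# CALM-style: feature 0 gets two shape functions, split on x1 <= 0.5.
calm = CALMSketch([[(1, 0.5, "le"), (1, 0.5, "gt")], [None]], edges).fit(X, y)
# GAM-style baseline: a single unconditional region per feature.
gam = CALMSketch([[None], [None]], edges).fit(X, y)
print(calm.predict((2, 0)), calm.predict((2, 1)))  # 2.0 -2.0
print(gam.predict((2, 0)))                         # 1.0
```

On this toy interaction, the two-region model reproduces y exactly, while the single-region (GAM-style) baseline can only capture the average effect of x1 and leaves a nonzero residual. The sketch skips the centering of shape functions that GAM fitters usually apply, so the split between the intercept and the individual effects is not unique.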