Improved Predictive Performance and Interpretability for Mesomorphic Neural Networks Using Local Fidelity Regularization
Abstract
Interpretable Mesomorphic Neural Networks (IMNs) offer a promising framework that combines the predictive power of deep neural networks with the interpretability of linear models. However, the original formulation lacks safeguards to ensure that the learned interpretations are in fact reliable. In particular, the network is free to concentrate all explanatory variance into a single weight of the linear output layer, achieving strong predictive performance while producing interpretations that are largely meaningless. Paradoxically, the L1 penalty proposed to encourage sparse solutions exacerbates this problem by further incentivizing such degenerate configurations. To address this vulnerability, we introduce Local Fidelity Regularization (LFR), a novel penalty term that prevents degenerate weight collapse by aligning the linear output weights with local data variations. This structural constraint guarantees faithful explanations and substantially improves the reliability of model interpretations. Furthermore, empirical evaluations across the OpenML benchmark suite demonstrate that LFR does not compromise accuracy for explainability; rather, it achieved improved AUROC over the unregularized IMN. By yielding results highly competitive with state-of-the-art black-box models, LFR provides the dual benefit of reliable interpretability and superior predictive performance. Source code and usage instructions are available at https://github.com/hugohammer/LFR-IMN.git.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.