Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

Abstract

Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret, as structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. Across enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues, spatially localized functional clusters, and catalytic contact patterns. On binding-site detection, SoftBlobGIN improves residue AUROC from 0.885 using an ESM-2 linear probe to 0.983, indicating that these structural explanations are not recoverable from language-model features alone. Learned blob partitions provide an additional layer of interpretability by automatically grouping residues into functional substructures, with blobs containing annotated active-site residues showing 1.85× higher importance than other blobs (=0.339, p=0.009), without any active-site supervision. Our framework requires no retraining of the language model, adds only 1.1M parameters, and generalises across ProteinShake tasks, achieving F of 0.733 on Gene Ontology prediction and AUROC of 0.969 on binding-site detection. We position this as an interpretable structural companion to protein language models that makes their predictions more transparent and auditable.
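
The abstract does not specify how the Gumbel-softmax substructure pooling is implemented, so the following is only a minimal sketch of the general technique, not the authors' SoftBlobGIN code. It assumes that each residue carries a vector of assignment logits over K blobs, that a relaxed one-hot assignment is sampled with Gumbel noise at temperature `tau`, and that blob features are the assignment-weighted mean of residue features. The names `gumbel_softmax` and `soft_blob_pool`, and the weighted-mean pooling rule, are hypothetical choices made for illustration.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample a relaxed one-hot vector from unnormalised logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep both logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_blob_pool(features, assign_logits, tau=1.0, rng=random):
    """Pool N residue vectors into K blob vectors using soft assignments.

    features:      N lists of length D (residue embeddings, assumed given)
    assign_logits: N lists of length K (assumed to come from a learned layer)
    Returns (assignments as N x K, blob features as K x D).
    """
    assignments = [gumbel_softmax(row, tau, rng) for row in assign_logits]
    num_blobs = len(assign_logits[0])
    dim = len(features[0])
    pooled = [[0.0] * dim for _ in range(num_blobs)]
    mass = [0.0] * num_blobs
    for weights, feat in zip(assignments, features):
        for b, w in enumerate(weights):
            mass[b] += w
            for j, x in enumerate(feat):
                pooled[b][j] += w * x
    # Assumption: normalise by total assignment mass (weighted mean per blob).
    pooled = [[v / max(mass[b], 1e-12) for v in pooled[b]]
              for b in range(num_blobs)]
    return assignments, pooled


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    soft_assign, blobs = soft_blob_pool(feats, logits, tau=0.5, rng=rng)
    print(soft_assign)
    print(blobs)
```

Lower values of `tau` push each residue's assignment towards a single blob, while higher values spread it across blobs. In a trainable model the same computation would be written in an automatic-differentiation framework so that gradients can flow through the soft assignments.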
