LSMTCR: A Scalable Multi-Architecture Model for Epitope-Specific T Cell Receptor de novo Design
Abstract
Designing full-length, epitope-specific TCR αβ remains challenging due to vast sequence space, data biases and incomplete modeling of immunogenetic constraints. We present LSMTCR, a scalable multi-architecture framework that separates specificity from constraint learning to enable de novo, epitope-conditioned generation of paired, full-length TCRs. A diffusion-enhanced BERT encoder learns time-conditioned epitope representations; conditional GPT decoders, pretrained on CDR3β and transferred to CDR3α, generate chain-specific CDR3s under cross-modal conditioning with temperature-controlled diversity; and a gene-aware Transformer assembles complete α/β sequences by predicting V/J usage to ensure immunogenetic fidelity. Across GLIPH, TEP, MIRA, McPAS and our curated dataset, LSMTCR achieves higher predicted binding than baselines on most datasets, more faithfully recovers positional and length grammars, and delivers superior, temperature-tunable diversity. For α-chain generation, transfer learning improves predicted binding, length realism and diversity over representative methods. Full-length assembly from known or de novo CDR3s preserves k-mer spectra, yields low edit distances to references, and, in paired α/β co-modelling with epitope, attains higher pTM/ipTM than single-chain settings. LSMTCR outputs diverse, gene-contextualized, full-length TCR designs from epitope input alone, enabling high-throughput screening and iterative optimization.
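The abstract refers to temperature-controlled diversity in CDR3 generation. As a minimal illustration of the general idea only, the sketch below shows how a temperature parameter reshapes a next-residue distribution in a generic autoregressive decoder. It is not the authors' implementation: the function names (temperature_softmax, entropy, sample_residue) and the toy logits are hypothetical, and a real decoder would produce the logits from the epitope-conditioned model.

```python
import math
import random

# The 20 standard amino acids, used here as a toy output vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def temperature_softmax(logits, temperature):
    """Convert logits to probabilities after dividing by a temperature.

    Lower temperatures sharpen the distribution (less diverse samples);
    higher temperatures flatten it (more diverse samples).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats, used as a simple proxy for diversity."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_residue(logits, temperature, rng):
    """Draw one amino acid from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]


# Toy logits standing in for a decoder's output at one position (assumption).
toy_logits = [float(i % 5) for i in range(len(AMINO_ACIDS))]
rng = random.Random(0)

for t in (0.5, 1.0, 2.0):
    probs = temperature_softmax(toy_logits, t)
    residue = sample_residue(toy_logits, t, rng)
    print(f"T={t}: entropy={entropy(probs):.3f}, sampled={residue}")
```

In this toy setting the entropy of the sampling distribution rises with the temperature, which is the mechanism that makes diversity tunable at generation time.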