PAIR-Former: Budgeted Relational Multi-Instance Learning for Functional miRNA Target Prediction

Abstract

Functional miRNA–mRNA targeting is a large-bag prediction problem: each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. Prior methods max-pool over individual CTS scores and thereby ignore relational patterns among sites, even though modeling these patterns is critical for accuracy. The challenge is that naive relational aggregation incurs O(n^2) cost, which is prohibitive when n reaches the thousands, whereas a cheap scan alone discards the very interactions that drive functional repression. We formalize this tension as Budgeted Relational Multi-Instance Learning (BR-MIL), a new MIL problem in which the compute budget K is a first-class constraint: at most K instances per bag may receive expensive encoding and relational processing. We establish theoretical foundations for BR-MIL, proving that both approximation quality and generalization are governed by K rather than by the raw bag size n. Building on this theory, we propose PAIR-Former, which scans all candidates cheaply, selects K diverse CTSs, and aggregates them via a Set Transformer. PAIR-Former achieves state-of-the-art performance, outperforming all reproduced baselines with F1 = 0.840 on miRAW (10-fold balanced CV) and 0.839 on deepTargetPro in transfer evaluation, while achieving 0.793 on the large-scale MTI benchmark (420K pairs, 38× larger), demonstrating that budgeted relational MIL scales where naive approaches fail. Additional results on CAMELYON16 and Musk2 further show that the proposed BR-MIL formulation extends beyond biological sequence modeling.
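To make the scan, select, aggregate structure concrete, the following is a minimal NumPy sketch of a budgeted pipeline and not the authors' implementation. The abstract does not specify the scoring function, the diversity criterion, or the aggregator internals, so everything here is an assumption: `cheap_scan` is a hypothetical linear scorer, `select_diverse_top_k` uses a greedy score-minus-similarity rule as a stand-in for "selects K diverse CTSs", and `relational_aggregate` uses plain single-head self-attention with mean pooling in place of a Set Transformer. The point it illustrates is the cost structure: the scan touches all n instances once, while pairwise interaction is computed only among the K selected ones.

```python
import numpy as np

def cheap_scan(features, w):
    """Hypothetical cheap scorer: one dot product per candidate, O(n)."""
    return features @ w

def select_diverse_top_k(features, scores, k, diversity_weight=0.5):
    """Greedily pick at most k instances, trading the scan score against
    cosine similarity to the nearest already-selected instance.
    This rule is an assumption, not the paper's selection method."""
    if k <= 0:
        raise ValueError("budget k must be positive")
    n = features.shape[0]
    k = min(k, n)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    selected = [int(np.argmax(scores))]
    max_sim = unit @ unit[selected[0]]  # similarity to nearest selected
    while len(selected) < k:
        gain = scores - diversity_weight * max_sim
        gain[selected] = -np.inf        # never pick the same instance twice
        nxt = int(np.argmax(gain))
        selected.append(nxt)
        max_sim = np.maximum(max_sim, unit @ unit[nxt])
    return np.array(selected)

def relational_aggregate(selected_features):
    """Single-head self-attention plus mean pooling over the selected
    instances; the pairwise step costs O(K^2), independent of n."""
    d = selected_features.shape[1]
    logits = selected_features @ selected_features.T / np.sqrt(d)
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return (attn @ selected_features).mean(axis=0)

def bag_embedding(features, w, k):
    """Scan all n instances, keep at most k, aggregate relationally."""
    scores = cheap_scan(features, w)
    idx = select_diverse_top_k(features, scores, k)
    return relational_aggregate(features[idx]), idx

# Example: a bag of 2000 candidates reduced to a budget of K = 32.
rng = np.random.default_rng(0)
bag = rng.normal(size=(2000, 16))
emb, chosen = bag_embedding(bag, rng.normal(size=16), k=32)
```

In this sketch the attention matrix has shape K × K (here 32 × 32) rather than n × n (2000 × 2000), which is the saving the budget constraint is meant to capture.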
