Beware of so-called 'good' correlations: a statistical reality check on individual mRNA-protein predictions
Abstract
Research in the life sciences often employs messenger ribonucleic acid (mRNA) quantification as a standalone approach for functional analysis. However, although the correlation between measured mRNA and protein levels is positive, the correlation coefficients observed empirically are imperfect, necessitating caution when making agnostic inferences. This essay provides a statistical reflection and caveat on the concept of correlation strength in the context of transcriptomics-proteomics studies. It highlights the variability in possible protein levels at given empirical correlation values, even when the mRNA amount is known precisely, and underscores the notable proportion of mRNA-protein pairs whose abundances lie at opposite ends of their respective distributions. Cell biologists, data scientists, and biostatisticians should recognise that mRNA-protein correlation alone is insufficient to justify using a single mRNA quantification to infer the amount or function of its corresponding protein.
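The two statistical points in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the essay: standardised mRNA and protein levels are treated as bivariate normal, "opposite ends" is simplified to "opposite sides of the two medians", and the correlation value r = 0.6 is purely illustrative. The function names are hypothetical.

```python
import math
import random


def conditional_sd(r):
    """SD of the standardised protein level once the mRNA level is fixed.

    Assumption (not from the essay): bivariate normal, unit variances.
    """
    return math.sqrt(1.0 - r * r)


def opposite_half_fraction(r):
    """Probability that a pair lies on opposite sides of the two medians.

    Follows from the bivariate normal orthant probability
    P(X > 0, Y > 0) = 1/4 + asin(r) / (2 * pi).
    """
    return 0.5 - math.asin(r) / math.pi


def simulate_opposite_fraction(r, n=200_000, seed=0):
    """Monte Carlo check of opposite_half_fraction under the same assumption."""
    rng = random.Random(seed)
    s = conditional_sd(r)
    opposite = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # standardised mRNA level
        y = r * x + s * rng.gauss(0.0, 1.0)  # protein level with correlation r
        if x * y < 0:                        # one above its median, one below
            opposite += 1
    return opposite / n


if __name__ == "__main__":
    r = 0.6  # illustrative value only, not a figure reported in the essay
    print(f"conditional SD at r={r}: {conditional_sd(r):.3f}")
    print(f"opposite-half fraction (closed form): {opposite_half_fraction(r):.3f}")
    print(f"opposite-half fraction (simulated):   {simulate_opposite_fraction(r):.3f}")
```

Under these assumptions, a correlation of 0.6 still leaves the protein level with 80% of its unconditional standard deviation once the mRNA level is fixed, and close to 30% of pairs fall on opposite sides of their medians. Real abundance data need not be bivariate normal, so the numbers indicate scale rather than reproduce the essay's results.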