Selection originating from protein foldability: I. A new method to estimate selection temperature
Abstract
The probability distribution of sequences with maximum entropy that satisfies a given amino acid composition at each site and a given pairwise amino acid frequency at each site pair is a Boltzmann distribution with $\exp(-\psi_N)$, where the total interaction $\psi_N$ is represented as the sum of one body and pairwise interactions. A protein folding theory based on the random energy model (REM) indicates that the equilibrium ensemble of natural protein sequences is a canonical ensemble characterized by $\exp(-\Delta G_{ND}/k_B T_s)$ or by $\exp(-G_N/k_B T_s)$ if an amino acid composition is kept constant, meaning $\psi_N = \Delta G_{ND}/k_B T_s + \text{constant}$, where $\Delta G_{ND} \equiv G_N - G_D$, $G_N$ and $G_D$ are the native and denatured free energies, and $T_s$ is the effective temperature of natural selection. Here, we examine interaction changes ($\Delta\psi_N$) due to single nucleotide nonsynonymous mutations, and have found that the variance of their $\Delta\psi_N$ over all sites hardly depends on the $\psi_N$ of each homologous sequence, indicating that the variance of $\Delta G_N$ ($= k_B T_s \Delta\psi_N$) is nearly constant irrespective of protein families. As a result, $T_s$ is estimated from the ratio of the variance of $\Delta\psi_N$ to that of a reference protein, which is determined by a direct comparison between $\Delta\Delta\psi_{ND}$ ($\simeq \Delta\psi_N$) and experimental $\Delta\Delta G_{ND}$. Based on the REM, glass transition temperature $T_g$ and $\Delta G_{ND}$ are estimated from $T_s$ and experimental melting temperatures ($T_m$) for 14 protein domains. The estimates of $\Delta G_{ND}$ agree well with their experimental values for 5 proteins, and those of $T_s$ and $T_g$ are all within a reasonable range. This method is coarse-grained but much simpler in estimating $T_s$, $T_g$ and $\Delta G_{ND}$ than previous methods.
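The variance-ratio step can be illustrated with a minimal sketch. It assumes only what the abstract states: $\Delta G_N = k_B T_s \Delta\psi_N$, with the variance of $\Delta G_N$ taken as the same for every protein family. Under that assumption $(k_B T_s)^2\,\mathrm{Var}(\Delta\psi_N)$ is constant, so $T_s = T_{s,\mathrm{ref}} \sqrt{\mathrm{Var}_{\mathrm{ref}}(\Delta\psi_N)/\mathrm{Var}(\Delta\psi_N)}$. The function name `estimate_ts`, its parameters, and all numbers below are hypothetical placeholders for illustration and are not taken from the paper.

```python
import math
from statistics import pvariance


def estimate_ts(dpsi, dpsi_ref, ts_ref):
    """Estimate a selection temperature from a variance ratio.

    Assumption (from the abstract): Var(dG_N) = (k_B T_s)^2 * Var(dpsi_N)
    is the same for the target and the reference protein, so
    T_s = T_s_ref * sqrt(Var_ref(dpsi_N) / Var(dpsi_N)).

    dpsi     : dpsi_N values over all sites for the target protein
    dpsi_ref : dpsi_N values over all sites for the reference protein
    ts_ref   : selection temperature of the reference protein
    """
    var = pvariance(dpsi)          # population variance of the target
    var_ref = pvariance(dpsi_ref)  # population variance of the reference
    if var <= 0 or var_ref <= 0:
        raise ValueError("both variances must be positive")
    return ts_ref * math.sqrt(var_ref / var)


# Toy numbers only (NOT data from the paper): the target values are the
# reference values scaled by 2, so the variance is 4 times larger and the
# estimated T_s is half the reference value.
dpsi_ref = [-1.0, 0.0, 1.0, 2.0]
dpsi = [-2.0, 0.0, 2.0, 4.0]
ts = estimate_ts(dpsi, dpsi_ref, ts_ref=100.0)  # ts_ref is an arbitrary placeholder
print(ts)
```

In this sketch a broader distribution of $\Delta\psi_N$ maps to a lower $T_s$, which is the qualitative behaviour the constant-variance assumption implies. How the reference protein's $T_s$ is calibrated against experimental $\Delta\Delta G_{ND}$ is not covered here.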