CoHSI I; Detailed properties of the Canonical Distribution for Discrete Systems such as the Proteome
Abstract
The CoHSI (Conservation of Hartley-Shannon Information) distribution is at the heart of a wide class of discrete systems, defining the length distribution of their components amongst other global properties. Discrete systems such as the known proteome, where components are proteins; computer software, where components are functions; and texts, where components are books, are all known to fit this distribution accurately. In this short paper, we explore its solution and its resulting properties, and lay the foundation for a series of papers which will demonstrate, amongst other things, why the average length of components is so highly conserved and why long components occur so frequently in these systems. These properties are not amenable to local arguments such as natural selection in the case of the proteome or human volition in the case of computer software; indeed, they turn out to be inevitable global properties of discrete systems, devolving directly from CoHSI and shared by all. We will illustrate this using examples from the UniProt protein database as a prelude to subsequent studies.