Single-base mismatch profiles for NGS samples

Abstract

In the preprocessing pipeline of a Next Generation Sequencing (NGS) sample, the set of single-base mismatches is one of the first outputs, together with the number of correctly aligned reads. Combined, these two quantities yield a 4x4 matrix (the Single Base Indicator, SBI in what follows) that acts as a blueprint of the sample and of its preprocessing ingredients, such as the sequencer, the alignment software, and the pipeline parameters. In this note we show that, under the same technological conditions, there is a strong relation between the SBI and the biological nature of the sample. To this end we introduce a similarity measure between SBIs, and we show how two measures commonly used in machine learning can help in this context.
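
As an illustration of the kind of object the SBI is, the following minimal sketch tallies a 4x4 count matrix from aligned base pairs. The abstract does not specify the layout, so everything here is an assumption: rows index the reference base, columns index the read base, the diagonal holds matching bases, and the off-diagonal cells hold single-base mismatches. The names `build_sbi` and `normalize` are hypothetical and are not taken from the paper.

```python
from collections import Counter

BASES = "ACGT"  # assumed row/column order; the paper may use another


def build_sbi(pairs):
    """Tally (reference_base, read_base) pairs into a 4x4 count matrix.

    Assumption: rows are reference bases, columns are read bases.
    Pairs containing any symbol outside A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter(
        (ref, read) for ref, read in pairs if ref in BASES and read in BASES
    )
    return [[counts[(ref, read)] for read in BASES] for ref in BASES]


def normalize(matrix):
    """Scale a count matrix so that all entries sum to 1 (assumed convention)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[value / total for value in row] for row in matrix]


# Toy input: two A->A matches, one A->G mismatch, one C->C match, one skipped N.
toy_pairs = [("A", "A"), ("A", "A"), ("A", "G"), ("C", "C"), ("T", "N")]
sbi_counts = build_sbi(toy_pairs)
sbi_freq = normalize(sbi_counts)
```

Normalizing to frequencies is one possible way to make matrices from samples of different depth comparable before applying a similarity measure; whether the paper does this is not stated in the abstract.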
