Y-Trim: Evidence-gated Adaptase tail trimming for single-stranded bisulfite sequencing
Abstract
Background: Single-stranded whole-genome bisulfite sequencing (ssWGBS) enables DNA methylation profiling in low-input and highly fragmented material, including cell-free DNA. In widely used post-bisulfite protocols, Adaptase-mediated tailing adds stochastic, template-free end sequence. Unlike adapter-defined junctions, these tails lack a fixed sequence template, so trimming must be decided from FASTQ-stage observables under intrinsic uncertainty.

Results: We show that bisulfite-induced compositional degeneracy implies a strictly positive error floor for any fixed per-read boundary rule under a finite nucleotide alphabet. Guided by this limit, we introduce Y-Trim, an evidence-gated framework that separates admission (should we trim) from inference (where to trim). For Read 2, Y-Trim performs per-read adaptive cut placement via a fixed, chemistry-typed matrix-linear texture scoring scheme; for Read 1, it uses automated sample-level anchoring when read-level localization is feasibility-limited. Across modules, Y-Trim is an explicit, chemistry-specific decision rule with interpretable operating points. On a curated 34-run public cohort (CCGB-34) and simulator stress tests with known latent boundaries, Y-Trim exhibits stable Read 2 operating behavior and Read 1 feasibility-limited behavior consistent with conditional read-through.

Conclusions: Template-free Adaptase tail trimming is best viewed as an evidence-limited FASTQ-stage decision rather than a generic preprocessing knob. By making admissibility and abstention explicit and exposing interpretable genomic-retention versus residual-carryover trade-offs, Y-Trim provides a practical uncertainty-aware preprocessing strategy for ssWGBS.
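The split between admission (whether to trim) and inference (where to trim) can be illustrated with a toy sketch. This is not Y-Trim's scoring scheme: the weight table `TOY_WEIGHTS`, the functions `best_prefix_cut` and `gated_trim`, and the parameters `min_evidence` and `max_tail` are all hypothetical, chosen only to show a per-read rule that either places a cut or abstains.

```python
from typing import Optional, Tuple

# Hypothetical per-base weights: positive means "tail-like", negative means
# "genomic-like". Illustrative values only; not taken from the paper.
TOY_WEIGHTS = {"A": -1.0, "C": 1.0, "G": 1.0, "T": -1.0, "N": 0.0}


def best_prefix_cut(read: str, max_tail: int = 15) -> Tuple[int, float]:
    """Inference step: find the prefix length with the highest cumulative score."""
    best_cut, best_score = 0, 0.0
    running = 0.0
    for i, base in enumerate(read[:max_tail], start=1):
        running += TOY_WEIGHTS.get(base, 0.0)
        if running > best_score:
            best_cut, best_score = i, running
    return best_cut, best_score


def gated_trim(
    read: str, min_evidence: float = 3.0, max_tail: int = 15
) -> Tuple[str, Optional[int]]:
    """Admission step: trim only when the evidence clears the threshold.

    Returns (possibly trimmed read, cut position), with None meaning "abstained".
    """
    cut, score = best_prefix_cut(read, max_tail)
    if score < min_evidence:
        return read, None  # abstain: evidence too weak to justify any cut
    return read[cut:], cut


if __name__ == "__main__":
    print(gated_trim("GGGCGATTATAT"))  # strong tail-like prefix: trimmed
    print(gated_trim("ATTGATTA"))      # weak evidence: left untouched
```

Raising `min_evidence` in this sketch makes the rule abstain more often, which retains more genomic sequence at the cost of leaving more tail bases in place. That mirrors, in a much simplified form, the retention versus carryover trade-off the abstract describes.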