Fig. 2 | BMC Genomics

Fig. 2

From: Removal of sequencing adapter contamination improves microbial genome databases

Adapter contamination is concentrated at the beginnings and ends of contigs, and its removal improves assembly contiguousness. (a) Histogram shows the concentration of Illumina universal adapter sequences near the extremities of contigs in the genomes showing significant evidence of adapter contamination (p-value < 0.01). Mean distances in bases from beginnings or ends of contigs were calculated for adapter sequences and reverse complements of adapter sequences, respectively. (b) DNA sequences show five examples of contamination by Illumina adapters (red sequences) at the ends of contigs (grey squares) in Paenibacillus lactis assembly MGYG000003402 from the human gut. (c) DNA sequences show five examples of contamination by the reverse complement of Illumina adapters (red sequences) at the beginnings of contigs in assembly MGYG000003402. In (b) and (c), blue and yellow sequences correspond to forward- and reverse-specific adapter sequences, respectively, adjacent to the universal adapter sequence. (d) Barplot shows, for each database, the per-assembly average number of contigs merged with other contigs after the removal of adapter contamination and reassembly (of the 1110 contaminated assemblies at p-value < 0.01). (e) Scatterplot shows the positive relationship between the number of adapter sequences present in assemblies showing the strongest evidence of contamination (FDR-corrected p-value < 1e-16) (x-axis) and the number of contigs that could be merged by reassembly after adapter contamination removal (y-axis). Red line shows the best-fit regression of log-transformed values (the transformation was made to reduce heteroscedasticity). The p-value was calculated from a generalized linear model with Poisson-distributed errors for count data.
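The distance measurement behind panel (a) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the adapter string below is the commonly cited 13-base Illumina universal adapter prefix and is assumed here, and the function names (`reverse_complement`, `find_all`, `adapter_end_distances`) are hypothetical. For each exact match of the adapter or its reverse complement, the sketch reports how many bases separate the match from the start and from the end of the contig.

```python
# Assumption: 13-base Illumina universal adapter prefix; the exact sequence
# searched for in the study is not stated in this caption.
ADAPTER = "AGATCGGAAGAGC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq, motif):
    """Return the 0-based start positions of every (possibly overlapping) exact match."""
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def adapter_end_distances(contig, adapter=ADAPTER):
    """List (orientation, bases_from_start, bases_from_end) for each adapter hit.

    bases_from_start counts bases before the match; bases_from_end counts
    bases after it, so a match flush with a contig extremity scores 0.
    """
    contig = contig.upper()
    motifs = (
        ("forward", adapter),
        ("reverse_complement", reverse_complement(adapter)),
    )
    results = []
    for orientation, motif in motifs:
        for start in find_all(contig, motif):
            bases_from_end = len(contig) - (start + len(motif))
            results.append((orientation, start, bases_from_end))
    return results


if __name__ == "__main__":
    # Toy contig: reverse-complement adapter at the beginning, adapter at the end.
    toy = reverse_complement(ADAPTER) + "ACGT" * 5 + ADAPTER
    for hit in adapter_end_distances(toy):
        print(hit)
```

Exact string matching keeps the sketch short; a real screen would likely need to tolerate mismatches and partial adapters, which this example does not attempt.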
