Fig. 1

Reduction of gene fragmentation in (A) the Pss508 long-read assembly and (B) the Psy642 short-read Illumina assembly, before and after Kastor correction. RefSeq proteins used in PGAP’s protein homology annotation of the pre-correction assemblies were mapped back to each assembly using tblastn and plotted as percent sequence similarity vs. percent sequence coverage. Corrected genes are colored according to their original annotation, while non-corrected genes are shown in grey. The annotation categories are: fragmented, genes for which two or more frameshifted gene fragments were found; complete, complete genes; incomplete, genes truncated at the 5’ or 3’ end; and internal stop, genes containing a stop codon that truncates the full-length gene. Gene fragmentation or incompleteness is inferred from lower query coverage, i.e. when tblastn detects only a fragment of the expected gene. Longer genes with higher sequence similarity were found after correction.
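To make the coverage-based inference concrete, the sketch below computes percent query coverage from tblastn tabular hits by merging overlapping query intervals, and flags a protein as fragmented when its hits on one contig fall in two or more reading frames. This is a minimal illustration under stated assumptions, not the pipeline used for the figure: it assumes tblastn was run with `-outfmt "6 qseqid sseqid qstart qend sframe qlen"`, and the helper names and the frame-based fragmentation rule are hypothetical.

```python
from collections import defaultdict

# Assumed (hypothetical) tblastn output columns:
#   -outfmt "6 qseqid sseqid qstart qend sframe qlen"
FIELDS = ("qseqid", "sseqid", "qstart", "qend", "sframe", "qlen")


def parse_hits(lines):
    """Group tabular tblastn HSPs by (query protein, subject contig)."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hits[(row["qseqid"], row["sseqid"])].append(
            (int(row["qstart"]), int(row["qend"]),
             int(row["sframe"]), int(row["qlen"]))
        )
    return hits


def query_coverage(hsps):
    """Percent of the query protein covered by the union of HSP intervals."""
    qlen = hsps[0][3]
    intervals = sorted((start, end) for start, end, *_ in hsps)
    covered, cur_start, cur_end = 0, None, None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Gap on the query: close the previous merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current interval.
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start + 1
    return 100.0 * covered / qlen


def is_fragmented(hsps):
    """Hypothetical rule: two or more HSPs lying in different reading frames."""
    return len(hsps) >= 2 and len({frame for _, _, frame, _ in hsps}) >= 2


# Toy hits: P1 is split across frames +1 and +2 by a frameshift;
# P2 is a single in-frame hit covering only part of the protein.
example = [
    "P1\tctg1\t1\t150\t1\t300",
    "P1\tctg1\t151\t300\t2\t300",
    "P2\tctg1\t1\t120\t-1\t400",
]
hits = parse_hits(example)
print(query_coverage(hits[("P1", "ctg1")]), is_fragmented(hits[("P1", "ctg1")]))  # 100.0 True
print(query_coverage(hits[("P2", "ctg1")]), is_fragmented(hits[("P2", "ctg1")]))  # 30.0 False
```

Merging intervals before summing avoids double-counting query residues covered by overlapping HSPs; a single hit covering only part of the query (as for P2) is the low-coverage signature described for incomplete genes.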