Table 1 Number and types of errors detected by Kastor

From: Kastor: a reference-based comparative approach for assessment and correction of gene-fragmenting errors in long-read assemblies of small genomes

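| Organism | Reference set¹ | Total detected errors | Insertions | Deletions | Substitutions² |
|---|---|---|---|---|---|
| Pseudomonas syringae 508 | Set 1 | 2900 | 900 | 1708 | 900 |
| Pseudomonas syringae 508 | Set 2 | 657 | 36 | 263 | 358 |
| Pseudomonas syringae 508 (DFM)³ | Set 1 | 1674 | 187 | 559 | 928 |
| Pseudomonas syringae 508 (DFM)³ | Set 2 | 1066 | 28 | 98 | 940 |
| Pseudomonas syringae 642 | Set 1 | 195 | 71 | 124 | - |
| Citrobacter koseri MINF_9D | Genus | 951 | 225 | 209 | 517 |
| Citrobacter koseri MINF_9D | Species | 97 | 9 | 75 | 13 |
| Mycobacterium tuberculosis | Species | 18 | 3 | 3 | 12 |
| Plasmodium falciparum W2 | Species | 40,681 | 20 | 40,661 | - |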

  1. Reference data set used for Kastor correction, chosen by taxonomic closeness to the target assembly (e.g., reference genomes from Citrobacter species were included at the genus level, but only Citrobacter koseri genomes at the species level). See Supplementary File S4 for reference set composition.
  2. Substitutions were excluded when raw reads were unavailable.
  3. Basecalled using Dorado and assembled using Flye and Medaka (see Methods).
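
As a reading aid, the following is a minimal Python sketch (not part of Kastor) that transcribes the table and compares each Total with the sum of its listed error types. It assumes that Total is meant to be the sum of insertions, deletions, and substitutions, and that a "-" cell means substitutions were not assessed (footnote 2). The names `ROWS`, `component_sum`, and `mismatched_rows` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# (organism, reference set, total, insertions, deletions, substitutions)
# Values are transcribed from Table 1; None stands for a "-" cell.
Row = Tuple[str, str, int, int, int, Optional[int]]

ROWS: List[Row] = [
    ("Pseudomonas syringae 508", "Set 1", 2900, 900, 1708, 900),
    ("Pseudomonas syringae 508", "Set 2", 657, 36, 263, 358),
    ("Pseudomonas syringae 508 (DFM)", "Set 1", 1674, 187, 559, 928),
    ("Pseudomonas syringae 508 (DFM)", "Set 2", 1066, 28, 98, 940),
    ("Pseudomonas syringae 642", "Set 1", 195, 71, 124, None),
    ("Citrobacter koseri MINF_9D", "Genus", 951, 225, 209, 517),
    ("Citrobacter koseri MINF_9D", "Species", 97, 9, 75, 13),
    ("Mycobacterium tuberculosis", "Species", 18, 3, 3, 12),
    ("Plasmodium falciparum W2", "Species", 40681, 20, 40661, None),
]


def component_sum(insertions: int, deletions: int,
                  substitutions: Optional[int]) -> int:
    """Sum the error types, counting unassessed substitutions as zero."""
    return insertions + deletions + (substitutions or 0)


def mismatched_rows(rows: List[Row]) -> List[Tuple[str, str, int, int]]:
    """Return (organism, reference set, total, component sum) for every
    row whose reported total differs from the sum of its components."""
    mismatches = []
    for organism, ref_set, total, ins, dele, sub in rows:
        summed = component_sum(ins, dele, sub)
        if summed != total:
            mismatches.append((organism, ref_set, total, summed))
    return mismatches


if __name__ == "__main__":
    for organism, ref_set, total, summed in mismatched_rows(ROWS):
        print(f"{organism} / {ref_set}: total={total}, components={summed}")
```

Any row printed by the script is one where the reported total and the listed components disagree under the assumptions above, and is worth checking against the published table.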