
Leveraging genomic insights from the neglected malaria parasites P. malariae and P. ovale using selective whole genome amplification (SWGA) approach

Abstract

Background

Systematic genomics-guided population-based studies on the neglected malaria parasites, P. malariae, P. ovale curtisi, and P. ovale wallikeri species remain challenging due to their low parasitemia, underestimation, and lack of comprehensive genetic analysis. Techniques that cost-effectively allow the enriching of the genome of interest from complex genome backgrounds for sequencing studies help immensely to perform genomic analyses. One such technique is selective whole-genome amplification (SWGA).

Results

We applied SWGA using specifically designed primer sets targeting the pathogen genome to enrich parasite DNA from clinical samples. This enabled cost-effective and high-quality WGS for these neglected malaria species. WGS on SWGA-treated samples demonstrated improved genome coverage. Our method outperformed the published protocol for P. malariae, with higher enrichment of the targeted genome. On average, P. malariae had 93% of the genome covered by ≥ 10 reads; parallel improvements in genome coverage were achieved for both P. ovale spp., with 81% of the genome on average covered by ≥ 10 reads. Consequently, SWGA facilitated the detection of thousands of additional SNPs not detectable in pre-SWGA samples, allowing more substantial downstream population genomics analysis, particularly for the polymorphic and antimalarial genes of great interest for all the species. Furthermore, leveraging the long DNA fragments generated by SWGA, we achieved high-quality genome assemblies for P. malariae and P. ovale using PacBio long-read sequencing technology.

Conclusions

The SWGA approach implemented here provides a powerful tool for enhancing genomic analysis of these neglected parasites, revealing population diversity, drug resistance markers, and hypervariable regions. This methodology constitutes a transformative tool to surmount the challenges of genomic analysis for neglected malaria parasites and can improve malaria research and control.


Background

Malaria infection is caused by Plasmodium spp. parasites in tropical and subtropical countries. Plasmodium falciparum is responsible for the majority of malaria infections and disease severity, particularly within Africa. In contrast, Plasmodium malariae, Plasmodium ovale curtisi, and P. ovale wallikeri species are recognized as neglected malaria parasites due to their low prevalence and lower disease severity. However, recent reports reveal that the prevalence of P. malariae and P. ovale spp. is rising [1]. Indeed, an increase in symptomatic and asymptomatic infection cases due to P. malariae and P. ovale spp in sub-Saharan Africa and South America has led to some concerns for elimination programs [1].

P. malariae is known as “the most neglected” human malaria-causing parasite, and it causes quartan fever (quartan malaria) [2]. First observed by Camillo Golgi in 1886 [3], P. malariae is the only Plasmodium species with a 72-hour developmental cycle compared to the conventional 48-hour cycle of the other human malaria parasites. The preference to invade old red blood cells (RBCs) and the low production of merozoites during the intra-erythrocytic developmental cycle partially explain the longer erythrocytic life cycle that, consequently, leads the host to develop immunity to combat the infection [4]. The worldwide distribution of P. malariae is primarily tropical and sub-tropical and often coincides with P. falciparum’s geo-localization, particularly in sub-Saharan Africa. Furthermore, P. malariae is frequently identified in mixed infections with P. falciparum in areas where the latter is endemic [5]. Interestingly, malaria episodes can occur even 30–50 years after a previous P. malariae infection [4].

On the other hand, P. ovale triggers tertian malaria, a less common form of malaria infection [6]. P. ovale spp. include P. ovale curtisi and P. ovale wallikeri [7]. These have been recognized as two distinct species instead of subspecies [8], and they have even recently been renamed Plasmodium ovalecurtisi and Plasmodium ovalewallikeri [9]. P. ovale generally causes low parasitemia infections that are rare compared to P. falciparum and P. vivax, although it has started to spread in recent years [10, 11]. P. ovale species are primarily found in sub-Saharan Africa but are also present on the islands of the Western Pacific and, sporadically, in Southeast Asia and India [8]. Mixed infections frequently occur with other malaria parasite species, most markedly with P. falciparum [5]. Like P. vivax, P. ovale is restricted to reticulocytes and can produce hypnozoites. These dormant liver stages can trigger relapse months or even years after the primary exposure to sporozoites [8].

The mechanisms of P. malariae and P. ovale pathogenicity and relapses still need to be discovered. The typical low parasitemia and the frequent co-infections make it difficult to properly diagnose and identify both P. malariae and P. ovale spp. by light microscopy, leading to their underestimation [12, 13]. Moreover, issues relating to correctly detecting and identifying all the parasite species potentially correlate with the low number of reported cases. Thus, proper molecular tools are urgently required to study their transmission, chronicity of infections, and genetic diversity.

The first step towards malaria elimination is accurately diagnosing and distinguishing the species and their intrinsic molecular features. In that sense, the introduction of Whole Genome Sequencing (WGS) has been a great aid. WGS allowed for the identification of Single Nucleotide Polymorphisms (SNPs) and insertions/deletions (InDels) across P. falciparum, P. vivax, and P. knowlesi [14,15,16,17]. These WGS studies have improved our understanding of their diversity, population structure, transmission patterns, and drug resistance profiles [18,19,20,21]. Nonetheless, sufficient DNA from the infecting species is necessary for performing WGS. This is a limiting factor in parasite genomics because Plasmodium genomes of interest represent only a tiny fraction of the total isolated DNA. This is especially the case for P. malariae and P. ovale, where low parasite densities and the absence of an ex-vivo culture lead to a low yield of parasite DNA [4, 6]. Thus, generating enough sequencing data to achieve reasonable genome coverage for further molecular characterization of the parasite genome of interest would be expensive.

Selective Whole Genome Amplification (SWGA) is a PCR-based technology that was established to enrich the pathogen genome (foreground) from the host genome (background genome) to overcome these obstacles in a cost-effective manner [22]. The method uses phi29 DNA polymerase combined with specific primer sets designed to amplify a target genome [23]. In malaria, this technique proved to be successful when applied to P. falciparum [24, 25], P. vivax [26], and P. knowlesi [27] from both non-filtered blood and dried blood spots of clinical samples [27, 28]. Nonetheless, while efforts have mainly concentrated on P. falciparum, all species are emerging in importance and need to be addressed.

Lately, the SWGA approach has been applied to P. malariae and P. ovale spp., enabling further analysis and highlighting the complexity and diversity of these species across different geographic regions [29,30,31]. These studies shed light on the need for more comprehensive population genomics to understand transmission dynamics and drug resistance, especially as P. malariae shows increasing prevalence in endemic regions. Our work expands on these findings by using SWGA to enrich P. malariae and P. ovale genomes, allowing for a deeper exploration of their genomic diversity and structure. Applying SWGA to the neglected Plasmodium species can help enhance malaria control by providing critical insights into the genetic diversity of parasites, drug resistance patterns, and transmission dynamics. It aids in early detection of resistance, supports vaccine development, improves diagnostic tools, and enables more targeted interventions. While not directly reducing malaria, SWGA informs strategies that improve treatment effectiveness and control efforts, facilitating downstream analyses and eventually contributing to a reduced malaria burden [32]. Furthermore, the availability of good-quality genomes [33, 34] facilitates this strategy because it makes primer design possible.

We have set out to develop and optimize an improved SWGA approach to design specific primer sets targeting P. malariae, P. ovale curtisi, and P. ovale wallikeri. We tested individual primer sets on DNA samples extracted from unprocessed whole blood from infected individuals for successful DNA enrichment and genome coverage. Here, we report an effective, high-quality WGS following SWGA application on the three neglected species with even higher performance for P. malariae than previously reported [29]. We achieved high genome coverage, allowing the identification of a significantly higher number of SNPs and thus making further downstream genomic analyses on the genetic diversity of these neglected parasites feasible. Moreover, we used long sequencing reads obtained from SWGA genetic material to perform genome assemblies as a proof of concept in both species. Our optimized protocols will permit large-scale population genomics-driven studies and fill a much-needed gap in our knowledge of these three neglected malaria parasites.

Results

P. malariae SWGA followed by WGS

To perform SWGA on P. malariae-infected whole blood samples, we designed primers that specifically amplify parasite DNA using the SWGA software, as previously published for P. falciparum [24]. Several P. malariae-specific primer sets were generated using various options in the SWGA toolkit [23] to obtain optimal primers for efficient and even amplification of the target DNA. These parameters include the frequency of DNA motifs across the genome, the size of the motifs (k-mer), and the Tm of the primers (see methods section for parameters used). For a trial run, five primer sets were chosen based on the best fg_dist_gini, score, fg_dist_mean, and scoring_fn scores and tested on DNA extracted from unprocessed whole blood samples of P. malariae-infected individuals working in sub-Saharan African countries. All five primer sets could amplify P. malariae DNA specifically from the human genome background. However, set-1, with the lowest fg_dist_gini score (0.5182), performed best (Fig. S1 and Table S1) and was selected for use with all the samples, as it had even binding across the genome (the Gini coefficient measures the evenness in a primer dataset). For all the SWGA reactions, primer set-1 resulted in a large increase in DNA amount and a good DNA integrity profile, as shown by Fragment Analyzer profiles of each sample pre- and post-SWGA (Additional File 1).
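
As a rough illustration of why a lower fg_dist_gini indicates more even binding, the sketch below computes a Gini coefficient over the gaps between consecutive primer binding sites. It is not the SWGA toolkit's own code: the function names and the binding-site positions are hypothetical and chosen only for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def binding_gaps(positions):
    """Distances between consecutive binding sites on one sequence."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical binding-site coordinates (assumed values, not from the study).
even_sites = [0, 1000, 2000, 3000, 4000]
clustered_sites = [0, 10, 20, 30, 4000]

print("even:", gini(binding_gaps(even_sites)))
print("clustered:", gini(binding_gaps(clustered_sites)))
```

In this toy example, evenly spaced sites give a coefficient of 0, whereas sites clustered at one end of the region give a much higher value, which is the pattern a primer set with uneven genome coverage would show.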

After determining that primer set-1 performed best among all primer sets tested, we used it on six P. malariae DNA samples (PM2, PM3, PM4, PM5, PM7, and PM8, see Additional File 2) and performed WGS on both pre-SWGA and post-SWGA DNAs. The percentage of mapped reads pre- and post-SWGA showed that we could enrich the proportion of reads mapped to the P. malariae genome for all samples (Fig. S2a). For instance, we observed an increase from 3.01 to 76.78% with a fold enrichment of 25.4X for sample PM8, and an increase from 0.03 to 76.48%, representing a considerable fold enrichment of 2094X for sample PM5 (Fig. 1a, Fig. S2a). The pre-and-post-SWGA comparative results of the per-base coverage analysis and genome coverage further complement our observations that primer set-1 had efficiently amplified the P. malariae target genome (Fig. 2a; Table 1). After SWGA, we achieved a superior mean genome coverage (average post-SWGA: 331.16 ± 84.9 X, pre-SWGA: 6.89 ± 4.1 X), allowing a higher percentage of callable regions (region being covered by > 10 reads) (Table 1). On average, after SWGA, 93.2 ± 1.6% of the genome is callable, while only 15.7 and 20.9% were before SWGA (Table 1). Moreover, the read coverage after SWGA was evenly distributed across all 14 chromosomes, except for the sub-telomeres, where low sequence complexity prevented better mapping (Fig. 2b and Fig. S3).
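
The fold enrichment quoted above is a ratio of mapped-read percentages (post-SWGA divided by pre-SWGA). A minimal sketch, using the rounded percentages given in the text for sample PM8; the function name is illustrative, and ratios recomputed from rounded percentages can differ somewhat from the reported values, which were presumably calculated from unrounded data.

```python
def fold_enrichment(pre_pct, post_pct):
    """Ratio of the post-SWGA to the pre-SWGA percentage of mapped reads."""
    if pre_pct <= 0:
        raise ValueError("pre-SWGA percentage must be positive")
    return post_pct / pre_pct


# Rounded percentages quoted in the text for sample PM8.
pm8 = fold_enrichment(pre_pct=3.01, post_pct=76.78)
print(f"PM8 fold enrichment: {pm8:.1f}X")
```

The same function applies to any sample for which both percentages are available; samples with a very small pre-SWGA percentage are the most sensitive to rounding.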

Fig. 1

Enrichment of P. malariae (a), P. ovale curtisi (b), and P. ovale wallikeri (c) genome sequences post-SWGA. Fold enrichment for each sample was calculated by dividing the percentage of reads mapped to the corresponding genome post-SWGA by the percentage mapped pre-SWGA. Abbreviations: PM, P. malariae; POC, P. ovale curtisi; POW, P. ovale wallikeri

Fig. 2

SWGA performance of P. malariae and P. ovale curtisi. (a) Line plot showing the percentage of genome covered at each depth of coverage for all P. malariae samples before and after SWGA. (b) Read depth coverage pre-SWGA and post-SWGA for chromosome 2 and chromosome 12 (chosen as representatives), shown for sample PM3. The number on the right shows the data range as read coverage. (c) Line plot showing the percentage of genome covered at each depth of coverage for all P. ovale curtisi samples before and after SWGA. (d) Read depth coverage pre-SWGA and post-SWGA for chromosome 2 and chromosome 12 (chosen as representatives), shown for sample POC6. The number on the right shows the data range as read coverage

Table 1 Sequencing statistics for all P. malariae isolates before and after SWGA

The SWGA performance and primer sequences of our set-1 were then compared with the previously reported primer sets targeting P. malariae [29] using the same DNA material. The pairwise comparison of primer sequences using a similarity matrix shows that the SWGA primers from this study are distinct from the SWGA primers already published for P. malariae [29] (Fig. S4a). Moreover, the performance comparison on the same P. malariae samples showed that, after SWGA, primer set-1 yielded a much higher percentage of total reads mapped to P. malariae (Fig. S5a). We also compared the percentage of genome covered at 10X depth, and the results were equivalent for both primer sets except for sample PM4, for which the previously published primer set was less efficient (Fig. S5b).

P. ovale curtisi and P. ovale wallikeri SWGA followed by WGS

As P. ovale curtisi and P. ovale wallikeri have remarkably similar genomes, we utilized P. ovale curtisi as a reference genome. Hence, the same generated primer sets were used to perform SWGA on both parasites using DNA extracted from unprocessed infected whole blood samples. Following the same strategy, we ran a trial with five selected primer sets with the best mean foreground genome distance score between the primers (Table S2). Out of five tested primer sets, set-4 performed best (Fig. S6). Primer set-4 was then used to selectively amplify P. ovale curtisi and P. ovale wallikeri samples (referred to as POC and POW samples; see Additional File 2). As with P. malariae, primer set-4 increased the amount of DNA after the SWGA reaction. However, the efficiency of the reaction was directly correlated with the initial integrity of the DNA samples. The sample POW4 displayed lower DNA integrity and, hence, a lower SWGA performance, reinforcing the importance of good initial integrity and quality of the input DNA. As a result, we removed it from further analysis. The Fragment Analyzer profiles showing DNA integrity before and after the SWGA reaction are provided in Additional File 1.

The comparison of WGS data before and after SWGA revealed an increase in the number of reads mapping to both P. ovale spp. across all SWGA samples with good DNA integrity. The mapped read statistics indicated that both parasite genomes could be enriched with the same primer set-4. For instance, there was an increase from 0.41 to 21.76% (53.07X fold enrichment) and from 0.6 to 30.48% (55.98X fold enrichment) for POC3 and POW5 samples, respectively (Fig. 1b and c, Fig. S2b and S2c). On average, we obtained 25-fold (25X) enrichment after using SWGA for both P. ovale spp. As for the coverage analysis, we obtained a substantial increase in the mean coverage of SWGA samples compared to non-SWGA samples, indicating successful amplification by our P. ovale SWGA primers. The mean coverage for P. ovale curtisi, on average, was 32.53 ± 13.7X after SWGA compared to 1.5 ± 0.9X before SWGA (Fig. 2c; Table 2). For P. ovale wallikeri, SWGA enabled higher genome coverage as well: the mean genome coverage was 56.56 ± 31.6X after SWGA, while it was 2.40 ± 1.02X without SWGA (Fig. S7a, Table 3).

Table 2 Sequencing statistics for P. ovale curtisi isolates before and after SWGA
Table 3 Sequencing statistics for all P. ovale wallikeri isolates before and after SWGA

Higher genome coverage for post-SWGA samples allowed a significant increase in the percentages of callable regions (covered by > 10 reads). Post-SWGA P. ovale curtisi samples reached 80.65 ± 6.4% on average, while pre-SWGA samples had 0.40 ± 0.3% (Table 2). P. ovale wallikeri had a value of 72.73 ± 30.2% with SWGA, whereas without SWGA, the percentage of callable regions (> 10X genome coverage) was 0.94 ± 0.8% (Table 3). Moreover, the chromosome coverage plots for both P. ovale species showed enhanced read depth in the core genome compared to the telomeric regions, with a more even distribution post-SWGA than in non-SWGA samples, as shown for chromosomes 2 and 12 for POC6 (Fig. 2d, Fig. S8) and POW6 samples (Fig. S7b).
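
The percentage of callable regions reported above can be thought of as the share of genome positions whose read depth reaches a threshold. The following is a minimal sketch, assuming a per-base depth list as input; the function name and the toy depths are hypothetical, and whether the threshold is inclusive is an assumption here (the abstract states ≥ 10 reads), so it is exposed as a parameter.

```python
def percent_callable(depths, min_depth=10, inclusive=True):
    """Percentage of positions whose depth reaches the chosen threshold."""
    if not depths:
        return 0.0
    if inclusive:
        covered = sum(1 for d in depths if d >= min_depth)
    else:
        covered = sum(1 for d in depths if d > min_depth)
    return 100.0 * covered / len(depths)


# Toy per-base depths for a five-base region (assumed values).
toy_depths = [0, 5, 10, 12, 40]
print(percent_callable(toy_depths))
print(percent_callable(toy_depths, inclusive=False))
```

In practice the depth values would come from a per-base coverage file produced by a coverage tool, read chromosome by chromosome rather than held in a single list.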

Similarly to P. malariae, we compared our in-house-designed primer sequences with the previously reported primer sets targeting P. ovale spp [30, 31]. The pairwise comparison of the primers using a primer similarity matrix revealed the uniqueness of our P. ovale SWGA primers (Fig. S4b). However, when we compared the performance of our in-house primer set with previously reported primer sets targeting P. ovale spp [30], we found that the published primer set by Joste et al. performed better in terms of enrichment of P. ovale spp target genome (Fig. S9).

Variants analysis

To assess our SWGA performance, we called high-quality SNPs from the total genome for all the samples that underwent SWGA and compared them to non-SWGA samples using GATK best practice workflows [35, 36]. We aimed to see how SWGA improved the rate of SNP calling. As anticipated, for all three parasite species, we detected a large number of additional SNPs at positions where, in non-SWGA samples, the coverage was too low to reliably call any SNP. We gained up to 75,355 additional SNPs for P. malariae (Table 4), up to 49,704 additional SNPs for P. ovale curtisi (Table 5), and up to 53,564 additional SNPs for P. ovale wallikeri (Table 6). Moreover, it is notable that the number of detected SNPs in some samples was very low pre-SWGA, and SNP discovery in those samples was greatly facilitated post-SWGA. This was particularly true for P. ovale spp., as for some pre-SWGA samples, we could reliably call very few SNPs. For the POC7 sample, we went from 14 SNPs pre-SWGA to 30,760 post-SWGA (Table 5). Consequently, for all three species, SWGA enabled the identification of numerous additional SNPs that could not have been detected in the non-SWGA samples at a similar overall level of total read depth (Tables 4, 5 and 6).
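
The comparison of SNP calls before and after SWGA amounts to a set comparison per sample. A minimal sketch follows, assuming each SNP is keyed by chromosome, position, and alternate allele; the function name, the chromosome label, and the toy calls are hypothetical and do not come from the study's data.

```python
def compare_snp_calls(pre_calls, post_calls):
    """Count SNPs shared between, and unique to, pre- and post-SWGA call sets."""
    pre_set, post_set = set(pre_calls), set(post_calls)
    return {
        "shared": len(pre_set & post_set),
        "pre_only": len(pre_set - post_set),
        "additional_post": len(post_set - pre_set),
    }


# Toy call sets as (chromosome, position, alternate allele); assumed values.
pre = [("chr01", 1200, "A"), ("chr01", 5400, "T")]
post = [("chr01", 1200, "A"), ("chr01", 5400, "T"),
        ("chr01", 9100, "G"), ("chr02", 330, "C")]

print(compare_snp_calls(pre, post))
```

Keying on the alternate allele as well as the position means that a site called with different alleles before and after SWGA is counted as a discrepancy rather than as a shared SNP.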

Table 4 Identification of the additional high number of high-quality SNPs in P. malariae due to increasing genome coverage post-SWGA
Table 5 Identification of the additional high number of high-quality SNPs in P. ovale curtisi due to increasing genome coverage post-SWGA
Table 6 Identification of the additional high number of high-quality SNPs in P. ovale wallikeri due to increasing genome coverage post-SWGA

Core genome definition and phylogeny

We analyzed the SNP density across the 14 chromosomes for the three species to identify the core genome and hypervariable regions. The core genome excluded hypervariable genes predominantly located in the sub-telomeric regions of all chromosomes, such as pir, stp1, fam-l, and fam-m for P. malariae and pir and stp1 for P. ovale spp. Accordingly, the core genome positions, as well as the positions of the hypervariable regions (Table S3−5), were defined for P. malariae, P. ovale curtisi, and P. ovale wallikeri samples from this study (Fig. S10).

We compared the core genome as defined by Ibrahim et al. (2020) and in this study. While the positions defined for the 3’ end of the core genome on each chromosome are almost identical, there is a notable difference in the genome positions defined for the start of the core genome (5’ end).

In this study, the start of the core genome is positioned slightly further away from the start of the chromosome for the majority of chromosomes. This difference is likely attributable to the higher sequencing depth achieved in this study compared to Ibrahim et al. (2020). The increased sequencing depth for the P. malariae samples in our study enabled the identification of more variable regions (in terms of SNPs) at the start of each chromosome, thereby pushing the start of the core genome further downstream.
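
The idea of locating core genome boundaries from SNP density can be sketched as a windowed scan that trims SNP-dense windows from both chromosome ends. This is only an illustration of the general principle: the exact criteria used in this study are described in the methods and are not reproduced here, and the window size, threshold, function names, and positions below are arbitrary assumptions.

```python
from collections import Counter


def snp_density(positions, chrom_length, window=10_000):
    """Number of SNPs in each consecutive window (0-based positions)."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(pos // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def core_bounds(density, max_snps_per_window):
    """First and last window index left after trimming dense end windows."""
    start, end = 0, len(density) - 1
    while start <= end and density[start] > max_snps_per_window:
        start += 1
    while end >= start and density[end] > max_snps_per_window:
        end -= 1
    return (start, end) if start <= end else None


# Toy chromosome of 500 bp with SNP-dense ends (assumed values).
toy_snps = [5, 15, 25, 35, 120, 480, 485, 490, 495]
density = snp_density(toy_snps, chrom_length=500, window=100)
print(density, core_bounds(density, max_snps_per_window=2))
```

With deeper sequencing, more SNPs are called near the chromosome ends, so more end windows exceed the threshold and the inferred core start moves inward, which mirrors the shift described above.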

We used previously reported core genomes (see methods section) and the defined SNPs from the core genome regions in this study to construct the maximum likelihood phylogenomic tree of P. malariae cases isolated across continents (Fig. 3), P. ovale curtisi, and P. ovale wallikeri (Fig. S11). The P. malariae samples from Thailand cluster together and are distinctly separated from the samples from Africa, including those from this study. Moreover, isolates from neighboring African countries, such as Sierra Leone and Liberia, often cluster together (Fig. 3). Interestingly, while some P. malariae isolates exhibit clustering by country, others do not. This observation aligns with recent findings from Ibrahim et al. (2024), where even with more than a hundred isolates sequenced, P. malariae did not consistently cluster at the country level [37]. This variability may reflect unique aspects of P. malariae biology, such as its potential for broader genetic exchange or diverse population dynamics across regions. Similarly, most of the P. ovale curtisi and P. ovale wallikeri samples from Equatorial Guinea cluster together (Fig. S11). The samples’ geographical and regional clustering reveals how the parasites’ genetic diversity and population characteristics can help improve genomic studies, particularly if the number of samples is not low. Moreover, these results show how SWGA could help refine and define core genomes for more extensive comparative genomic diversity studies.

Fig. 3

Maximum likelihood phylogenetic tree of P. malariae. The unrooted maximum likelihood phylogeny was generated using the web iqtree from a total of 77,247 core genome SNPs from 27 samples. Bootstrap support values from 1000 replicates are shown. The parameters used for tree construction using IQTREE are ModelFinder + tree reconstruction + ultrafast bootstrap (1000 replicates): “-m MFP -mtree -b 1000”. ModelFinder identified TVM + F + R2 as the best-fitting model. The tree was visualized using iTOL

SNP discovery in highly polymorphic and drug resistance genes

The search for genetic variation in polymorphic and common resistance marker genes is critical for a better health response and control. Hence, improving SNP detectability in these genes is indispensable. SWGA showed powerful results by revealing thousands of additional SNPs not detected in pre-SWGA samples (Additional File 4–6). However, it was essential to verify that the PCR-based SWGA protocol did not introduce any SNP not present in the original samples. Therefore, we performed PCRs targeting a few regions of chosen polymorphic and drug resistance-associated genes (msp1, ama1, mdr1, dhfr-ts) on the samples before and after SWGA for P. malariae and both P. ovale spp., and sequenced the PCR products by Sanger sequencing. The results indicated that SWGA did not introduce any bias. The Sanger analysis confirmed that the additional SNPs found post-SWGA were already present pre-SWGA, validating the SWGA protocol (Table S6). As a result, with a similar level of total read output, the SWGA approach has the potential to provide valuable supplementary data for a more comprehensive analysis of SNP profiles.

Numerous non-synonymous mutations were found for the polymorphic genes Pmmsp1 and Pmama-1 (Table S6). Furthermore, to illustrate the increase in SNP number after SWGA, we chose Pmmsp1 as a prominent example of the power of the SWGA method. MSP1 is a key surface protein found on the merozoite stage of Plasmodium, playing a vital role in the initial binding and invasion of the host’s red blood cells. The gene encoding MSP1 is highly polymorphic, making it important to understand how the parasite penetrates and survives within red blood cells. The detection of SNPs within genes like msp1 is crucial, as these variants may influence the parasite’s interaction with the host, potentially altering protein function, affecting antigenicity, and modifying the parasite’s capacity to attach to red blood cells. SWGA makes it possible to confidently call SNPs at positions where the depth was insufficient before SWGA. Notably, we observed an increase in the number of SNPs and the resulting amino acid changes after SWGA, as expected (Fig. 4).

Fig. 4

SNPs and the resulting amino acid changes identified in the P. malariae merozoite surface protein 1 (msp1) gene locus before (a) and after SWGA (b). The Y-axis shows the number of samples in which Sanger sequencing validated that mutation. The number inside the circle represents the number of post-SWGA samples in which the same SNP was detected

Similarly, we noted that SWGA revealed non-synonymous mutations in key resistance markers: dhfr and mdr1. Specifically, we observed amino acid changes in P. malariae dhfr (N50Y, F57L, R58S, N114E, N114D) and P. malariae mdr1 (V373I, S463N, N610K), as well as in P. ovale curtisi dhfr (C14F, S58R, H98P, S113N) and P. ovale curtisi mdr1 (V243I) (Table S6). Several of these SNPs have been identified in studies investigating potential markers of resistance or reduced susceptibility to specific antimalarial drugs. For instance, mutations in P. malariae dhfr at positions N50Y, F57L, and N114E/N114D have been observed in regions where antifolate drugs like pyrimethamine and sulfadoxine are widely used. Recent evidence from Ibrahim et al. (2024) indicates that these mutations, particularly F57L, significantly reduce susceptibility to pyrimethamine. Mutations in P. malariae mdr1, particularly N610K, have been linked to reduced susceptibility to chloroquine and other quinoline drugs [37]. In P. ovale curtisi dhfr, the S113N and H98P mutations have been correlated with decreased efficacy of pyrimethamine and other antifolates. The V243I mutation in P. ovale curtisi mdr1 may affect the response to chloroquine and related drugs [29, 30, 38,39,40,41,42].

Genome assembly

SWGA relies on phi29 DNA polymerase, which can generate very long fragments of up to 46 kbp, as shown in the Fragment Analyzer analysis (Additional File 1). We used P. malariae PM7 and P. ovale curtisi POC2 post-SWGA PCR products for PacBio long-read sequencing and performed genome assemblies. The results show that the long SWGA PCR products yielded long reads on the PacBio platform, thus enabling us to produce good-quality assemblies. We observed similar results for both P. malariae and P. ovale curtisi compared to their respective reference genomes, as shown in Tables 7 and 8 and Tables S7-8. The read-length distribution of raw and clean reads showed similar results for PM7 (Fig. S12) and POC2 (Fig. S13), with around 35% of the reads being 10–20 kb long. The Circos plots display coverage and sequencing depth across all 14 chromosomes, with peaks likely corresponding to SWGA primer binding sites for both species (Fig. 5 and Fig. S14). Importantly, long-read sequencing allows access to the telomeric regions for further sequence analysis, which is usually challenging with short-read sequencing. Here, the PM7 SWGA assembly covered 251 fam-l (289 in the reference genome) and 180 fam-m (194 in the reference genome). Moreover, the collinear blocks between the reference genomes and the SWGA assemblies show a good correlation for both P. malariae PM7 and P. ovale curtisi POC2 samples (Fig. 6 and Fig. S15). Therefore, these assemblies demonstrate the excellent performance of the SWGA approach in improving malaria genetic analysis by allowing the use of the long DNA fragments that were obtained.
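
A read-length distribution such as the one summarized above (the share of reads falling in the 10–20 kb range) can be tabulated by binning read lengths. The sketch below assumes a plain list of read lengths in bases; the function name, bin edges, and toy lengths are hypothetical and are not taken from the PM7 or POC2 data.

```python
def length_bin_shares(read_lengths, edges=(0, 10_000, 20_000, 30_000)):
    """Percentage of reads in each [low, high) bin, plus an open-ended last bin."""
    n = len(read_lengths)
    bounds = list(zip(edges, edges[1:])) + [(edges[-1], None)]
    shares = {}
    for low, high in bounds:
        label = f"{low}-{high}" if high is not None else f">={low}"
        count = sum(
            1 for length in read_lengths
            if length >= low and (high is None or length < high)
        )
        shares[label] = 100.0 * count / n if n else 0.0
    return shares


# Toy read lengths in bases (assumed values).
toy_lengths = [4_000, 8_500, 12_000, 15_500, 19_000, 22_000, 31_000, 9_000]
shares = length_bin_shares(toy_lengths)
print(shares)
```

For real data, the read lengths would be taken from the sequencing output (for example, from the lengths of the records in a FASTQ file) before and after read cleaning.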

Table 7 Mapping rate and coverage of P. malariae PM7 after SWGA from a PacBio Sequel II long-read sequencing data
Table 8 Genome statistics of the P. malariae PM7 assembly after SWGA from PacBio Sequel II long-read sequencing data. De novo P. malariae SWGA is the draft assembly based on PacBio SWGA reads. Reference-based P. malariae SWGA is the final assembly obtained by reference-based scaffolding of the draft assembly
Fig. 5

Circos plot displaying the sequencing depth and coverage of PM7 SWGA from a PacBio Sequel II long-read sequencing data. The outer circle is the P. malariae UG01 genome. The inner circles are genome GC content, SWGA data coverage, and sequencing depth

Fig. 6

Collinear blocks between the P. malariae reference genome (PmalariaeUG01) and the contigs assembled for PM7 from SWGA DNA material sequenced on a PacBio Sequel II. Each color represents the best match between the two genomes. PmUG1–14 represents chromosomes 1–14 of P. malariae (PmalariaeUG01), and PmSWGA1–14 represents chromosomes 1–14 of the SWGA assembly

Discussion

The importance of the SWGA methodology relies on its broad applicability. It can conceivably enrich any target DNA from the contaminating DNA background [22]. The SWGA approach is helpful for various applications in microbial genomics, evolution, population genetics, and epidemiological studies. The technology enables greater molecular characterization of the target organism, preventing the necessity for mechanical separation from the background genome and in-vitro culturing, which is convenient for the case of microorganisms that we are unable to culture in vitro [22]. In the field of malaria, SWGA proved to be successful without the need for large amounts of blood for P. falciparum [24, 25], P. vivax [26], P. knowlesi [27], P. malariae [29], and P. ovale spp [30, 31] to perform population genomics studies using clinical samples from malaria-positive patients. Here, we successfully applied the SWGA strategy to specifically amplify DNA from unprocessed blood from clinical samples of P. malariae and P. ovale spp., enabling genomic analysis of samples that would have been extremely challenging to explore because of the reduced usability of the WGS results of the non-SWGA in comparison to their respective SWGA samples.

In this instance, SWGA samples yielded higher genome coverage, enabling downstream analysis. We achieved high-quality WGS data with 93% on average for P. malariae and 81% for P. ovale spp. of their genomes covered by ≥ 10 reads, demonstrating a good performance of our primers. The performance of P. malariae SWGA was even better than that of the previously reported one in terms of ability to enrich the target genome [29]. Consistent with previous malaria SWGA results, we observed good read depth across all 14 chromosomes [24, 25]. The coverage plots showed an excellent read depth in the core genome compared to the telomeric regions, producing an even distribution when comparing SWGA to non-SWGA samples except in the repeated regions such as telomeres, which hindered accurate mapping. Moreover, we noticed a trend of superior mean coverage and percentage of the genome callable in samples presenting higher parasite densities than previously described for other malaria species [24, 25]. It is important to note that the future application of this methodology on dried blood spot clinical samples will improve its proficiency by lowering the required amount of starting DNA material and decreasing the cost of subsequent WGS.

Remarkably, the improved genome coverage after SWGA permitted the detection of a substantially larger number of additional high-quality SNPs in all samples. We observed that, for most samples, only a very low proportion of SNPs was detected in both the pre- and post-SWGA data. Our results emphasize the power of SWGA as a tool for identifying additional SNPs from poorly represented target genomes within clinical samples that are dominated by the human genome. The significantly higher number of SNPs detected in polymorphic and antimalarial genes provides reliable and valuable information for population genomics and drug resistance marker analysis. Notably, we found several amino acid changes in dhfr (A15S, F57L, S58R, and S113N) previously associated with pyrimethamine or cycloguanil resistance in P. falciparum and P. vivax in several regions [38,39,40,41]. Further studies are required to determine the effects of these substitutions and their epidemiology.

Importantly, the number of SWGA samples must be increased to improve the resolution of the geographical clustering obtained from the phylogenetic analysis of the SNPs. Such data could be used in forthcoming population diversity studies to help determine the country, region, and sub-region of origin of any isolate. Moreover, this would help investigate the phylogenetic evolution of these neglected parasites and further clarify the distinction between the species, particularly between the two ovale species.

Furthermore, the SWGA protocol generates very long PCR products, which we used for PacBio long-read sequencing. The data obtained were of good quality and permitted the construction of de novo genome assemblies for two chosen samples: PM7 and POC2. The assemblies show good synteny across the major scaffolds and are well conserved compared to their counterpart reference genomes. In addition, the long-read sequencing gave us some insight into the accessible telomeric regions that carry specific genes of interest, such as fam-l and fam-m, which are unique and suspected to be involved in host-parasite interaction [34]. The new assemblies are a valuable addition, complementing the reference genomes recently published by Higgins et al. [31]. They provide a useful tool for studying the ovale species and analyzing their population structure and evolution.

Conclusions

Our results show that applying the SWGA methodology yielded higher genome coverage for all samples of the three neglected malaria species. The strategy was particularly effective for P. malariae, for which we obtained an enrichment greater than previously reported [29]. Moreover, this SWGA strategy was successfully established for P. ovale curtisi and P. ovale wallikeri. It efficiently improved genome coverage for both species using the same primer set, but to a lesser extent than previously published [30]. The better genome coverage greatly improved downstream analyses such as SNP calling, which is important for studying genetic diversity and drug-resistance marker genes. Moreover, the SWGA methodology permitted the preparation of de novo genome assemblies by exploiting the long PCR fragments for PacBio long-read sequencing, thus making the sub-telomeric regions of interest accessible.

Unfortunately, the limited number of samples used in this study to develop and optimize the methodology (six and eight samples for P. malariae and P. ovale spp., respectively) did not allow us to perform a deeper population genomics analysis. Future SWGA-aided studies that include samples from a broader geographical range will help improve population genetics-focused studies of these neglected malaria parasites.

Methods

Sample collection and preparation

For this project, imported malaria cases were initially identified by microscopic examination of Giemsa-stained thick and thin smears under oil immersion at 500–1000x magnification in the malaria diagnosis reference laboratory of the Jiangsu Institute of Parasitic Diseases (JIPD), as previously described [10]. Genomic DNA (gDNA) was extracted using the QIAGEN QIAamp DNA Kit (Qiagen) from unprocessed whole blood from infected individuals working in sub-Saharan African countries. All DNA samples were quantified using a Qubit fluorometer (Thermo Fisher Scientific). Mono-infections with P. malariae, P. ovale curtisi, and P. ovale wallikeri were confirmed using PCR targeting the mitochondrial cox3 gene following the previously described protocol [43]. Sample details are reported in Additional File 2.

Primer design for SWGA

The P. malariae (PmUG01: version 1) and P. ovale curtisi (PocGH01: version 2) reference genome sequences were downloaded from PlasmoDB and used to identify primers that preferentially amplify the respective genomes. As previously published (www.github.com/eclarke/swga) [23], the SWGA software was used to generate species-specific primer sets, providing as input the PmUG01 or PocGH01 reference genome and the established human reference human_g1k_v37 (ftp://ftp.1000genomes.ebi.ac.uk). We could not design a SWGA primer set specific to P. ovale wallikeri. Because the two species are very similar, we used the same primer set for both P. ovale curtisi and P. ovale wallikeri. Various default parameters in the SWGA toolkit were changed to obtain P. malariae- and P. ovale-specific SWGA primers (Additional File 3). Primers generated by the SWGA toolkit were then ranked based on the following primer selection parameters: min_fg_bind, max_bg_bind, min_tm, max_tm, max_size, min_bg_bind_dist, and max_fg_bind_dist. Five primer sets with different parameters were chosen for each of P. malariae and P. ovale spp. and tested on two P. malariae samples (PM7 and PM8; Table S1) and two P. ovale curtisi samples (POC2 and POC8; Table S2) to choose the best of the five sets. For P. malariae, we tested five top-ranked primer sets from fg_dist_gini, score, fg_dist_mean, and scoring_fn (Table S1), and for P. ovale, we tested the five sets with the lowest fg_dist_mean (Table S2). This is because, while the SWGA primers for P. ovale curtisi were being designed, the SWGA paper by Clarke et al. [23] had been published, recommending testing 5–10 primer sets with the lowest fg_dist_mean score for the best results. For P. malariae, set-1, which had the lowest Gini index, was chosen (Table S1); it contains the following ten primers: AATATTT*C*G, AATTATT*C*G, ATATACG*A*T, ATATTAC*G*A, ATTATTT*C*G, ATTTATT*C*G, TATTATT*C*G, TTATTAA*C*G, TTATTTA*C*G, TTTATTA*C*G. For both P. ovale species, the best set of primers, set-4 (with one of the top fg_dist_mean values), was chosen (Table S2); it consists of a mix of ten primers: CGAAAAAA*T*A, CGTAAAAA*A*A, TAACGAA*T*T, TATCGAA*A*A, CGTAC*G*A, TTACGAA*A*T, TATCG*C*G, ATTACGA*A*A, TATCG*C*G, TTACGAA*T*A. Asterisks (*) represent phosphorothioate bonds, which prevent primer degradation by the exonuclease activity of the Phi29 polymerase. The pairwise comparison of the published SWGA primers [29, 30] and our in-house primers for P. malariae and P. ovale spp. was made using the Needleman-Wunsch global alignment algorithm (EMBOSS v6.6.0.0) [44] with parameters -gapopen 10 -gapextend 0.5, and the plots were generated in R using ggplot2 v3.4.3 [45] and ggpubr v0.6.0 (https://rpkgs.datanovia.com/ggpubr/). The Cividis color palette from the viridis v0.6.4 package (https://doi.org/10.5281/zenodo.7890878) was chosen for visualization.
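As a reading aid for the primer notation used above, the following minimal Python sketch splits a primer string into its bases and the positions of the bonds marked with an asterisk. The helper names `parse_primer` and `three_prime_protected` are hypothetical and introduced here only for illustration; they are not part of the SWGA toolkit.

```python
def parse_primer(notation):
    """Split a primer such as 'AATATTT*C*G' into (bases, bond_positions).

    A bond position i means that the linkage between base i and base i + 1
    (0-based) is written with a '*' in the notation.
    """
    bases = []
    bonds = []
    for ch in notation:
        if ch == "*":
            if not bases:
                raise ValueError("'*' cannot precede the first base")
            bonds.append(len(bases) - 1)
        elif ch in "ACGT":
            bases.append(ch)
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return "".join(bases), bonds


def three_prime_protected(notation, n_bonds=2):
    """Return True if exactly the last n_bonds linkages at the 3' end carry a '*'."""
    bases, bonds = parse_primer(notation)
    last_linkages = list(range(len(bases) - 1 - n_bonds, len(bases) - 1))
    return bonds == last_linkages


seq, bonds = parse_primer("AATATTT*C*G")
print(seq, bonds)  # AATATTTCG [6, 7]
```

In this reading, every primer listed above carries its two modified bonds at the last two linkages of the 3' end, which is what `three_prime_protected` checks.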

Selective whole-genome amplification

SWGA was performed essentially as previously described [24]. Briefly, SWGA reactions were performed in a 50 µL volume with 50 ng of total input genomic DNA (with a minimum of 10 ng), 5 µL of 10X Phi29 DNA Polymerase Reaction Buffer (New England BioLabs), 30 units of Phi29 DNA Polymerase (New England BioLabs), 0.5 µL of Purified 100x BSA (New England BioLabs), 5 µL of 10 mM dNTP, 3 µL of 25 µM primer mix, and Nuclease-Free Water (Ambion). The reactions were performed on a thermocycler with ramp-down cycling conditions from 35 °C to 30 °C (10 min at each degree), followed by 16 h at 30 °C and 10 min at 65 °C to inactivate the Phi29 enzyme. The final products were purified using Ampure XP beads (Beckman Coulter). DNA integrity and size were checked before and after each SWGA reaction using the Fragment Analyzer (Agilent Tech).
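The final concentrations implied by this reaction mix can be worked out with the dilution rule C1·V1 = C2·V2. The short Python sketch below does this under the assumption that the stated values (10X buffer, 100x BSA, 10 mM dNTP, 25 µM primer mix) are stock concentrations and that the total volume is 50 µL; the names `COMPONENTS` and `final_concentration` are illustrative only.

```python
TOTAL_VOLUME_UL = 50.0

# name -> (stock concentration, unit, volume added in µL), as listed in the text.
# Assumption: the listed concentrations refer to the stock solutions.
COMPONENTS = {
    "Phi29 reaction buffer": (10.0, "X", 5.0),
    "BSA": (100.0, "x", 0.5),
    "dNTP": (10.0, "mM", 5.0),
    "primer mix": (25.0, "µM", 3.0),
}


def final_concentration(stock, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume_ul / total_ul


def reaction_summary():
    """Return name -> (final concentration, unit) for every listed component."""
    return {
        name: (final_concentration(stock, volume), unit)
        for name, (stock, unit, volume) in COMPONENTS.items()
    }


for name, (conc, unit) in reaction_summary().items():
    print(f"{name}: {conc:g} {unit}")
```

Under these assumptions the reaction contains 1X buffer, 1x BSA, 1 mM dNTP, and 1.5 µM primer mix.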

Whole-genome sequencing

Final SWGA products and unamplified gDNA were sheared on a Covaris E220 Focused ultrasonicator (Covaris) using the following shearing settings: 40 s at 5% duty factor, 175 PIP, and 200 cycles per burst. Short-insert libraries were generated with the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (New England Biolabs) from 50 to 100 ng of DNA starting material for SWGA samples and 50 ng for non-SWGA samples. Sequencing was performed on Illumina HiSeq4000 and Novaseq6000 platforms, yielding 150 bp paired-end reads. The raw reads are publicly available on the European Nucleotide Archive under the Project number PRJNA1005619.

Bioinformatic and variant analysis

The raw reads were subjected to quality control using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Low-quality reads and adaptor sequences were trimmed and removed using Trimmomatic [46]. Reads shorter than 36 nucleotides were discarded. Trimmed reads from P. malariae, P. ovale curtisi, and P. ovale wallikeri samples were then mapped against the P. malariae PmUG01, P. ovale curtisi PocGH01, and P. ovale wallikeri PowCR01 reference genomes [34], respectively, using the Burrows-Wheeler Aligner with maximal exact matches (BWA-MEM), v. 0.7.17 [47]. File format conversion, indexing and sorting of BAM files, and mapped read statistics were performed using Samtools [48]. To compare the genome coverage between pre- and post-SWGA samples (Fig. 2a and c, Fig. S3a, and Fig. S4a), we randomly subsampled sequencing reads to match the number of sequencing reads between pre- and post-SWGA samples. Subsampling was carried out using seqkit sample with the -s and -n options.
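The idea behind matching read numbers before comparing coverage can be sketched in a few lines of Python. This is an illustration of seeded random subsampling in general, not a reimplementation of seqkit; the function names and the default seed value are assumptions made for the example.

```python
import random


def subsample_reads(read_pairs, n, seed=11):
    """Draw n read pairs without replacement, keeping their original order."""
    if n > len(read_pairs):
        raise ValueError("cannot draw more read pairs than are available")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    kept = sorted(rng.sample(range(len(read_pairs)), n))
    return [read_pairs[i] for i in kept]


def match_read_counts(pre_swga, post_swga, seed=11):
    """Subsample the larger library down to the size of the smaller one."""
    target = min(len(pre_swga), len(post_swga))
    return (
        subsample_reads(pre_swga, target, seed),
        subsample_reads(post_swga, target, seed),
    )
```

Comparing coverage at equal read numbers separates the effect of enrichment from the effect of sequencing depth: a post-SWGA library that covers more of the genome with the same number of reads does so because a larger share of those reads comes from the parasite.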

Read mapping was followed by marking duplicates, and SNP calling was done according to the best practice guidelines [35, 36, 49] using the Genome Analysis Toolkit (GATK) v.3.8 [36]. VCF file manipulation and further analysis were conducted using VCFtools v.0.1.17 [50]. SNPs were first filtered in GATK using the options -filter "QD < 2.0 || FS > 60.0 || SOR > 4.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" -filterName "SNP_filter". The resulting VCF file was then subjected to hard filtering with VCFtools to retain good-quality SNPs. Initially, SNPs were annotated using the vcf-annotate --filter options Q=30/d=10 for non-SWGA samples and Q=30/d=30 for SWGA samples. Low-quality SNPs were subsequently removed from the annotated SNP files with the vcftools option --remove-filtered-all. Genome coverage plots were generated using the Integrative Genomics Viewer (IGV) [51].
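The combined effect of the two filtering stages can be illustrated with a small Python sketch. It is a simplified model rather than the GATK or VCFtools implementation: each SNP is represented by its QUAL value, its depth, and a dictionary of annotations, and a missing annotation is assumed not to trigger the corresponding rule. The names `GATK_FAIL_RULES`, `fails_gatk_filter`, and `keep_snp` are hypothetical.

```python
# A SNP is flagged by the GATK expression if ANY of these conditions is true.
GATK_FAIL_RULES = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "SOR": lambda v: v > 4.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def fails_gatk_filter(info):
    """True if any annotation present in `info` meets its failure condition.

    Assumption: annotations that are absent do not cause a failure.
    """
    return any(
        name in info and rule(info[name])
        for name, rule in GATK_FAIL_RULES.items()
    )


def keep_snp(qual, depth, info, swga):
    """Apply the minimum quality (Q=30) and minimum depth (d=30 for SWGA
    samples, d=10 for non-SWGA samples) on top of the GATK expression."""
    min_depth = 30 if swga else 10
    return qual >= 30 and depth >= min_depth and not fails_gatk_filter(info)


good = {"QD": 10.0, "FS": 1.0, "SOR": 0.8, "MQ": 60.0}
print(keep_snp(50, 35, good, swga=True))   # True
print(keep_snp(50, 20, good, swga=True))   # False: depth below 30
print(keep_snp(50, 20, good, swga=False))  # True: depth threshold is 10
```

The stricter depth threshold for SWGA samples reflects the higher coverage that these samples reach.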

Core genome

Hypervariable regions were identified by calculating SNP density on filtered high-quality SNPs using a 50 Kbp rolling window. The SNP density threshold used to identify high-density regions was set to Q3 + 1.5 × the interquartile range (IQR), following the standard definition of outlier data points. Hypervariable regions were then identified by scanning the 5' and 3' regions of each chromosome. The start of a hypervariable region was defined where one of the following conditions was satisfied: (i) presence of a hypervariable gene (pir, stp1, fam-l, and fam-m for P. malariae; pir and stp1 for P. ovale curtisi and P. ovale wallikeri), or (ii) an SNP density value exceeding the SNP density threshold. In the corresponding plots, the SNP density is shown as horizontal dashed lines and the hypervariable genes as vertical blue lines, while the hypervariable regions identified using the above method are shaded.
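The SNP density threshold described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the window step is not given in the text and is set equal to the window size here, quartiles are computed with the inclusive (linear interpolation) method of the standard library, and all function names are hypothetical.

```python
from statistics import quantiles


def window_snp_counts(positions, chrom_length, window=50_000, step=50_000):
    """Count SNPs in windows (start, start + window] along one chromosome.

    Positions are 1-based. The step size is an assumption; the text only
    specifies the 50 Kbp window.
    """
    counts = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        n = sum(1 for p in positions if start < p <= end)
        counts.append((start, end, n))
        start += step
    return counts


def density_threshold(values):
    """Outlier threshold Q3 + 1.5 * IQR (requires at least two values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def high_density_windows(counts):
    """Return the windows whose SNP count exceeds the outlier threshold."""
    threshold = density_threshold([n for _, _, n in counts])
    return [(s, e, n) for s, e, n in counts if n > threshold]


print(density_threshold([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # 13.0
```

In this toy example Q1 = 3 and Q3 = 7, so the threshold is 7 + 1.5 × 4 = 13 and only the window with 100 SNPs is flagged as high density.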

Phylogenetic tree construction

For phylogenetic tree construction, P. malariae SNPs from the core genome were taken from the samples in this study and from samples reported elsewhere [29, 34]; likewise, for P. ovale curtisi and P. ovale wallikeri, core-genome SNPs from the samples in this study and from samples reported elsewhere were taken [33, 34]. Maximum likelihood-based phylogenetic trees were generated using iqtree with default parameters for ModelFinder + tree reconstruction + ultrafast bootstrap (1000 replicates). The trees were visualized using iTOL [52]. The SNPs used to construct the P. malariae, P. ovale curtisi, and P. ovale wallikeri phylogenetic trees are provided as Additional Files 4–6, respectively.

PCR for Sanger analysis

A few known highly polymorphic antigen and drug resistance marker genes were selected for PCR amplification followed by Sanger sequencing. The primer sequences for each gene are available in Additional File 7.

Genome assemblies using long-read sequencing

The post-SWGA long PCR products from the P. malariae PM7 and P. ovale curtisi POC2 samples were purified using Ampure XP beads (Beckman Coulter) and used for long-read sequencing on the Pacific Biosciences (PacBio) Sequel II platform. Reads with a quality score greater than 0.8 were used for subsequent analysis. Host sequences were filtered out of the SWGA PacBio sequencing data by aligning the reads to the human genome (version: hg19) using Minimap2 [53] with default parameters. The remaining non-host reads were then assembled using Canu (v.2.2) with the parameters "genomeSize=33m minReadLength=2000 minOverlapLength=500" [54], and haplotypic duplication was identified and removed from the primary genome assemblies using purge_dups with default parameters [55]. Reference-based scaffolding was performed with RagTag (v2.1.0) [56]. Whole-genome alignment between each assembly and the corresponding published reference genome was performed using LASTZ (v. 1.10) (https://github.com/lastz/lastz). Circos plots were generated with Circos (v.0.69-9) [57].

Data availability

The raw reads from this study are publicly available on the Sequence Read Archive (SRA) under the Project number PRJNA1005619.

References

  1. Hawadak J, Dongang Nana RR, Singh V. Global trend of Plasmodium malariae and Plasmodium ovale spp. malaria infections in the last two decades (2000–2020): a systematic review and meta-analysis. Parasit Vectors. 2021;14(1):297.

  2. Grande R, Antinori S, Meroni L, Menegon M, Severini C. A case of Plasmodium malariae recurrence: recrudescence or reinfection? Malar J. 2019;18(1):169.

  3. Mueller I, Zimmerman PA, Reeder JC. Plasmodium malariae and Plasmodium ovale–the bashful malaria parasites. Trends Parasitol. 2007;23(6):278–83.

  4. Collins WE, Jeffery GM. Plasmodium malariae: parasite and disease. Clin Microbiol Rev. 2007;20(4):579–92.

  5. Tobian AA, Mehlotra RK, Malhotra I, Wamachi A, Mungai P, Koech D, et al. Frequent umbilical cord-blood and maternal-blood infections with Plasmodium falciparum, P. malariae, and P. ovale in Kenya. J Infect Dis. 2000;182(2):558–63.

  6. Collins WE, Jeffery GM. Plasmodium ovale: parasite and disease. Clin Microbiol Rev. 2005;18(3):570–81.

  7. Hawadak J, Dongang Nana RR, Singh V. Epidemiological, physiological and diagnostic comparison of Plasmodium ovale curtisi and Plasmodium ovale wallikeri. Diagnostics (Basel). 2021;11(10).

  8. Sutherland CJ, Tanomsing N, Nolder D, Oguike M, Jennison C, Pukrittayakamee S, et al. Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally. J Infect Dis. 2010;201(10):1544–50.

  9. Snounou G, Sharp PM, Culleton R. The two parasite species formerly known as Plasmodium ovale. Trends Parasitol. 2023.

  10. Fuehrer HP, Noedl H. Recent advances in detection of Plasmodium ovale: implications of separation into the two species Plasmodium ovale wallikeri and Plasmodium ovale curtisi. J Clin Microbiol. 2014;52(2):387–91.

  11. Yman V, Wandell G, Mutemi DD, Miglar A, Asghar M, Hammar U, et al. Persistent transmission of Plasmodium malariae and Plasmodium ovale species in an area of declining Plasmodium falciparum transmission in eastern Tanzania. PLoS Negl Trop Dis. 2019;13(5):e0007414.

  12. Cao Y, Wang W, Liu Y, Cotter C, Zhou H, Zhu G, et al. The increasing importance of Plasmodium ovale and Plasmodium malariae in a malaria elimination setting: an observational study of imported cases in Jiangsu Province, China, 2011–2014. Malar J. 2016;15:459.

  13. Doderer-Lang C, Atchade PS, Meckert L, Haar E, Perrotey S, Filisetti D, et al. The ears of the African elephant: unexpected high seroprevalence of Plasmodium ovale and Plasmodium malariae in healthy populations in Western Africa. Malar J. 2014;13:240.

  14. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, et al. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature. 2012;487(7407):375–9.

  15. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, et al. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008;455(7214):757–63.

  16. Carlton JM, Volkman SK, Uplekar S, Hupalo DN, Pereira Alves JM, Cui L, et al. Population genetics, evolutionary genomics, and genome-wide studies of malaria: a view across the International Centers of Excellence for Malaria Research. Am J Trop Med Hyg. 2015;93(3 Suppl):87–98.

  17. Pinheiro MM, Ahmed MA, Millar SB, Sanderson T, Otto TD, Lu WC, et al. Plasmodium knowlesi genome sequences from clinical isolates reveal extensive genomic dimorphism. PLoS ONE. 2015;10(4):e0121303.

  18. Miotto O, Almagro-Garcia J, Manske M, Macinnis B, Campino S, Rockett KA, et al. Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet. 2013;45(6):648–55.

  19. Flannery EL, Wang T, Akbari A, Corey VC, Gunawan F, Bright AT, et al. Next-generation sequencing of Plasmodium vivax patient samples shows evidence of direct evolution in drug-resistance genes. ACS Infect Dis. 2015;1(8):367–79.

  20. Assefa S, Lim C, Preston MD, Duffy CW, Nair MB, Adroub SA, et al. Population genomic structure and adaptation in the zoonotic malaria parasite Plasmodium knowlesi. Proc Natl Acad Sci U S A. 2015;112(42):13027–32.

  21. Diez Benavente E, Ward Z, Chan W, Mohareb FR, Sutherland CJ, Roper C, et al. Genomic variation in Plasmodium vivax malaria reveals regions under selective pressure. PLoS ONE. 2017;12(5):e0177134.

  22. Leichty AR, Brisson D. Selective whole genome amplification for resequencing target microbial species from complex natural samples. Genetics. 2014;198(2):473–81.

  23. Clarke EL, Sundararaman SA, Seifert SN, Bushman FD, Hahn BH, Brisson D. Swga: a primer design toolkit for selective whole genome amplification. Bioinformatics. 2017;33(14):2071–7.

  24. Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, et al. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 2016;7:11078.

  25. Aninagyei E, Smith-Graham S, Boye A, Egyir-Yawson A, Acheampong DO. Evaluating 18s-rRNA LAMP and selective whole genome amplification (sWGA) assay in detecting asymptomatic Plasmodium falciparum infections in blood donors. Malar J. 2019;18(1):214.

  26. Cowell AN, Loy DE, Sundararaman SA, Valdivia H, Fisch K, Lescano AG, et al. Selective whole-genome amplification is a robust method that enables scalable whole-genome sequencing of Plasmodium vivax from unprocessed clinical samples. mBio. 2017;8(1).

  27. Benavente ED, Gomes AR, De Silva JR, Grigg M, Walker H, Barber BE, et al. Whole genome sequencing of amplified Plasmodium knowlesi DNA from unprocessed blood reveals genetic exchange events between Malaysian Peninsular and Borneo subpopulations. Sci Rep. 2019;9(1):9873.

  28. Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, et al. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J. 2016;15(1):597.

  29. Ibrahim A, Diez Benavente E, Nolder D, Proux S, Higgins M, Muwanguzi J, et al. Selective whole genome amplification of Plasmodium malariae DNA from clinical samples reveals insights into population structure. Sci Rep. 2020;10(1):10832.

  30. Joste V, Guillochon E, Clain J, Coppee R, Houze S. Development and optimization of a selective whole-genome amplification to study Plasmodium ovale spp. Microbiol Spectr. 2022;10(5):e0072622.

  31. Higgins M, Manko E, Ward D, Phelan JE, Nolder D, Sutherland CJ, et al. New reference genomes to distinguish the sympatric malaria parasites, Plasmodium ovale curtisi and Plasmodium ovale wallikeri. Sci Rep. 2024;14(1):3843.

  32. Gomes AR, Ravenhall M, Benavente ED, Talman A, Sutherland C, Roper C, et al. Genetic diversity of next generation antimalarial targets: a baseline for drug resistance surveillance programmes. Int J Parasitol Drugs Drug Resist. 2017;7(2):174–80.

  33. Ansari HR, Templeton TJ, Subudhi AK, Ramaprasad A, Tang J, Lu F, et al. Genome-scale comparison of expanded gene families in Plasmodium ovale wallikeri and Plasmodium ovale curtisi with Plasmodium malariae and with other Plasmodium species. Int J Parasitol. 2016;46(11):685–96.

  34. Rutledge GG, Böhme U, Sanders M, Reid AJ, Cotton JA, Maiga-Ascofare O, et al. Plasmodium malariae and P. ovale genomes provide insights into malaria parasite evolution. Nature. 2017;542(7639):101.

  35. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinf. 2013;43:1101–33.

  36. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.

  37. Ibrahim A, Mohring F, Manko E, van Schalkwyk DA, Phelan JE, Nolder D, et al. Genome sequencing of Plasmodium malariae identifies continental segregation and mutations associated with reduced pyrimethamine susceptibility. Nat Commun. 2024;15(1):10779.

  38. Alam MT, Bora H, Bharti PK, Saifi MA, Das MK, Dev V, et al. Similar trends of pyrimethamine resistance-associated mutations in Plasmodium vivax and P. falciparum. Antimicrob Agents Chemother. 2007;51(3):857–63.

  39. McCollum AM, Poe AC, Hamel M, Huber C, Zhou Z, Shi YP, et al. Antifolate resistance in Plasmodium falciparum: multiple origins and identification of novel dhfr alleles. J Infect Dis. 2006;194(2):189–97.

  40. Auliff A, Wilson DW, Russell B, Gao Q, Chen N, le Anh N, et al. Amino acid mutations in Plasmodium vivax DHFR and DHPS from several geographical regions and susceptibility to antifolate drugs. Am J Trop Med Hyg. 2006;75(4):617–21.

  41. Tanomsing N, Imwong M, Pukrittayakamee S, Chotivanich K, Looareesuwan S, Mayxay M, et al. Genetic analysis of the dihydrofolate reductase-thymidylate synthase gene from geographically diverse isolates of Plasmodium malariae. Antimicrob Agents Chemother. 2007;51(10):3523–30.

  42. Chen J, Ma X, Tang J, Xu S, Gu Y, Tang F, et al. Disparate selection of mutations in the dihydrofolate reductase gene (dhfr) of Plasmodium ovale curtisi and P. o. wallikeri in Africa. PLoS Negl Trop Dis. 2022;16(12):e0010977.

  43. Isozumi R, Fukui M, Kaneko A, Chan CW, Kawamoto F, Kimura M. Improved detection of malaria cases in island settings of Vanuatu and Kenya by PCR that targets the Plasmodium mitochondrial cytochrome c oxidase III (cox3) gene. Parasitol Int. 2015;64(3):304–8.

  44. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software suite. Trends Genet. 2000;16(6):276–7.

  45. Ginestet C. ggplot2: elegant graphics for data analysis. J Roy Stat Soc A. 2011;174:245.

  46. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.

  47. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.

  48. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

  49. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinf. 2013;43:11.10.1–11.10.33.

  50. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.

  51. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6.

  52. Letunic I, Bork P. Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47(W1):W256–9.

  53. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100.

  54. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36.

  55. Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36(9):2896–8.

  56. Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022;23(1):258.

  57. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–45.

Acknowledgements

The authors thank the staff of the Bioscience Core Laboratory in KAUST for sequencing DNA libraries and all members of the Pathogen Genomics Lab at KAUST for their support. We thank the staff of the local China Centers for Disease Control and Prevention of Jiangsu Province for assistance in coordinating patients’ information and all sample collection, preparation, and species diagnostics.

Funding

This study was supported by baseline funding (BAS/1/1020-01-01) from King Abdullah University of Science and Technology (KAUST) to AP, the National Nature Science Foundation of China (82320108014 and 81971967) to JC, and the National Nature Science Foundation of China (82372275) to YL. We thank members of the Bioscience Core Laboratory in KAUST for sequencing the samples on the Illumina NovaSeq6000 and PacBioRSII platforms.

Author information

Contributions

AP conceptualized the study, secured funding, and supervised the study. JC secured funding and supervised the study. FBR wrote the manuscript draft, performed wet-lab experimentation and analysis and supervised the wet-lab work. AKS performed bioinformatic data analytics, drafted figures and tables, and supervised the data analysis part of the study. CL, RS and RN performed bioinformatic data analytics. SX, GZ, and YL contributed to sample collection, wet lab experiments, and data analysis. MA, SM, DL, ZS and CA performed wet lab experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Arnab Pain.

Ethics declarations

Ethics approval and consent to participate

The research adhered to the guidelines outlined in the Declaration of Helsinki and informed consent forms were obtained from all patients. All experiments were performed in accordance with the relevant guidelines and regulations and were approved by the Institutional Review Board at KAUST (19IBEC12-Pain) and JIPD (IRB00004221). All data were anonymised before receiving any metadata for analysis. Non-personal identifiers were used during analysis and presentation.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ben-Rached, F., Subudhi, A.K., Li, C. et al. Leveraging genomic insights from the neglected malaria parasites P. malariae and P. ovale using selective whole genome amplification (SWGA) approach. BMC Genomics 26, 118 (2025). https://doi.org/10.1186/s12864-025-11292-8

  • DOI: https://doi.org/10.1186/s12864-025-11292-8

Keywords