- Research
- Open access
- Published:
Genomic analyses revealed low genetic variation in the intron-exon boundary of the doublesex gene within the natural populations of An. gambiae s.l. in Burkina Faso
BMC Genomics volume 25, Article number: 1207 (2024)
Abstract
Background
The recent success of a population control gene drive targeting the doublesex gene in Anopheles gambiae paved the way for developing self-sustaining and self-limiting genetic control strategies targeting the sex determination pathway to reduce and/or distort the reproductive capacity of insect vectors. However, targeting these genes for genetic control requires a better understanding of their genetic variation in natural populations to ensure effective gene drive spread. Using whole genome sequencing (WGS) data from the Ag1000G project (Ag3.0, 3.4 and 3.8), and Illumina pooled amplicon sequencing, we investigated the genetic polymorphism of the intron-4–exon-5 boundary of the doublesex gene in the natural populations of An. gambiae sensu lato (s.l.).
Results
The analyses showed a very low variant density at the gRNA target sequence of the Ag(QFS)1 gene drive (previously called dsxFCRISPRh) within the populations of West and East Africa. However, populations from the forest area in Central Africa exhibited four SNP at frequencies ranging from 0.011 to 0.26. The SNP (2R:48714641[C > T]) at high frequencies, i.e. 0.26 is identified within the An. coluzzii population from Angola. The analyses also identified 90 low frequency (1 − 5%) SNPs in the genomic region around the gRNA target sequence (intron-4–exon-5 boundary). Three of these SNPs (2R:48714472 A > T; 2R:48714486 C > A; 2R:48714516 C > T) were observed at frequencies higher than 5% in the UTR region of the doublesex gene. The results also showed a very low variant density and constant nucleotide diversity over a five-year survey in natural An. gambiae s.l. populations of Burkina Faso.
Conclusion
These findings will guide the implementation of doublesex-targeted gene drives to support the current control tools in malaria elimination efforts. Our methods can be applied to efficiently monitor the evolution of any sequence of interest in a natural population via pooled amplicon sequencing, surpassing the need for WGS.
Background
Malaria is the deadliest vector-borne disease in the world, caused by Plasmodium parasites transmitted to humans by female Anopheles mosquitoes. According to the WHO, deaths attributed to malaria have increased from ~ 400,000 (2018) to ~ 600,000 (2022) in the past five years [1]. This has been primarily attributed to the spread of insecticide resistance across sub-Saharan Africa, where the disease is endemic. In response to these challenges, innovative strategies based on genetic control are being developed to strengthen current tools and accelerate malaria elimination [2]. Homing-based gene drives are engineered selfish genetic elements propagating their inheritance using a homing endonuclease [3]. Gene drives can be used for vector control by being engineered to spread desirable genetic traits in the population, such as parasite refractoriness (population replacement), or by spreading negative fitness traits, such as infertility, amongst vector populations (population suppression). The CRISPR/Cas9 gene editing system has been adapted to act as an RNA-guided homing endonuclease for use in gene drive strategies by inserting the Cas9 and gRNA cassettes within their recognition sequence [4] (Supplementary Material 3). CRISPR-based gene drives have shown great promise as tools to combat malaria, both as population replacement [5] and population suppression genetic control strategies [6]. Gene drive strategies differ from traditional genetic modification methods because of their ability to rapidly self-propagate, their inheritance bias, and their potential for population-level change.
Most population replacement strategies are focused on introducing genetic modifications or novel transgenes to make the new population pathogen-refractory [7, 8]. Previous studies have shown that FREP1 mediates Plasmodium invasion of the Anopheles midgut epithelium [9]. CRISPR knock-outs of the FREP1 gene showed resistance to human and rodent malaria parasites [10]. Similarly, the expression of small antimicrobial peptides in the mosquito midgut delayed the sporogonic development of the malaria parasite. It could be propagated in a single-generation experiment through gene drive homing [5].
Conversely, population suppression strategies aim to alter key population growth parameters, such as sex ratio and fitness. The sex chromosomes of An. gambiae were targeted by a synthetic sex-distorter I-PpoI, which is not a gene drive, and designed to induce a strong negative bias toward X chromosome–carrying spermatozoa, resulting in 95% male offspring [11]. Similar rates were also achieved using a synthetic CRISPR/Cas9-based sex distorter [12]. Recent studies have shown additional benefits in targeting the sex determination pathway to disrupt the sex ratio in the offspring for vector control purposes [13, 14].
In most insect species, the sex determination pathway starts with a primary central gene that stimulates the molecular cascade, leading to the alternative splicing of the Doublesex (dsx) and Fruitless (fru) genes [15], making these two genes the endpoint of the sex determination mechanisms [16]. The Doublesex gene (dsx) plays a critical role in the sexual differentiation of a wide range of insects, particularly the fruit fly Drosophila melanogaster, where its role in sex determination has been extensively studied. The architecture and function of the dsx gene have been well characterized in various insects [17, 18], including Anopheles mosquitoes [19]. In An. gambiae, the dsx gene spans an 88.598 kb (2R:48703664–48792262) sequence on chromosome 2R and consists of seven exons, with exon five being female-specific and exon six male-specific. Alternative splicing of the dsx gene during embryonic development produces two isoforms based on the sex-specific exons, which control the expression of endpoint genes required to exhibit physical sexual dimorphism [20].
Using CRISPR/Cas9, the disruption of the female-specific intron-4-exon-5 boundary of the dsx gene in An. gambiae resulted in morphological abnormalities in homozygous knock-out females, including in the development of the proboscis, which caused knockout females to be unable to draw a blood meal, mate, and produce offspring [6]. A CRISPR-based gene drive built against the female isoform of dsx (Supplementary Material 3B), termed Ag(QFS)1 (previously called dsxFCRISPRh), was able to rapidly spread in caged laboratory populations, causing a swift reduction of vector population density, and ultimately eliminating the populations within a year [6, 14, 21]. This success made the dsx gene a relevant target for developing genetic control strategies to reduce the population density of malaria mosquitoes and other harmful insects, including agricultural pests [22].
Although gene drive targeting the dsx gene is a promising tool for vector control, its success in wild populations requires that the target site be very little polymorphic. The presence of genetic polymorphisms in a gene drive target sequence could inhibit gRNA recognition and subsequent Cas9 cleavage, limiting the spread of the homing and consequently reducing the efficacy of the genetic tool [23,24,25]. Therefore, the success of most CRISPR/Cas9-based genetic control tools requires the target sequence to show limited genetic variation in natural populations. This study investigates the genetic variation within the intron-4-exon-5 boundary (2R: 48714420–48714720) of the Doublesex gene within wild An. gambiae s.l. by analyzing existing population genomics data generated by the Anopheles gambiae 1000 genomes project (Ag1000G), as well as newly generated data through targeted pooled amplicon sequencing of wild-caught mosquito populations from Burkina Faso. We also investigate the evolution and spatial distribution of the genetic polymorphisms discovered. We reveal the limited variation of the female-specific intron-exon boundary of the doublesex gene in natural populations. These results are valuable in guiding the implementation of gene drive tools to complement malaria elimination efforts in Africa.
Results
Spatial distribution of genetic variants in the Ag(QFS)1 gene drive target sequence
We first investigated the distribution of variants across a sequence spanning the intron-4–exon-5 boundary of the doublesex gene (2R: 48714420–48714720), including the Ag(QFS)1 gene drive target sequence (2R: 48714637–48714660) using an existing genomics dataset generated by the Anopheles gambiae 1000 genomes Consortium (Ag1000G). This dataset includes SNP calls from 4200 wild-caught An. gambiae s.l. (An. coluzzii, An. gambiae s.s. and An. arabiensis) mosquitoes from 19 African countries (Fig. 1).
We identified 143 single nucleotide polymorphisms (SNPs) at varying frequencies from 0.000609 to 0.59 in the intron-4–exon-5 boundary of the doublesex gene within vector populations collected in 17 African countries (Supplementary Material 1). The SNP 2R:48,714,486[C > A], located in the UTR sequence of the exon 5, is found at relatively high frequencies (freq. ~ 0.12–0.59) in all An. coluzzii populations. The third allele of 2R:48,714,486 ([C > G]) is only identified in An. arabiensis populations (Freq. ~ 0.006–0.15).
The target sequence of the Ag(QFS)1 gene drive is located in the boundary of the intron 4 and the exon 5, spanning 23 bp from the nucleotide in the 2R chromosome. This sequence contained five SNPs in the vector populations at relatively low frequencies (freq. ~ 0.00–0.26). Most SNPs were identified at very low frequencies (Freq. < 0.01) in West Africa and the An. gambiae populations of Burkina Faso, while no SNP was identified in the populations of East Africa (Fig. 2). The most abundant target site SNP, 2R: 48,714,641[C > T], was found at frequencies higher than 0.05 (0.011–0.26) in the An. gambiae populations of central Africa. Specifically, the 2R: 48,714,641[C > T] SNP was found at frequencies of 0.26 in Angola, 0.07 in the DRC, 0.06 in Cameroon, and 0.01 in Gabon. This was previously reported as a G > A SNP, as read on the antisense DNA strand [6, 26]. This position was found to be triallelic (1 reference and 2 alternative alleles), and its third allele, 2R: 48,714,641[C > A], was found in An. arabiensis populations from Burkina Faso at a frequency lower than 0.01 (Fig. 2). The presence of this SNP (2R: 48714641[C > T]) at frequencies higher than 5% in the gRNA target sequence within the Central African populations raised concern about the spread of the Ag(QFS)1 gene drive in these populations. However, it was found to be cleavable by the Cas9/gRNA ribonucleoprotein in vitro [6], and by the Ag(QFS)1 gene drive in vivo [26], so the gene drive should be able to spread in its presence. No SNPs were identified in the other An. gambiae populations of West and East Africa.
Heat map showing the allele frequencies of the variants in the Ag(QFS)1 gene drive target sequence (2R: 48714637–48714660) within the An. gambiae s.l. populations collected in 19 African countries; the y-axis of the heat map shows the non-synonymous variant positions in the 2R chromosome, while the x-axis shows the populations of An. gambiae s.l. and number of individual mosquitoes processed (n); the gradient color bar shows the distribution of the allele frequencies; MLI: Mali; BFA: Burkina Faso; AGO: Angola; CAF: Central African Republic; CIV: Côte d’Ivoire; CMR: Cameroon; COD: Democratic Republic of Congo; MYA: Mayotte Island; GAB: Gabon; GHA: Ghana; GIN: Guinea; GMB: Gambia; GNB: Guinea-Bissau; KEN: Kenya, MOZ: Mozambique; MWI: Malawi; TZA: Tanzania; UGA: Uganda
Time series variation of the gene drive target sequence in Burkina Faso
Using the Ag1000G dataset, we also investigated the year-to-year genetic variation in the intron-4–exon-5 boundary of the doublesex gene within wild An. gambiae s.l. populations collected from 2012 to 2019 in three villages (Bana, Souroukoudinga and Pala) of Burkina Faso. The analyses identified 90 SNPs at frequencies ranging from 0.001 to 0.54 in the intron-4–exon-5 boundary of the doublesex gene over the seven-year survey (mainly in the untranslated region, UTR of doublesex) (Supplementary Material 2). The 2R:48,714,486[C > A] SNP, present in the dsx exon 5 UTR, was the most abundant SNP identified at frequencies of 0.39–0.54. All other SNPs were found at frequencies lower than 0.01. Most of the SNPs (except 2R:48714445[C > A,T], 2R:48714453[G > A], 2R:48714692[T > A] in An. gambiae s.s. and 2R:48714486[C > A] in An. coluzzii) were non-constant over the years, i.e. they appeared and disappeared the following year, presumably removed by purifying selection or drift.
The Ag(QFS)1 gene drive target sequence (2R:48714637–2R:48714660) displayed four non-constant SNPs at very low frequencies (~ 0.0057–0.05) from 2012 to 2019 in the vector populations. These SNPs were non-constant over the seven-year survey and each of them was removed in the following year after its apparition in the populations (Fig. 3).
The nucleotide diversity (θπ) ranged from 0 to 0.04 in the female-specific intron-exon boundary of the doublesex gene. Interestingly, the nucleotide diversity was constant over the years (from 2012 to 2017). The highest nucleotide diversity (θπ ~ 0.04) was recorded in the UTR region of the female-specific intron-exon boundary of the doublesex gene. In the target sequence of the Ag(QFS)1 gene drive, the nucleotide diversity was close to 0 in 2012. It remained so until 2017 (Fig. 4). The analyses showed a constant nucleotide diversity in the gRNA target sequence in the An. gambiae population of the sampling area. One of the challenges of genetic control remains the potential emergence of new genetic variants at the target sequence, over the time, through evolutionary processes. Our results showed no change in the genetic variation of the gRNA target sequence within the natural populations of the three villages (Bana, Pala and Souroukoudinga) over the seven-year survey, which is promising for the long term efficacy and spread of a gene drive strategy targeting this sequence.
Allele frequencies of the SNPs in the gene drive target sequence (2R: 48714637–48714660) within the An. gambiae s.l. populations collected in 3 villages (Bana, Pala, and Souroukoudinga) from 2012 to 2019 in Burkina Faso; A: Heat map showing the evolution of the allelic frequencies of the SNPs in the gRNA target sequence; the y-axis of the heat map shows the positions of non-synonymous variants in the 2R chromosome while the x-axis shows the populations of An. gambiae s.l. and several individual mosquitoes processed (n). The gradient color bar shows the distribution of the allele frequencies. BN: Bana; SK: Souroukoudinga; PL: Pala. B: mapping of the SNPs positions alongside the reference genome in the different populations; PAM: protospacer adjacent motif
Evolution of the nucleotide diversity in the An. gambiae s.s. and An. coluzzii populations collected in 3 villages (Bana, Pala and Souroukoudinga) from 2012 to 2017 in Burkina Faso; the figure on top represents samples collected in 2012 while the figure below represents samples collected in 2017; the rectangles on the bottom shows the intron-4-exon-5 boundary of the doublesex gene (dark red: exon-5 untranslated sequence; gray: exon-5 translated sequence and simple gray line: intron-4); the two vertical pink-filled spans are delimiting the Ag(QFS)1 or CRISPR/Cas9 target sequence; the y-axis of the figures shows the nucleotide diversity (θπ) while the x-axis shows the 2R chromosome positions; the nucleotide diversity was constant over the five years and the high values were recorded in the UTR region but remained steady over the five years
Pooled amplicon sequencing to detect genetic variants in the Ag(QFS)1 gene drive target sequence
The application of cost-effective and efficient methods is essential for monitoring the evolution of the Ag(QFS)1 gene drive target sequence in the natural population. Here, we employed pooled amplicon sequencing to investigate the genetic polymorphism of the Ag(QFS)1 gene drive target sequence in natural populations from Burkina Faso.
We genotyped more than 600 individual mosquitoes belonging to three species of the An. gambiae species complex (An. gambiae s.s., An. coluzzii and An. arabiensis) sampled in Bana, Pala, Souroukoudinga, Soni, Moara-Grand, Toma-Ile, Toson, Saran and from laboratory colony (Fig. 5). An. coluzzii was the most predominant species in almost all the villages except Pala, where An. gambiae s.s. and An. arabiensis were most prevalent. This allowed the formation of 12 pools (9 pools of An. coluzzii, 2 pools of An. gambiae s.s. and 1 pool of An. arabiensis) of 50 individuals for gDNA extraction and pooled amplicon sequencing of a 365 bp region encompassing the intron4-exon5 boundary of doublesex, as well as the whole exon 5 CDS, including the Ag(QFS)1 target site. The genomic data analyses identified 60 variants, compared to 67 variants detected in the Ag1000G data in the same 365 bp region. Of the 60 SNPs, 17 positions were found to be triallelic. Importantly, no indels were identified in the dataset. Most of the variants (n = 49 SNPs) were identified in the An. coluzzii populations followed by An. gambiae s.s. (n = 34 SNPs) and An. arabiensis (n = 16 SNPs), which showed this region’s lowest degree of genetic diversity. The SNP density was approximately 0.2 bp−1 [60/300], indicating the presence of 1 SNP per 5 bp. Most of these SNPs were identified in the UTR of exon 5 and intron 4, and they are expected to show higher variation as they constitute non-coding regions. The exon 5 CDS showed a varying number of SNPs between the populations, from 1 SNP (0.011 bp−1 [1/91]) in the laboratory colony to 7 SNPs (0.08 bp−1 [7/91]) in the An. gambiae s.s. samples of Pala. Only two of the SNPs (2R:48714617(C > T) and 2R:48714592(C > T)) identified in the exon 5 CDS were constant and present in all wild-caught populations.
Almost all SNPs were identified at low frequencies (Freq < 0.05) except the SNP 2R:48,714,486(C > A) identified in the UTR sequence of the exon 5 at relatively high frequencies (freq = 0.31–0.45) in the An. coluzzii populations, while in An. arabiensis and An. gambiae s.s. it was present at low frequencies (freq. ~ 0.01). A third allele of this position, the SNP 2R:48,714,486(C > G), was identified at low frequencies (freq. = 0.024) in An. arabiensis populations. In the laboratory samples, this SNP is found at low frequencies and seems to be replaced by the SNP 2R:48,714,472 (A > T), also identified in the UTR sequence of exon 5 at a frequency of 0.17. Fig. 6 shows the allele frequencies of the SNPs whose maximum frequencies were higher than 0.01 in at least one population. The Ag(QFS)1 gene drive target sequence exhibited one SNP at low frequencies, i.e. less than 0.05. The analyses showed most of the SNPs in the UTR sequence of the exon 5 of the dsx gene, especially those with relatively high frequencies (Fig. 7). Interestingly, the distribution of the SNPs along the target sequence of the dsx gene and their allele frequencies were similar to those found in the time series Ag1000G data in Burkina Faso.
Heat map showing the allele frequencies of the SNPs (whose max frequencies are higher than 1%) identified in the female-specific intron-exon boundary of the doublesex gene identified via pooled amplicon sequencing from laboratory (Lab) and wild-caught mosquito samples (BN: Bana village, SK: Souroukoudinga, PL: Pala, SN: Soni, MG: Moara-Grand, TI: Toma-île, SR: Saran, TS: Toson, Ac: An. coluzzii, Ag: An. gambiae s.s., Aa: An. arabiensis); SNPs were called using the lofreq software; the y-axis of the heat map shows the positions of the non-synonymous variant in the 2R chromosome while the x-axis shows the populations of An. gambiae s.l. and number of individual mosquitoes processed (n); the gradient color bar shows the distribution of the allele frequencies; the sky blue band shows the frequencies of the SNPs identified in the Ag(QFS)1 or CRISPR/Cas9 target sequence; UTR: the untranslated region (2R:48714420–48714556) of the female-specific exon5; CDS: the coding sequence (2R:48714557–48714648) of the female-specific exon5; Intron: a part (2R:48714649–48714720) of the female-specific intron4; Ag1000G: Anopheles gambiae 1000 genomes
Distribution of the SNPs and their frequencies alongside the intron-4–exon-5 boundary of the doublesex gene; the rectangle on bottom shows the intron-4-exon-5 boundary of the doublesex gene (dark red: exon-5 untranslated sequence, gray: exon-5 translated sequence and simple gray line: intron-4); the y-axis shows the allele frequencies of the SNPs and the x-axis shows the positions of the SNPs in the 2R chromosome; the two vertical dashed lines are delimiting the Ag(QFS)1 or CRISPR/Cas9 target sequence; The DNA sequences at the bottom shows the SNPs positions alongside the reference genome in the gRNA target sequence within different populations. GD: « Gene Drive »; Amplicon seq. data: Amplicon sequencing data from laboratory (Lab) and wild-caught mosquito samples from 8 villages in Burkina Faso; Ag1000G data: Whole genome sequencing data of Anopheles gambiae 1000 genomes Consortium from wild caught An. gambiae s.l. mosquitoes collected in 3 villages from Burkina Faso
Discussion
Malaria control remains a major challenge in sub-Saharan Africa due to the rapid spread of resistance. In response to this challenge, innovative control strategies are being developed to strengthen current tools and accelerate the elimination of malaria [2]. Ongoing genetic engineering and biocontrol studies through endosymbionts intend to develop effective and sustainable control tools to reduce malaria transmission [27, 28]. By targeting the sexual determination pathway [6], CRISPR-based gene drive technologies demonstrated their potential advantages as the basis for developing effective and sustainable tools to support malaria elimination efforts [7]. However, the success of a CRISPR-based gene drive for vector control requires its target sequence to show as little polymorphism as possible in natural populations to ensure efficient homing rates. Another possible prerequisite is for the target sequence to be highly conserved over the years to ensure the continued matching of the gRNA to the target sequence in natural populations several years after the release [29]. This study investigated the genetic polymorphism of the intron-4-exon-5 boundary of the doublesex gene within the natural populations of An. gambiae s.l. These insights may contribute for the future deployment of gene drive control strategies targeting the doublesex gene [6, 14].
The results showed a variable genetic polymorphism among the vector populations with a global variant density of one SNP every 5 bp. This density was low compared to what was previously found in the dsx gene and in the An. gambiae genome that was about one SNP every two nucleotides [30, 31]. Interestingly, most variants were identified in the gene’s untranslated region of exon 5. The analysis displayed a meager polymorphism in the Ag(QFS)1 target sequence (2R: 48714637–48714660). Analyses of time series data in Burkina showed four non-constant SNPs over a seven-year survey in the Ag(QFS)1 target sequence. Each SNPs appeared in different years and were removed the following year after their occurrence in the same population, presumably removed by purifying selection or drift. This pattern of genetic variation in the target sequence is expected to be sufficient for the rapid spread of the Ag(QFS)1 gene drive in natural populations. However, given that natural populations are dynamic, the genetic variation might be different in space over time. This situation is observed in the forest area of Central Africa where An. gambiae s.l. populations exhibited an SNP 2R:48,714,641[C > T] at frequencies reaching 0.26 in the gRNA target sequence in Angola, Democratic Republic of Congo, Cameroon, and Gabon but not in the Central African Republic. The third allele of this position, the 2R:48,714,641[C > A] was found in An. arabianesis populations of Burkina Faso. The SNP 2R:48,714,641[C > T] was identified in previous studies at Hardy-Weinberg equilibrium in An. coluzzii and An. gambiae populations of Angola [6, 26]. The emergence pattern of this SNP suggests its ubiquitous presence at evolving frequencies in the forest area of Central Africa and its presence could reduce the spread of the dsx-drive in vector populations. Recent studies showed a complex genetic structure of An. gambiae populations in Africa with important variants flow between populations from Western, Central and Eastern Africa [30, 31]. The kdr mutations (kdr-L995F and kdr-L995S) constitute an example of continent-widespread genetic variants in the An. gambiae population [32]. Seeing the high connectivity of An. gambiae populations [30, 31], the spread of the Cas9-based dsx-drive could be reduced if the SNP 2R:48,714,641[C > A] is positively selected and increased in frequency in the populations. Thus, the evolutionary processes driving this variant need to be investigated to understand the extent and the evolution of this SNP within the vector populations in Central Africa.
The emergence of resistance alleles remains a significant challenge in developing Cas9-based vector control tools [25, 33]. In addition to the existing variants, new alleles resulting from de novo mutations could be selected, threatening the efficacy of CRISPR-based gene drives in natural populations. Resistance variants may be introduced by gene drive activity upon error-prone NHEJ repair of a Cas9-induced double-stranded break in the DNA. This can be exacerbated if the gene drive allows Cas9 deposition into the early embryo, where NHEJ occurs preferentially over HDR [25, 34, 35]. Several strategies are being explored to reduce the potential emergence of resistance variants in natural populations. Using alternative promoters to contain Cas9 spatiotemporal expression in germline tissues reduced the rate of resistant allele creation [25].
Moreover, using multiplexed gRNAs targeting neighboring sequences simultaneously constitutes a promising strategy [26, 36, 37]. In this case, if a resistant allele gets created or is pre-existing in the population at one of the target sites, it will be removed as long as at least one other target site is still cleavable, and able to induce homing [26]. Resistant alleles would need to co-exist at all target sites to prevent gene drive homing, and they will need to cooperatively encode a functional gene product to prevent long-term gene drive spread. Nevertheless, this strategy remains complex in practice because of the difficulties in finding many gRNAs targeting neighboring sites and exhibiting suitable parameters in a single gene [38]. Another approach to prevent the emergence of resistance could be using engineered endonucleases or alternative nucleases such as dCas9-FokI and Cpf1/Cas12a. These nucleases showed an ability to tolerate target sequence variation, so existing polymorphisms at the cut site or outside this site are less likely to prevent cleavage and homing [39, 40]. Recently, a Cas12a split gene drive was developed in Drosophila melanogaster [41]. Combining a homing gene drive with an autosomal sex-distorter would also reduce the likelihood of a homing-resistant allele being selected [14]. Ultimately, all alternative strategies developed to mitigate resistance would only be successful if the gene drive targets a highly conserved sequence with minimal polymorphism tolerance. The present study set out a framework for in-depth investigation of the pre-existing natural variation in a natural population to inform gene drive implementation. Future directions would continue to monitor the evolution of genetic variation in the gene drive target site within natural vector populations and search for the potential presence of gene drive target sites in the genome of secondary vectors.
Conclusion
In this study, we investigated the genetic polymorphism of the intron-4-exon-5 boundary of the doublesex gene, a potential target for developing Cas9-based control tools. The results showed very low polymorphism at the target sequence of the Ag(QFS)1 gene drive in the anopheline populations of Burkina Faso. No SNP was found under positive selection in this region over a seven-year (from 2012 to 2019) survey in An. gambiae s.l. populations from Burkina. Interestingly, most vector populations in West and East Africa showed a very low variant density in the gRNA target sequence. Understanding the genetic variation of gene drive target sequences in natural populations is useful and valuable to guide the implementation of gene drive tools to support current control tools for malaria elimination.
Materials and methods
Mosquito sampling
Mosquito sampling was carried out in 8 villages (Bana, Pala, Souroukoudinga, Toma-Ile, Soni, Moara-Grand, Saran and Tosson) in the western part of Burkina Faso (Fig. 5) during 2019. The environmental conditions of these villages are almost similar with an annual rainfall ranging from 700 mm in the north to 1200 mm in the south between May and October. The average annual temperature sits at approximately 27 °C. The sampling area is dominated by agricultural practices especially cereals and cotton cultivation creating suitable environmental niches for the proliferation of mosquito populations. Mosquito samples were collected using pyrethrum spraying catches (PSC) in the rainy season of 2019. Indoor resting mosquitoes were collected early in the morning in 20 randomly selected houses in each village. After collection, mosquitoes were morphologically identified using morphological keys [42, 43] and stored in 80° alcohol for further analyses.
Molecular analyses
Mosquito species genotyping
The An. gambiae s.l. samples were individually genotyped using the SINE200 protocol described by Santolamazza and collaborators [44] and grouped per species/location. DNA was extracted from individual mosquitoes using a non-destructive extraction method and used to perform a polymerase chain reaction (PCR) for species identification. The polymerase chain reaction (PCR) was performed using the S200 × 6.1F: 5’-TCG-CCT-TAG-ACC-TTG-CGT-TA-3’ and the S200 × 6.1R: 5’-CGC-TTC-AAG-AAT-TCG-AGA-TAC-3’ primers to amplify genomic regions of 479 bp, 249 bp and 223 bp for An. coluzzii, An. gambiae s.s. and An. arabiensis, respectively [44].
PCR targeting dsx intron 4 – exon 5 boundary and Amplicon sequencing
Genomic DNA was extracted from the pooled mosquitoes using the DNAzol protocol and stored at −20 °C. The extracted DNA was then used to perform PCR for the amplification of the intron 4 – exon 5 boundary (2R: 48714420–48714720) of the dsx gene. PCR was performed using KAPA HiFi Hotstart Ready Mix according to the manufacturer’s instructions and the following primers (illumina adapters underlined) previously designed by Hammond and collaborators [21]:
-
Forward: 5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCTACTTATCGGCATCAGTTGCG-3’
-
Reverse: 5’-GACTGGAGTTCAGACGTGTGCTCTTCCGATCTGTGAATTCCGTCAGCCAGCA-3’
A small amount of the PCR product was migrated in an electrophoresis gel and visualized under UV to check the quality of the PCR. The remaining PCR products were purified using the Promega Wizard® SV Gel & PCR clean-up system according to the manufacturer’s instructions. The purity and the amount of DNA in each sample were checked in a Nanodrop. The purity threshold was set between 1.8 and 2 (OD260/280) and the DNA samples whose purity was outside this threshold were removed. The purified DNA samples were adjusted to a concentration of 20 ng/ml in a 25 µl of a nuclease-free water and then shipped for sequencing at Genewiz Amplicon-EZ sequencing.
Data analyses
Amplicon sequencing data
Genomic data were received in FASTQ format from the sequencing company. Various tools including fastp, bwa, samtools, IGV and lofreq were used to handle the data for quality checking, trimming, mapping to the reference genome and variant calling. Quality checking and trimming were performed using fastp v0.23.2 [45]. During this step, the DNA sequences whose length is less than 50 bp and Phred score less than 30 were removed from the data. After quality checking and trimming, the genomic data were mapped to the An. gambiae reference genome (AgamP4) [46] using bwa 0.7.17-r1188 [47] and samtools v1.16.6 [48] and stored in bam format. The bam files were indexed and visualized using IGV v17.0.3-internal [49] to check the quality of the mapping to the reference genome. The genetic variants were called from these bam files using the lofreq v2.1.6 program (Wilm et al., 2012) and stored in VCF format [50]. The VCF files were handled in the Jupyter Notebook [52] to analyze the allele frequencies of the identified variants.
The Anopheles gambiae 1000 Genomes Consortium data
We also analyzed the spatiotemporal dynamics of the gene drive target sequence variants using the genomic data of the Ag1000G Consortium project phase 3.0, 3.4 and 3.8. These data were from wild caught An. gambiae (An. coluzzii, An. gambiae s.s. and An. arabiensis) mosquitoes in 19 African countries (Fig. 1). The dataset of Burkina Faso was from time series sampling of wild An. gambiae mosquitoes spanning from 2012 to 2019 collected in three villages (Bana, Pala and Souroukoudinga) (Fig. 5). Full details about the mosquito sampling, sequencing technology, the raw data clean-up and quality control, variant calling, storage of the data and the rights to access these data, have been described on the homepage of MalariaGEN [52]. Python packages such as Malariagen-data [52] and scikit-allel [52] were used to analyze the spatiotemporal distribution of the variants in the gene drive target sequence. We also analyzed the time series variation of the nucleotide diversity (i.e. the average nucleotide differences per site between two randomly selected individuals in the same population) in the gene drive target sequence within the An. gambiae populations from 2012 to 2017.
Data availability
The raw amplicon sequencing data generated and analyzed during the current study are available from the corresponding author upon reasonable request. Jupyter Notebooks and scripts to reproduce all the analyses, tables and figures are available in the GitHub repository: https://github.com/mkient/dsx_works.git. The SNP and haplotype data from the Ag1000G project are available on the MalariaGEN website and can be accessed using the malariagen_data package. The raw sequences in FASTQ format and the aligned sequences in BAM format were stored in the European Nucleotide Archive (ENA, Study Accession n° PRJEB42254).
Change history
31 March 2025
A Correction to this paper has been published: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12864-025-11515-y
References
WHO. World malaria report 2023. Geneva: World Health Organization; 2023. https://iris.who.int/bitstream/handle/10665/374472/9789240086173‑eng.pdf?sequence=1. Accessed 5 Jan 2024.
McLean KJ, Jacobs-Lorena M. Genetic control of Malaria mosquitoes. Trends Parasitol. 2016;32:174–6.
Burt A. Site-specific selfish genes as tools for the control and genetic engineering of natural populations. Proc R Soc Lond Ser B Biol Sci. 2003;270:921–8.
Hammond A, Galizi R, Kyrou K, Simoni A, Siniscalchi C, Katsanos D, et al. A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nat Biotechnol. 2016;34:78–83.
Hoermann A, Habtewold T, Selvaraj P, Del Corsano G, Capriotti P, Inghilterra MG et al. Gene drive mosquitoes can aid malaria elimination by retarding Plasmodium sporogonic development. Sci Adv. 2022;8:1-9.
Kyrou K, Hammond AM, Galizi R, Kranjc N, Burt A, Beaghton AK, et al. A CRISPR–Cas9 gene drive targeting doublesex causes complete population suppression in caged Anopheles gambiae mosquitoes. Nat Biotechnol. 2018;36:1062–6.
Adelman ZN, Tu Z. Control of Mosquito-Borne Infectious diseases: sex and Gene Drive. Trends Parasitol. 2016;32:219–29.
Hammond AM, Galizi R. Gene drives to fight malaria: current state and future directions. Pathog Glob Health. 2017;111:412–23.
Zhang G, Niu G, Franca CM, Dong Y, Wang X, Butler NS, et al. Anopheles midgut FREP1 mediates plasmodium invasion. J Biol Chem. 2015;290:16490–501.
Dong Y, Simões ML, Marois E, Dimopoulos G. CRISPR/Cas9 -mediated gene knockout of Anopheles gambiae FREP1 suppresses malaria parasite infection. PLOS Pathog. 2018;14:e1006898.
Galizi R, Doyle LA, Menichelli M, Bernardini F, Deredec A, Burt A, et al. A synthetic sex ratio distortion system for the control of the human malaria mosquito. Nat Commun. 2014;5:3977.
Galizi R, Hammond A, Kyrou K, Taxiarchi C, Bernardini F, O’Loughlin SM, et al. A CRISPR-Cas9 sex-ratio distortion system for genetic control. Sci Rep. 2016;6:31139.
Akbari O, Li M, Kandul N, Sun R, Yang T, Benetta ED et al. Targeting sex determination to suppress mosquito populations. Res Sq. 2023;12:RP90199.
Simoni A, Hammond AM, Beaghton AK, Galizi R, Taxiarchi C, Kyrou K, et al. A male-biased sex-distorter gene drive for the human malaria vector Anopheles gambiae. Nat Biotechnol. 2020;38:1054–60.
Kientega M, Kranjc N, Traoré N, Kaboré H, Soma DD, Morianou I, et al. Analysis of the genetic variation of the fruitless gene within the Anopheles gambiae (Diptera: Culicidae) complex populations in Africa. Insects. 2022;13:1048.
Biedler JK, Tu Z. Sex Determination in Mosquitoes. In: Advances in Insect Physiology. 2016;51:37–66.
Oliveira DCSG, Werren JH, Verhulst EC, Giebel JD, Kamping A, Beukeboom LW, et al. Identification and characterization of the doublesex gene of Nasonia. Insect Mol Biol. 2009;18:315–24.
Salvemini M, Mauro U, Lombardo F, Milano A, Zazzaro V, Arcà B, et al. Genomic organization and splicing evolution of the doublesex gene, a Drosophila regulator of sexual differentiation, in the dengue and yellow fever mosquito aedes aegypti. BMC Evol Biol. 2011;11:41.
Scali C, Catteruccia F, Li Q, Crisanti A. Identification of sex-specific transcripts of the Anopheles gambiae doublesex gene. J Exp Biol. 2005;208:3701–9.
Dennison NJ. Sex-specific expression during embryonic development of Anopheles gambiae. University of Liverpool; 2012.
Hammond A, Pollegioni P, Persampieri T, North A, Minuz R, Trusso A, et al. Gene-drive suppression of mosquito populations in large cages as a bridge between lab and field. Nat Commun. 2021;12:4589.
Yadav B, Majhi A, Phagna K, Meena MK, Ram H. Negative regulators of grain yield and mineral contents in rice: potential targets for CRISPR-Cas9-mediated genome editing. Funct Integr Genomics. 2023;23:317.
Unckless RL, Clark AG, Messer PW. Evolution of resistance against CRISPR/Cas9 Gene Drive. Genetics. 2017;205:827–41.
Beaghton AK, Hammond A, Nolan T, Crisanti A, Burt A. Gene drive for population genetic control: non-functional resistance and parental effects. Proc R Soc B Biol Sci. 2019;286:20191586.
Hammond A, Karlsson X, Morianou I, Kyrou K, Beaghton A, Gribble M, et al. Regulating the expression of gene drives is key to increasing their invasive potential and the mitigation of resistance. PLOS Genet. 2021;17:e1009321.
Morianou I. Determining the landscape of resistance to gene drives in the malaria mosquito. Imperial College London; 2023.
Wang G-H, Du J, Chu CY, Madhav M, Hughes GL, Champer J. Symbionts and gene drive: two strategies to combat vector-borne disease. Trends Genet. 2022;38:708–23.
Bier E. Gene drives gaining speed. Nat Rev Genet. 2022;23:5–22.
Lanzaro GC, Campos M, Crepeau M, Cornel A, Estrada A, Gripkey H, et al. Field Trial Site Selection for mosquitoes with Gene Drive: Geographic, Ecological, and Population Genetic considerations. Mosquito Gene drives and the Malaria Eradication Agenda. Jenny Stanford Publishing; 2023. pp. 141–96.
The Anopheles gambiae 1000 Genomes Consortium. Genetic diversity of the African malaria vector Anopheles gambiae. Nature. 2017;552:96–100.
The Anopheles gambiae 1000 Genomes Consortium. Genome variation and population structure among 1142 mosquitoes of the African malaria vector species Anopheles gambiae and Anopheles coluzzii. Genome Res. 2020;30:1533–46.
Clarkson CS, Miles A, Harding NJ, O’Reilly AO, Weetman D, Kwiatkowski D, et al. The genetic architecture of target-site resistance to pyrethroid insecticides in the African malaria vectors Anopheles gambiae and Anopheles coluzzii. Mol Ecol. 2021;30:5303–17.
Fuchs S, Garrood WT, Beber A, Hammond A, Galizi R, Gribble M, et al. Resistance to a CRISPR-based gene drive at an evolutionarily conserved site is revealed by mimicking genotype fixation. PLOS Genet. 2021;17:e1009740.
Gantz VM, Bier E. The mutagenic chain reaction: a method for converting heterozygous to homozygous mutations. Sci (80-). 2015;348:442–4.
Hammond AM, Kyrou K, Bruttini M, North A, Galizi R, Karlsson X, et al. The creation and selection of mutations resistant to a gene drive over multiple generations in the malaria mosquito. PLOS Genet. 2017;13:e1007039.
Oberhofer G, Ivy T, Hay BA. Behavior of homing endonuclease gene drives targeting genes required for viability or female fertility with multiplexed guide RNAs. Proc Natl Acad Sci. 2018;115(40):E9343-E9352. https://doiorg.publicaciones.saludcastillayleon.es/10.1073/pnas.1805278115.
Champer J, Liu J, Oh SY, Reeves R, Luthra A, Oakes N, et al. Reducing resistance allele formation in CRISPR gene drive. Proc Natl Acad Sci. 2018;115:5522–7.
Nissim L, Perli SD, Fridkin A, Perez-Pinera P, Lu TK. Multiplexed and programmable regulation of Gene Networks with an Integrated RNA and CRISPR/Cas Toolkit in Human cells. Mol Cell. 2014;54:698–710.
Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D, et al. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol. 2014;32:569–76.
Zetsche B, Gootenberg JS, Abudayyeh OO, Slaymaker IM, Makarova KS, Essletzbichler P, et al. Cpf1 is a single RNA-Guided endonuclease of a class 2 CRISPR-Cas System. Cell. 2015;163:759–71.
Sanz Juste S, Okamoto EM, Nguyen C, Feng X. López Del Amo V. Next-generation CRISPR gene-drive systems using Cas12a nuclease. Nat Commun. 2023;14:6388.
Gillies MT, Coetzee M, A SUPPLEMENT TO THE ANOPHELINAE. OF AFRICA SOUTH OF THE SAHARA (AFROTROPICAL REGION). Johannesburg; 1987.
Coetzee M. Key to the females of Afrotropical Anopheles mosquitoes (Diptera: Culicidae). Malar J. 2020;19:70.
Santolamazza F, Mancini E, Simard F, Qi Y, Tu Z, della Torre A. Insertion polymorphisms of SINE200 retrotransposons within speciation islands of Anopheles gambiae molecular forms. Malar J. 2008;7:163.
Chen S, Zhou Y, Chen Y, Gu J, Fastp. An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. Oxford Academic; 2018. pp. i884–90.
VectorBase. Data Set: Anopheles gambiae PEST Genome Sequence and Annotation. The genome sequence of the malaria mosquito Anopheles gambiae. 2022. https://vectorbase.org/vectorbase/app/record/dataset/NCBITAXON_180454. Accessed 7 Jan 2021.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–6.
GATK. VCF : Variant call Format. GATK/Technical Doc. 2012:1–14. https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format
Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, Frederic J et al. Jupyter Notebooks—a publishing format for reproducible computational workflows. In: Positioning and Power in Academic Publishing: Players, Agents and Agendas - Proceedings of the 20th International Conference on Electronic Publishing, ELPUB 2016. IOS Press BV; 2016. pp. 87–90.
The Anopheles gambiae 1000 Genomes Consortium. Ag3.0 cloud data access — MalariaGEN vector data user guide. 2021. https://malariagen.github.io/vector-data/ag3/cloud.html. Accessed 30 Apr 2022.
Miles A. GitHub - malariagen/malariagen-data-python: A Python package providing functions for accessing and analysing MalariaGEN data. malariagen / malariagen-data-python. 2022. https://github.com/malariagen/malariagen-data-python. Accessed 6 May 2022.
Miles A, Bot P, i. RM, Ralph P, Harding N, Pisupati R et al. Scikit-allel - explore and analyse genetic variation. Zenodo. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.5281/ZENODO.4759368
Acknowledgements
The authors would like to acknowledge the Target Malaria Project, the Institut de Recherche en Sciences de la Santé and their funders for supporting this work. We also acknowledge the Ag1000G Consortium, the MalariaGEN team at the Wellcome Sanger Institute, and their partners in Africa for the production of the An. gambiae genomic data. The authors are grateful to the populations of the sampling sites for their sincere cooperation during the mosquito sampling.
Funding
This work was supported by Target Malaria and the Institut de Recherche en Sciences de la Santé which received core funding from the Bill & Melinda Gates Foundation (INV-006610 and INV-037164), Open Philanthropy (2016-161185) and Wellcome Trust (UNS-128536).
Author information
Authors and Affiliations
Contributions
M.K., N.K. and I.M. conceived the study. A.B. and A.D. provided the resources. A.M.G.B., A.B. and A.D. supervised the study. N.T., H.K., I.M. and M.K. designed methodology and carried out the molecular analyses. A.M., P.S.E., O.N.Z. and F.A.Y. carried out samples collection in the field. N.K. and M.K. carried out data analysis and visualization. I.M. and M.K. drafted the manuscript. All authors have read, edited, and approved the final version of the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
All methods in this paper have been implemented in accordance with the relevant guidelines/regulations/legislation in Burkina Faso. No ethics approval was required to run all the activities related to this paper.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the name of Austin Burt was presented incorrectly.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kientega, M., Morianou, I., Traoré, N. et al. Genomic analyses revealed low genetic variation in the intron-exon boundary of the doublesex gene within the natural populations of An. gambiae s.l. in Burkina Faso. BMC Genomics 25, 1207 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12864-024-11127-y
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12864-024-11127-y