
The fate of artificial transgenes in Acanthamoeba castellanii

Abstract

Background

The soil amoeba Acanthamoeba castellanii is an emerging model organism with which to study a wide range of biomedical, microbiological, and evolutionary phenomena. While transformation systems were established for this organism more than two decades ago, the fate of artificial transgenes has not been well characterized. In this study, artificial transformation experiments were performed to investigate how the A. castellanii genome responds to foreign DNA presented in both circular and linear plasmid form.

Results

Nanopore sequencing was used as a high throughput method to screen for transgene DNA in the resulting transformant cultures, and candidate transgene integrations were identified. Molecular biology experiments were performed to validate the sequence data and provide additional context on the fate of transgenes. A method was devised to estimate the rate of read chimerism in nanopore sequencing runs and accurately account for the effects of read chimerism in identifying putative transgene integrations. Based on the experimental data in hand, a potential mechanism for transgene maintenance in A. castellanii is proposed, one in which incoming foreign DNA is tandemly duplicated and telomeres are added to the ends.

Conclusions

Our results suggest that transformation of A. castellanii with foreign DNA leads to linear molecules that are maintained as telomere-containing, transgene-bearing minichromosomes, which may facilitate chromosomal integration. This process may allow lateral gene transfer by expanding the window of opportunity for exogenous DNA to be taken up and integrated into the A. castellanii genome. Similar mechanisms exist in other eukaryote groups, suggesting this may be a widespread feature of eukaryote genome biology.


Background

The ability to express transgenes in eukaryotic cells has become a staple of the contemporary molecular genetics toolkit, facilitating a wide array of functional studies that answer genetic, biochemical, and cell biological questions [1, 2]. These experiments often seek to tag, overexpress, or suppress proteins of interest and make inferences about that protein’s role in the organism’s biology. Sophisticated transformation systems and the genetic toolkits built upon them are available for most model organisms, and for most new species we wish to characterize on a molecular and cell biological level, researchers strive to develop a corresponding system for transformation and genetic manipulation [3]. However, while the introduction and expression of transgenes is ubiquitous in molecular biology, relatively few researchers seek to determine the fate of the introduced DNA within the host cell. This is not unjustified; for most experiments of this nature, the fate of the transgene is irrelevant to the research question. For the study of genome biology, though, this knowledge gap contains answers to questions relevant to how eukaryotic organisms maintain their own genomes and respond to the introduction of foreign DNA in nature.

In transformation experiments, evidence suggests that there are two main ways that transgenes are permanently maintained (ignoring transient transformation where the transgenes are expressed and then lost within a short time). The first option is integration into one or more loci on the host chromosomes, where the integrated DNA is faithfully replicated from generation to generation as a part of the genome. Integration into an organellar genome is formally possible, but in experiments where selection relies on expression from nuclear promoters, organellar genome integrants may not persist [4]. The second option for the stable maintenance of a transgene is for the cell to replicate it as an extrachromosomal element, or ‘episome’. Here the transgene undergoes the same replication and segregation processes over time as the host genome but remains physically separate from it. Evidence for both possibilities has been gleaned from transformation studies in a variety of different eukaryotic species, and the two options are not mutually exclusive, even in the same cell [5,6,7]. For chromosomally integrated transgenes, additional questions arise, such as the molecular mechanism(s) of integration and the genomic features at the site of integration that may have influenced the frequency and/or process. Historically, answering such questions relied heavily on classical molecular tools such as polymerase chain reaction (PCR), Southern blotting, restriction fragment analysis, and Sanger sequencing as needed [8, 9].

Acanthamoeba castellanii is a phagotrophic, free-living amoeba commonly found in oxygenated soil and freshwater environments [10, 11], although it is also capable of anaerobic metabolism [12]. A. castellanii is a well-known host to bacterial endosymbionts and viruses [13,14,15,16] and an opportunistic pathogen of humans [17]. A. castellanii is an attractive experimental model microorganism: it is easy to grow axenically on plates and to high density in liquid culture, and it is large enough to facilitate microscopic observation. In terms of its cell biology, A. castellanii is considered to be a useful model for diverse types of eukaryotic cells, including human macrophages. The same cannot be said of its genome, which is complex and poorly understood.

As a facultative pathogen and a ubiquitous inhabitant of natural and artificial environments, A. castellanii strain Neff is one of the few amoebozoans with a published genome sequence. A strain Neff reference genome sequence and annotation was first published in 2013 by Clarke et al. [18], who identified genes involved in processes such as cell signalling, environmental sensing and response, cell adhesion, microbial recognition, antimicrobial defense, metabolism, and transcription regulation. They also performed a genome-wide screen for foreign genes derived from lateral gene transfer. In 2022, new sequencing technologies were employed to produce chromosome-scale reference genome sequences for A. castellanii strains Neff and C3 [19]. The greatly enhanced structural resolution afforded by this approach revealed a karyotype of 35 unique chromosomes for both strains.

Beyond the sequence and structure of the Acanthamoeba nuclear genome itself, researchers have made inroads into understanding its nuclear biology. In 1986, Byers [20] measured nuclear DNA content, proposing a now-widely-cited estimate of 25n for Acanthamoeba’s ploidy. Regardless of the precise genome copy number, the organism is widely understood to be polyploid and further studies employing pulsed-field gel electrophoresis (PFGE) suggested that Acanthamoeba genomes may also be aneuploid [21]. Genes encoding parts of the meiotic machinery have been detected in A. castellanii [22], although no observation of sexual processes has been reported.

Here we sought to characterize the fate of artificially introduced transgenes in the amoebozoan protist Acanthamoeba castellanii using a transformation protocol developed by Bateman and colleagues [23, 24]. In this study, Oxford Nanopore long-read sequencing was used to perform a high-throughput, genome-wide search for transgenes in A. castellanii, with traditional molecular biological methods used for confirmation and additional characterization.

Methods

Culturing and transformation of Acanthamoeba castellanii

All cultures of A. castellanii strain Neff used in this study were grown at room temperature in Neff base medium with additives (ATCC Medium 712; 0.75% yeast extract, 0.75% proteose peptone, 2 mM KH2PO4, 1 mM MgSO4, 1.5% glucose, 0.1 mM ferric citrate, 0.05 mM CaCl2, 1 µg/mL thiamine, 0.2 µg/mL D-biotin, and 1 ng/mL vitamin B12). Transformed culture media also contained the antibiotic G418 at a concentration of 10 µg/mL during initial culture establishment after transformation, and 50 µg/mL for full-strength selection and long-term maintenance.

All transformation experiments were based on the method described by Peng, Omaruddin, and Bateman [23], and further developed by Bateman [24]. Our general adaptation of this protocol is available online at protocols.io [25]. The plasmid pGAPDH-EGFP, described by Bateman [24], was used for these experiments (Fig. 1; see Additional file 1 for plasmid sequence). This plasmid encodes enhanced green fluorescent protein (EGFP) expressed from the endogenous A. castellanii glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter, and a neomycin resistance marker (neoR) expressed from the endogenous A. castellanii TATA-box binding protein (TBP) promoter. Briefly, the transformation method uses the QIAGEN SuperFect reagent and introduces 4 to 6 µg of plasmid to a sample containing roughly 500 000 cells.

Fig. 1

Plasmid pGAPDH-EGFP used for transformation of A. castellanii. The plasmid has an ampicillin resistance marker for propagation in Escherichia coli, neomycin resistance marker for selection in A. castellanii, and enhanced green fluorescent protein (EGFP) as a reporter gene. The neomycin resistance marker and EGFP are expressed from endogenous A. castellanii promoters. Expression of the resistance marker is driven by the TATA-box binding protein (TBP) promoter, while EGFP expression is driven by the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter. Restriction sites used in this study are marked on the plasmid map

The first transformation was performed using pGAPDH-EGFP in circular form. The resulting culture was first investigated without establishing monoclonal lines and is referred to as the ‘mixed transformant’ sample because it could be clonally heterogeneous. Single amoeba cells were subsequently isolated by agar plate migration and used to establish monoclonal cultures from the mixed population of transformants [26]. Another transformation experiment was performed as described above, except with plasmid linearized by a single cut with the restriction endonuclease PsiI, and monoclonal cultures were established before any analysis.

Microscopy

A 50 µL aliquot of ‘mixed transformant’ culture was applied to a glass microscope slide. Cells were given 15 min to adhere to the slide, and then the media was removed. The cells were fixed with 4% formaldehyde for 15 min, then permeabilized with 0.5% Triton X-100 for 15 min. The sample was stained with 300 nM DAPI solution for 5 min, then the slide was washed three times with phosphate-buffered saline. A coverslip was mounted to the slide with Sigma-Aldrich Fluoromount mounting medium. The cells were imaged with a Zeiss AxioVert 200 M epifluorescence microscope using the GFP and DAPI filters. The total magnification used for capturing the images was 630X.

Nanopore sequencing of transformed A. castellanii cultures

Once the transformed cells had recovered and the stable ‘mixed transformants’ culture was established, genomic DNA was extracted using an SDS-based lysis followed by phenol-chloroform extraction. DNA samples were cleaned with QIAGEN G/20 genomic clean-up columns using the manufacturer’s protocol, but with double the number of wash steps. A nanopore sequencing library was prepared for the Oxford Nanopore MinION using the SQK-LSK108 ligation sequencing kit and sequenced on a FLO-MIN106 flow cell. The reads were basecalled with Albacore v2.1.7.

Three clones from the transformation with circular pGAPDH-EGFP were selected for nanopore sequencing, hereafter referred to as ‘Clone 1’, ‘Clone 5’, and ‘Clone 8’, based on their original labels at the time of single cell isolation. DNA was extracted from clonal cultures as described above. A barcoded sequencing library was prepared using the ligation sequencing kit (SQK-LSK109) and the native barcoding kit (EXP-NBD104) with barcodes 2, 5, and 8. The library was run on a FLO-MIN106 flow cell and basecalled with Albacore v2.3.3.

A subsequent sequencing run containing only Clone 1 was performed with the ligation sequencing kit (SQK-LSK109) on a FLO-MIN106 flow cell. The raw output was basecalled with Guppy v3.1.5 using the HAC (high accuracy) basecalling model.

Three of the clonal isolates from the transformation with linearized pGAPDH-EGFP were sequenced and analyzed using the same library preparation and bioinformatic approach as Clones 1, 5, and 8 described above, but basecalling was done with Guppy v5.1.13 using the SUP (super accurate) model. The three clonal isolates analyzed from this transformation experiment are hereafter referred to as ‘Clone LT6’, ‘Clone LT8’, and ‘Clone LT9’. Accordingly, the barcodes used for this sequencing run were barcode 6, 8, and 9, respectively.

Plasmid-containing reads from each of these sequencing datasets were identified using BLASTn v2.7.1 and retrieved. They were mapped against the wild-type genome using minimap2 [27] v2.24, with soft clipping allowed. Mapped reads were then visualized in Integrative Genomics Viewer [28, 29] to locate putative chromosomal integrations of the plasmid, and BLASTn v2.7.1 was used to confirm that the soft-clipped portions of these reads were plasmid-derived. The plasmid-containing reads were also screened for telomeric repeats with a simple text search.
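A text search for telomeric repeats of this kind could be sketched in Python as follows. This is a minimal illustration and not the code used in the study; the function name `telomeric_ends`, the minimum of three consecutive repeat units, and the 200-bp window inspected at each read end are all assumptions made for the example.

```python
import re
from typing import Tuple

# Assumed thresholds for illustration; the study does not state them.
MIN_UNITS = 3      # minimum consecutive repeat units that count as a hit
END_WINDOW = 200   # number of bases inspected at each end of a read

# Either strand of the repeat: (TTAGGG)n or its reverse complement (CCCTAA)n.
TELOMERE_RE = re.compile(
    r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (MIN_UNITS, MIN_UNITS)
)


def telomeric_ends(read_seq: str) -> Tuple[bool, bool]:
    """Return (start_has_repeat, end_has_repeat) for one read sequence."""
    seq = read_seq.upper()
    at_start = TELOMERE_RE.search(seq[:END_WINDOW]) is not None
    at_end = TELOMERE_RE.search(seq[-END_WINDOW:]) is not None
    return at_start, at_end
```

Reads returning `(True, False)` or `(False, True)` would carry repeats on one end, and `(True, True)` on both ends. Raw nanopore reads contain basecalling errors, so an exact-match pattern such as this one is likely to undercount degraded repeat tracts.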

Determining the rate of artefactual read chimerism in transformant sequence data

The background rate of read chimerism was estimated from each sequencing run used to infer integration. This was done using the non-plasmid-containing reads, so that the calculated rate of chimerism could be applied to assess the validity of the plasmid + genome reads. Reads from an individual sequencing run were mapped against the wild-type reference genome sequence with minimap2 [27] v2.24 and the mapping was output as a paf file rather than a SAM file. All reads that had exactly two mappings to the reference and no plasmid sequence were retained for further analysis, because depending on the genomic location of these two mappings, reads with this mapping pattern could be chimeras from two different genomic loci.
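The selection of reads with exactly two mappings can be illustrated with a short Python sketch that groups PAF records by read name. It is not the authors' code: the names `Mapping` and `reads_with_two_mappings` are hypothetical, and the sketch assumes that plasmid-containing reads have already been removed from the input.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Mapping(NamedTuple):
    """One PAF alignment record, reduced to the fields needed downstream."""
    read: str
    q_start: int
    q_end: int
    target: str
    t_len: int
    t_start: int
    t_end: int


def reads_with_two_mappings(paf_lines: Iterable[str]) -> Dict[str, List[Mapping]]:
    """Group PAF records by read name and keep reads with exactly two mappings."""
    by_read: Dict[str, List[Mapping]] = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns; skip malformed lines
            continue
        by_read[fields[0]].append(Mapping(
            read=fields[0],
            q_start=int(fields[2]), q_end=int(fields[3]),
            target=fields[5], t_len=int(fields[6]),
            t_start=int(fields[7]), t_end=int(fields[8]),
        ))
    return {name: maps for name, maps in by_read.items() if len(maps) == 2}
```

The function accepts any iterable of lines, so an open file handle on the minimap2 PAF output can be passed directly.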

A custom Perl script (https://github.com/morgancolp/Acastellanii_transgene_analysis) was used to identify putatively chimeric reads. This script assessed the reads with two mappings to the reference genome as follows: if the two mappings for a read were found to be greater than 500 bp apart on the genome, each was more than 100 bp away from the end of a scaffold, and the distance between the two mappings on the genome was at least 100 bp more than the distance between the two mappings on the read, the read was considered to be chimeric. To estimate the proportion of chimerism in each sequencing run, the number of putative chimeric reads was divided by the total number of reads fed into the chimerism identification script.
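The decision rule described above can be restated as a short Python sketch. This is an independent illustration, not a port of the Perl script in the repository. Two points are assumptions: "distance" is taken to mean the gap between the two aligned intervals, and a read whose two mappings fall on different scaffolds is treated as satisfying the distance criteria, a case the text does not address.

```python
from typing import NamedTuple

MIN_GENOME_GAP = 500       # mappings must be more than this far apart on the genome
MIN_SCAFFOLD_MARGIN = 100  # each mapping must be more than this far from a scaffold end
MIN_EXCESS = 100           # genome gap must exceed the read gap by at least this much


class Mapping(NamedTuple):
    q_start: int   # start on the read
    q_end: int     # end on the read
    target: str    # scaffold name
    t_len: int     # scaffold length
    t_start: int   # start on the scaffold
    t_end: int     # end on the scaffold


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Distance between two intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def away_from_scaffold_ends(m: Mapping) -> bool:
    return m.t_start > MIN_SCAFFOLD_MARGIN and (m.t_len - m.t_end) > MIN_SCAFFOLD_MARGIN


def is_putative_chimera(a: Mapping, b: Mapping) -> bool:
    """Apply the three criteria to the two mappings of one read."""
    if not (away_from_scaffold_ends(a) and away_from_scaffold_ends(b)):
        return False
    if a.target != b.target:
        # Assumption: mappings on different scaffolds count as far apart.
        return True
    genome_gap = gap(a.t_start, a.t_end, b.t_start, b.t_end)
    read_gap = gap(a.q_start, a.q_end, b.q_start, b.q_end)
    return genome_gap > MIN_GENOME_GAP and genome_gap - read_gap >= MIN_EXCESS
```

The scaffold-end margin excludes reads that merely span a break between scaffolds, and the comparison of genome and read distances excludes reads in which a poorly aligned stretch of the read accounts for the separation.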

The number of putative chromosomal integrations of the plasmid that could have been artefacts due to read chimerism was estimated using this proportion. The total number of plasmid-containing reads was multiplied by the proportion of reads expected to be chimeric, and the resulting value represented the expected number of reads to artefactually appear as putative integrations due to chimerism. This number was compared to the observed number of putative integrations from the same read set. This process was done independently for the output of each sequencing experiment to account for differing rates of chimerism across different sequencing runs.
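The arithmetic behind this estimate is simple enough to state as code. The Python sketch below uses hypothetical function names, and the counts in the usage lines are invented for illustration only; they are not values from this study.

```python
def chimerism_rate(n_chimeric: int, n_tested: int) -> float:
    """Proportion of tested (non-plasmid, two-mapping) reads flagged as chimeric."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_chimeric / n_tested


def expected_artefacts(n_plasmid_reads: int, rate: float) -> float:
    """Expected number of plasmid-containing reads that look like integrations
    purely because of read chimerism."""
    return n_plasmid_reads * rate


# Hypothetical counts, for illustration only (not values from this study).
rate = chimerism_rate(n_chimeric=50, n_tested=1000)
expected = expected_artefacts(n_plasmid_reads=2000, rate=rate)
```

The expected value is then compared with the number of putative integrations observed in the same run; an observed count at or below the expected count means chimerism alone could account for the signal.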

Illumina short-read re-sequencing and analysis of A. castellanii clones

Genomic DNA was extracted from wild-type A. castellanii as well as clones LT6 and LT9 and sent to the Integrated Microbiome Resource at Dalhousie University for Illumina library preparation and sequencing. The three samples were sequenced as part of a larger multiplexed run on an Illumina NextSeq2000 instrument. 150-bp paired-end reads were generated and processed with Trimmomatic [30] v0.36 using the following trimming parameters: HEADCROP:10, LEADING:12, SLIDINGWINDOW:4:20, MINLEN:75.

Illumina reads from Clone LT6 and Clone LT9 were used to seek confirmation of putative integrations inferred from nanopore sequence data. The Illumina reads from each respective clone were mapped against the long reads from the same clone using HISAT2 [31] v2.2.1 with default mismatch penalties, as well as with the maximum and minimum mismatch penalties decreased from 6 and 2 to 5 and 1, respectively. Mapped reads were visualized in Integrative Genomics Viewer [28, 29] to look for reads mapping across the putative plasmid-genome junctions. Illumina reads were also compared against the putative integration-supporting long reads using BLASTn v2.7.1 to see whether any Illumina reads showed BLAST hits that spanned the junctions.

Southern blot analysis to locate transgene DNA

A Southern blot experiment was performed on DNA from the ‘mixed transformant’ culture to detect transgenes. The following samples were prepared ahead of the Southern blot experiment: undigested genomic DNA (gDNA) from the A. castellanii culture of ‘mixed transformants’, two aliquots from this gDNA that were digested with either BglII or HindIII, purified pGAPDH-EGFP plasmid, and two aliquots of the plasmid digested with the same restriction endonucleases. These samples were run on a 1% agarose gel alongside Thermo Scientific GeneRuler 1 kb plus ladder, and the DNA was transferred to a positively charged nylon membrane following the Southern blotting protocol published by ThermoFisher (https://assets.fishersci.com/TFS-Assets/LSG/manuals/MAN0013296_Southern_Blotting._Dot_Blotting_UG.pdf), which is a typical alkaline transfer method.

The probe for this experiment was generated using the Thermo Scientific Biotin DecaLabel DNA labelling kit. In this case, the probe was generated from a 345-bp segment of the neoR gene on the pGAPDH-EGFP plasmid. The fragment serving as the substrate for the labelling reaction was generated by polymerase chain reaction (PCR) with pGAPDH-EGFP as template using the following primers: Forward: 5′-CTGCCGAGAAAGTATCCATC-3′, Reverse: 5′-CCAACGCTATGTCCTGATAG-3′.

Hybridization of this probe to the membrane also followed the ThermoFisher Southern blotting protocol. Membrane blocking was done with salmon sperm DNA at a concentration of 50 µg/mL. For the final hybridization, the probe was added to a concentration of 50 ng/mL. Hybridization was performed for 12 h at 42 °C with gentle agitation. The membrane was then washed twice with 2X saline sodium citrate buffer (SSC) + 0.1% SDS for 10 min each time at room temperature and washed for high stringency in 0.1X SSC + 0.1% SDS for 10 min each time at 65 °C. The hybridizing probe was detected using the Thermo Scientific Biotin Chromogenic Detection kit. The detection reaction was performed according to the manufacturer’s protocol, allowing the colour to develop overnight.

Another Southern blot experiment was performed on DNA from the Clone LT9 culture to look for evidence of transgenes on an episome. The same protocol and probe were used as above, but the input DNA and gel were different. Genomic DNA from wild-type A. castellanii and from Clone LT9 was used, as well as purified pGAPDH-EGFP plasmid. For each sample, one aliquot was left untreated, while a second aliquot was simultaneously digested with the restriction endonucleases NheI, NotI, and SacI. The samples were run on a 0.5% agarose gel (to resolve large DNA fragments better than a typical 1% agarose gel) along with Thermo Scientific GeneRuler 1 kb ladder. The DNA was visualized with ethidium bromide, and a Southern blot experiment was performed using the same protocol as described above.

Pulsed-field gel electrophoresis for detecting potential episomes

A pulsed-field gel electrophoresis (PFGE) experiment was performed on wild-type and Clone LT9 genomic DNA, alongside two ladders: Bio-Rad CHEF DNA standard lambda concatemers (48.5 Kbp to 1 000 Kbp) and the Thermo Scientific GeneRuler 1 kb ladder. One lane contained both ladder types. The PFGE was run on a Bio-Rad CHEF-DR III system and used a 1% agarose gel, 0.5X TBE running buffer, an 18-hour run time, a 120° angle, and a switch time ramping from 1 s to 14 s over the course of the run. The gel was stained with ethidium bromide to visualize the DNA.

Polymerase chain reaction verification of chromosomal integration of transgenes

Four putative integration loci were selected for verification with PCR, the goal being to amplify across the plasmid-genome junctions on each end of a putative integration. The first three putative integrations were selected at random, and the fourth was selected because it was represented by the most reads in the sequence data (see Additional file 2 for a schematic showing the four putative integration loci and associated primer sites). Where possible, long reads spanning putative integrations were used to guide primer design. To account for inaccuracy in the raw nanopore read sequence, prospective primer sites were cross-referenced against the corresponding sequences from the genome assembly and the reference plasmid sequence and adjusted accordingly. At least one long read was available to inform the choice of primer sites for both junctions of two of the chosen loci, and only one junction of the other two loci. Primer pairs were designed to amplify between 500 and 1 000 bp, with the exact length dictated by optimizing primer characteristics. For junctions not captured in the nanopore reads from a given transformant clone but predicted to exist at the opposite end of a putative integration, two or three primers were designed to both the plasmid and the genome sides to account for potential small deletions. These primers were spaced roughly 50 to 100 bp from one another, while still aiming for a 500-1 000 bp amplicon.

Early PCR experiments used NEB OneTaq polymerase in its Quick-Load 2X master mix with standard buffer. These experiments started with the following PCR program: 96 °C for 5 min, then 25 cycles of denaturation at 96 °C for 30 s, annealing at 50 °C for 30 s, and extension at 72 °C for 1 min, then a final 7-minute elongation step at 72 °C. To adjust for poor amplification, the number of cycles was then increased to 35. Then, to reduce the generation of non-specific product, the annealing temperature was increased to 56 °C.

One of the putative transgene integration loci showed promising results, and PCR was further optimized using ThermoFisher Platinum II Taq Hot-Start DNA polymerase. The optional GC enhancer reagent was included in the standard amplification recipe recommended for this polymerase. The PCR program was as follows: an initial denaturation step of 2 min at 94 °C, followed by 35 cycles of denaturation at 94 °C for 15 s, annealing at 60 °C for 15 s, and extension at 68 °C for 15 s. There was no long final extension step in this protocol. Bands of interest were gel extracted using a Macherey-Nagel NucleoSpin Gel and PCR Clean-up kit, and samples were sent to GeneWiz, South Plainfield, New Jersey for Sanger sequencing.

Results

Stable transformation of Acanthamoeba castellanii

A. castellanii strain Neff was transformed with the plasmid pGAPDH-EGFP which encodes EGFP and neoR from endogenous A. castellanii promoters. After recovery from transformation, the transformed cell population under selection with 50 µg/mL G418 exhibited a growth rate comparable to that of a wild-type culture with no selection. Upon examination with epifluorescence microscopy, cells expressing EGFP were clearly visible (Fig. 2). In this initial transformation experiment, the circular form of the plasmid was used. Strong nuclear localization of the fluorescent protein was apparent. Nuclear localization of EGFP has been observed in mammalian cell lines as well; EGFP monomers are smaller than the predicted diffusion limit of proteins into the nucleus and therefore are thought to passively diffuse from the cytosol into the nucleus via nuclear pores [32, 33]. This nuclear concentration of EGFP does not obviously diminish reproductive fitness of our transformed cells; other biological impacts are unknown.

Fig. 2

A. castellanii cells transfected with pGAPDH-EGFP. A. Cells were imaged using fluorescence microscopy with a DAPI filter to visualize DNA stained with DAPI. B. Cells were imaged using fluorescence microscopy with a GFP filter to visualize EGFP expression. C. Merge of the DAPI and GFP images. All cells were imaged using 630X total magnification. All scale bars represent 20 μm

Plasmid sequence can be detected by sequencing transformants

We used Oxford Nanopore long-read sequencing as a high throughput screening tool for the detection of plasmid DNA in A. castellanii transformants. The sequence output corresponding to transformants stemming from transformation of circular plasmid DNA is summarized in Table 1. The goal was to look for the presence of transgene DNA in transformed cultures and to determine in what genomic context it was being maintained. BLAST searches with the pGAPDH-EGFP sequence as query allowed retrieval of all long reads corresponding to internalized plasmids. In the case of putative chromosomal integration, these were long reads containing both plasmid and genomic sequence, where at least one junction was represented. In our first experiment (‘Mixed transformants’ in Table 1), 9 579 reads were detected with identifiable plasmid sequence, of which 118 were found to have (TTAGGG)n repeats or the reverse complement (CCCTAA)n on one end; a single read was found with repeats on both ends. These repeats match the TTAGGG telomeric repeat unit known to be used by A. castellanii and therefore bear resemblance to functional telomeres.

Read mapping against the A. castellanii reference genome identified 33 reads that appeared to contain both genomic and plasmid sequence and thus could correspond to chromosomal integrants. Notably, many of the plasmid-mapping reads contained tandem repeats of the single-copy plasmid sequence used in our transformation experiments, in a variety of different flanking sequence contexts. Arrays of up to 11 copies of the plasmid were observed on nanopore reads more than 65 Kbp in length; the existence of even longer arrays may have been obscured by read length limitations.

Table 1 Sequencing statistics for all nanopore long-read sequencing runs performed on A. castellanii transformants in this study

The Clone 1 (shallow) read set was barcoded with Clone 5 and Clone 8 in the same run, while Clone 1 (deep) came from a separate run where Clone 1 was the only sample. Clones LT6, LT8, and LT9 were barcoded together and sequenced in one run. The mixed transformants sample was the only sample in its respective sequencing run. All the runs used FLO-MIN106 flow cells, and all used the SQK-LSK109 sequencing kit, except the mixed transformants which used the previous version, SQK-LSK108. The mixed transformants and Clones 1, 5, and 8 were transformed with circular plasmid, while Clones LT6, LT8, and LT9 were transformed with a linearized form of the same plasmid.

Southern hybridization identifies transgenes on high molecular weight species

To gain additional perspective on the fate of transgenes in this first mixed population of transformants, genomic DNA was extracted and run on an agarose gel in three forms: untreated, digested with BglII, or digested with HindIII. Both restriction endonucleases recognize 6-bp sites and cut the plasmid only once (Fig. 1). This gel was subsequently used for a Southern blot with a probe against the plasmid-borne neomycin resistance marker gene, neoR. This experiment aimed to visualize small molecules (under the ~30 Kbp resolution limit of a 1% agarose gel) in the undigested DNA sample that could correspond to a stably inherited episomal transgene. Transgene DNA could be detected in the undigested DNA lane (Fig. 3; Additional file 3); a single high molecular weight (HMW) band to which the probe hybridizes was seen to migrate higher than the top (20 Kbp) ladder band. In the two genomic DNA digest lanes, a variety of restriction products are apparent, but the Southern hybridization shows a very strong signal for a band of similar size to the digested plasmid control, with additional weak signals for bands slightly larger and smaller.

Fig. 3

Southern blot detection of the neoR gene in transformed A. castellanii genomic DNA. The agarose gel on the left was used for the Southern blot on the right. Genomic DNA (gDNA) from A. castellanii transformed with circular pGAPDH-EGFP was run undigested (U) and two aliquots from the same sample were digested with each of two different restriction endonucleases, BglII (B) and HindIII (H). These endonucleases each have a 6 bp recognition site that occurs once within the plasmid. The purified pGAPDH-EGFP plasmid was run as a control, with the same three treatments. The ladder in the leftmost lane is Thermo Scientific GeneRuler 1 kb plus DNA ladder, with select band sizes marked alongside. Bright, diffuse nucleic acid signals can be observed at the bottom of each genomic DNA lane; these signals are thought to result from degraded and low molecular weight RNA that was not completely removed during DNA extraction. The full-length blot and gel are presented in Additional file 3

Nanopore sequencing of monoclonal transformant lines

Three monoclonal transformant isolates were selected for DNA extraction and nanopore sequencing. These three samples were barcoded and sequenced together in a single MinION run (Table 1; ‘Clone 1 shallow’, ‘Clone 5’, ‘Clone 8’). To screen for potential chromosomal integrations of the transgenes, all nanopore reads with hits to the full plasmid sequence were mapped against the wild-type reference genome. Read mapping was done with soft clipping enabled, which allowed putative genome-plasmid junctions to be visualized using the Integrative Genomics Viewer [28]; the clipped part of each read was then verified to be plasmid-derived. One of these monoclonal isolates, ‘Clone 1’, was chosen for repeat DNA extraction and sequencing in the absence of other barcoded samples (Table 1; ‘Clone 1 deep’). The resulting data were analyzed similarly. One putative integration locus on Chromosome 5 was found to be completely spanned by seven nanopore reads, which extend into the unique genomic sequence on both sides. The integrated DNA was found to be a small, 365 bp fragment of the plasmid, corresponding to the protein coding sequence for EGFP. Many of the reads demonstrating putative integrations in these monoclonal cultures contained tandem repeats of the plasmid sequence similar to those described above from previous sequencing experiments. The plasmid repeat units of these arrays were typically all oriented in the same direction, but occasionally there were several repeats in one direction, then a segment of fragmented plasmid sequence, then a few intact repeats in the opposite direction.

PCR confirmation of a single putative transgene integration

Four putative integration loci in Clone 1 were selected for PCR verification: three were chosen at random and a fourth because the putative integration site was fully spanned by seven nanopore reads. Only this latter putative integration could be confirmed, and only on one end of the integrated plasmid array. Amplified products (Additional file 4 and Additional file 5) from this successful reaction were sent to GeneWiz for Sanger sequencing, which confirmed their identity and therefore confirmed chromosomal integration of the transgene. The integrated fragment is a 365-bp segment of the gene that encodes EGFP on the transforming plasmid, now residing on Chromosome 5 in the transformed Clone 1 line. For the plasmid-genome junctions on the other end of this locus and both ends of the other three loci, PCR experiments either showed no amplification or amplification of non-specific products, depending on the number of PCR cycles, amount of DNA template, and stringency of the annealing temperature.

Transformation of A. castellanii with linearized plasmid

To test whether rolling-circle replication was involved in the formation of plasmid concatemers, the transformation experiment was repeated with plasmid that had been linearized by the restriction endonuclease PsiI, which cuts a single time in the pGAPDH-EGFP plasmid (Fig. 1). Monoclonal isolates were derived from the population of successful transformants as described. DNA was extracted from three clones, and all three samples were barcoded and sequenced together in the same nanopore sequencing experiment (Table 1; ‘Clone LT6’, ‘Clone LT8’, ‘Clone LT9’). The number of inferred integrations from the deeply sequenced clone transformed with circular plasmid (Clone 1) and the three clones transformed with the linearized plasmid was surprisingly large: 1 251 putative integrations in the clone transformed with circular plasmid, and 280, 187, and 109 inferred integrations in the three clones transformed with linearized plasmid (Table 2). The sequence data revealed the existence of tandem arrays of the plasmid sequence regardless of which plasmid topology was used for transformation. As in the cultures transformed with circular plasmid, the repeat units of these arrays were generally all oriented in the same direction.

Estimating the rate of chimerism in nanopore sequencing experiments

To evaluate how many putative plasmid-genome junctions were the result of read chimerism, the background rate of read chimerism in transformant sequencing datasets was estimated and applied to the total number of plasmid-containing reads. This analysis showed that a strikingly high number of the putative chromosomal integrations of the transgenes could potentially be artefacts generated by read chimerism. Of the 1 251 inferred integrations in Clone 1, read chimerism could be responsible for as many as 936 of them. In the other three clones, the number of junctions that chimerism could explain exceeded the number inferred: up to 336 versus 280 inferred in Clone LT6, up to 438 versus 187 in Clone LT8, and up to 185 versus 109 in Clone LT9.
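The comparison can be made explicit with a short sketch. The per-clone counts are those quoted in the text; the helper names are hypothetical, and the rate-times-reads estimate is the simplest reading of the approach described, not the authors' script.

```python
# (inferred integrations, maximum explainable by chimerism), as quoted in the text.
CLONES = {
    "Clone 1": (1251, 936),
    "LT6": (280, 336),
    "LT8": (187, 438),
    "LT9": (109, 185),
}

def expected_chimeric(plasmid_reads, chimera_rate):
    """Expected number of plasmid-containing reads that are chimeric artefacts."""
    return plasmid_reads * chimera_rate

def unexplained(inferred, predicted_chimeric):
    """Inferred integrations left over once chimerism is given full credit."""
    return max(0, inferred - predicted_chimeric)

for clone, (inferred, predicted) in CLONES.items():
    print(clone, unexplained(inferred, predicted))
```

Only Clone 1 retains a surplus of inferred junctions beyond what chimerism could account for; in the three linear-plasmid clones the predicted artefact count alone exceeds what was observed.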

To assess the reliability of nanopore sequence-based putative integrations from another angle, putative integration loci that were supported by two or more nanopore reads were identified. Consistent with the read chimerism findings, a low number of putative integrations with more than one read supporting them were found (Table 2). In Clones 1, LT6, LT8, and LT9 respectively, there were 25, 0, 2, and 1 inferred integrations with 2 or more reads supporting them.
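One way to count multiply supported loci is sketched below. It assumes each putative junction has been reduced to a (chromosome, position, read ID) tuple; the 100 bp grouping window and the function name are hypothetical choices, not parameters taken from the study.

```python
from collections import defaultdict

def multi_read_loci(junctions, window=100, min_reads=2):
    """Group (chromosome, position, read_id) junctions into loci.

    A junction joins the current locus if it lies within `window` bp of the
    previous junction on the same chromosome. Loci supported by at least
    `min_reads` distinct reads are returned as
    (chromosome, first_position, last_position, n_reads).
    """
    by_chrom = defaultdict(list)
    for chrom, pos, read_id in junctions:
        by_chrom[chrom].append((pos, read_id))

    loci = []
    for chrom, hits in by_chrom.items():
        hits.sort()
        cluster = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - cluster[-1][0] <= window:
                cluster.append(hit)
            else:
                loci.append((chrom, cluster))
                cluster = [hit]
        loci.append((chrom, cluster))

    supported = []
    for chrom, cluster in loci:
        reads = {read_id for _, read_id in cluster}
        if len(reads) >= min_reads:
            supported.append((chrom, cluster[0][0], cluster[-1][0], len(reads)))
    return supported

example = [("chr5", 1000, "r1"), ("chr5", 1040, "r2"),
           ("chr5", 9000, "r3"), ("chr2", 50, "r4")]
print(multi_read_loci(example))
```

The window has to tolerate the imprecise breakpoints that noisy long-read alignments produce, while staying small enough that unrelated junctions are not merged.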

Table 2 Summary of nanopore sequence data bearing on the fate of transgenes from four A. castellanii transformants

Plasmid topology refers to whether the clone was transformed with circular pGAPDH-EGFP or pGAPDH-EGFP linearized by PsiI. Inferred integrations are loci where one or more nanopore reads contain genomic sequence from that locus joined to plasmid sequence. The predicted number of artefactually chimeric junctions is determined by calculating the background rate of chimeric reads in a sequencing run and applying this rate to the number of plasmid-containing reads to estimate how many may be chimeric. The reads with one or two telomeres refer to reads where one or more plasmid copies are found in a read with telomeric repeats on one or both ends.

Illumina sequencing of transformed clones reveals extremely high plasmid sequence abundance

Given the uncertainty surrounding nanopore-based evidence for transgene integration, the genomes of two independent A. castellanii clones were deeply sequenced using Illumina short read technology. For Clones LT6 and LT9, ~ 233.5 million and ~ 449.7 million reads were obtained, respectively (30.2 Gbp and 56.2 Gbp of sequence, corresponding to 485X and 910X genomic coverage; Table 3). Two strategies were used to search for putative transgene integrations in these data. First, the set of long reads representing putative integrations were collected into a single file for each clone, and the corresponding Illumina read set was mapped against these putative integrations. The data were then visualized and investigated for short reads mapping across putative plasmid-genome junctions. The second method involved BLAST comparisons of the plasmid, the putative LT6 and LT9 integration reads, and the total set of LT6 and LT9 Illumina reads.

Neither method identified any Illumina reads supporting the putative integrations of plasmid DNA into A. castellanii chromosomes. To test the hypothesis that the Illumina datasets may be artifactually depleted in plasmid sequence, leading to the observed lack of evidence for integration, the LT6 and LT9 Illumina read sets were mapped onto the sequence of the pGAPDH-EGFP plasmid. The results (Table 3) show that the plasmid was present in extremely high abundance in both LT6 and LT9 cells, albeit in the absence of evidence for how it was being maintained.

Table 3 Read coverage depth of the genome and plasmid in clones LT6 and LT9

Coverage was estimated by mapping both Illumina and nanopore reads against the wild-type A. castellanii Neff genome and against the pGAPDH-EGFP sequence. Coverage values have been rounded to the nearest 5 for genome coverage and to two significant figures for the plasmid coverage for ease of comparison.

Potential transgene minichromosome in A. castellanii

To better understand the ultra-high sequence coverage of plasmid DNA in our nanopore and Illumina sequence data, nanopore reads from all the transformant clones were searched for reads that had one or more plasmid copies flanked on both ends with telomeres, such that they represented a transgene-containing ‘minichromosome’. At least one was found in each of the deeply sequenced transformant clones, ranging from about 30 Kbp to 60 Kbp in length, between 5 and 9 plasmid copies in tandem. Since the actual counts of telomere-flanked reads were very low, all plasmid arrays with telomeres on at least one end were identified to account for the possibility that most reads were not long enough to capture both telomeres. The tabulated counts of these reads for each clone are presented in Table 2.

We next sought to visualize the hypothesized transgene minichromosome using gel electrophoresis and Southern hybridization. In this experiment, rather than using two different restriction enzyme digests, we aimed to degrade away as much wild-type genomic sequence as possible without cutting plasmid arrays. For this, SacI, NheI, and NotI were chosen due to each one having at least a few thousand cut sites in the wild-type Neff genome, and none in the pGAPDH-EGFP sequence. More specifically, NheI cuts a predicted 2 565 times in the 35 chromosome-scale scaffolds of the wild-type Neff genome [19], for an expected average of roughly 17 Kbp between cut sites, while NotI cuts a predicted 3 790 times in those scaffolds for an expected average of roughly 11 Kbp between cuts, and SacI cuts a predicted 24 049 times in those 35 scaffolds for an expected average of roughly 2 Kbp between cuts. A 0.5% agarose gel (for improved resolution) was run with wild-type Neff gDNA, LT9 gDNA, and pGAPDH-EGFP. Each sample was run in undigested form as well as digested with the cocktail of restriction enzymes described above. A Southern blot was again performed with a probe against the neoR gene from the plasmid (Fig. 4; Additional file 6).
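The enzyme selection logic can be sketched as follows. The recognition sequences are the standard ones for SacI, NheI, and NotI; the helper names are hypothetical, and the total scaffold length of about 43.6 Mbp is an assumption back-calculated from the quoted 2 565 NheI sites at roughly 17 Kbp spacing, not a value stated in the text.

```python
import re

# Recognition sequences; all are palindromic, so scanning one strand suffices.
SITES = {"SacI": "GAGCTC", "NheI": "GCTAGC", "NotI": "GCGGCCGC"}

def count_sites(seq, site):
    """Count occurrences of a recognition site, allowing overlaps."""
    return len(re.findall(f"(?={re.escape(site)})", seq.upper()))

def mean_spacing(total_length, n_sites):
    """Average distance between cut sites for a given total sequence length."""
    return total_length / n_sites

def cuts_genome_not_plasmid(genome_seqs, plasmid_seq, site):
    """True if the enzyme cuts the genome at least once and the plasmid never.

    The plasmid is treated as circular: its first len(site) - 1 bases are
    appended so that a site spanning the sequence origin is still found.
    """
    genome_hits = sum(count_sites(s, site) for s in genome_seqs)
    circular = plasmid_seq + plasmid_seq[: len(site) - 1]
    return genome_hits > 0 and count_sites(circular, site) == 0

ASSUMED_TOTAL_BP = 43_600_000  # assumption, see lead-in
for enzyme, cuts in {"NheI": 2565, "NotI": 3790, "SacI": 24049}.items():
    print(enzyme, round(mean_spacing(ASSUMED_TOTAL_BP, cuts) / 1000, 1), "Kbp")
```

Under that assumed length, the predicted cut counts reproduce the approximate spacings given above for all three enzymes.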

Fig. 4

A Southern blot to detect extrachromosomal transgenes in transformed A. castellanii genomic DNA. The agarose gel (left) was used for a Southern blot / hybridization (right). Genomic DNA from wild-type (WT) A. castellanii strain Neff and Clone LT9 was run undigested (U) and digested with SacI, NheI, and NotI (D). These endonucleases cut frequently within the wild-type genome and do not cut within the plasmid. The purified pGAPDH-EGFP plasmid (P) was run as a control, with the same treatments. The ladder in the leftmost lane is Thermo Scientific GeneRuler 1 kb DNA ladder, with select band sizes marked alongside. An arrowhead marks a band that appears in undigested LT9 but not wild-type. Bright, diffuse nucleic acid signals can be observed at the bottom of each genomic DNA lane; these signals are thought to result from degraded and low molecular weight RNA that was not completely removed during DNA extraction. The full-length blot and gel are presented in Additional file 6

From the gel (Fig. 4), the WT and LT9 digested lanes appear to have approximately the same banding pattern. Interestingly, in the LT9 undigested gDNA lane there is a distinct extra band appearing just below the combined HMW DNA band. Given the expectation for the transformants to have a 30 to 60 Kbp molecule bearing the transgenes, this distinct extra band would appear to be a candidate, but the Southern blot shows no hybridization of the neoR probe to this extra band, instead hybridizing to the high molecular weight DNA band. This extra band thus appears not to be the hypothesized linear episome, but its identity is not clear and could not be determined by revisiting the sequence data from this clone, nor from PFGE of the undigested wild-type and LT9 samples (Additional file 7).

Discussion

The goal of this study was to determine the fate of transgenes in Acanthamoeba castellanii after artificial transformation experiments with circular and linearized forms of the same plasmid. Nanopore and Illumina sequencing, PCR, Southern blotting, and PFGE were all used to determine whether the transgenes were episomal or chromosomally integrated, and if any modifications to them had taken place. The results are complex, but on balance favour the episomal maintenance of the transgenes in A. castellanii on a linear molecule that contains a tandem array of plasmid sequence flanked by telomeric repeats.

Our results unambiguously indicate that A. castellanii maintains the transgenes in tandem arrays of between 5 and 10 plasmid copies on average. It is unclear whether this size range would consistently apply to any future transformation experiments on this organism with varying parameters, or to natural acquisition of foreign DNA in either circular or linear form. Distinguishing between chromosomal and episomal maintenance of transgenes was not possible using the array of experimental approaches employed. Using the size of the transgene-containing DNA species as a diagnostic criterion was surprisingly inconclusive, even using standard DNA sizing methods like agarose gel electrophoresis. The size of a putative episome, inferred to be in the range of 30 Kbp to 60 Kbp based on sequence data, is very different from the size of most of the nuclear chromosomes in A. castellanii [19], although a couple of chromosomes at the lower end of the size range are around 100 Kbp. However, 30–60 Kbp is still above what can easily be resolved on a standard agarose gel. Therefore, while Southern blot experiments placed the transgenes in the same compressed, HMW band as the genomic DNA, the episome may not be expected to resolve as a separate band on such gels.

Peng, Omaruddin, and Bateman [23] arrived at a similar conclusion in their original description of the transformation protocol for A. castellanii, which used the circular form of the plasmid. Using Southern hybridization, they concluded that the transgenes must be on a molecule of at least 12 Kbp in size and present in several copies, but could not differentiate between chromosomal integration, the formation of a linear episome, or a circular molecule composed of multiple plasmid concatemers. In the present study, use of PFGE to try to resolve an episomal band from chromosomal DNA provided no additional insight; no clear band in the size range that would be expected based on the sequence data alone could be observed (Additional file 7). Southern hybridization against PFGE-resolved gels with probes against telomeres and the neoR resistance marker might provide more information. Use of electron microscopy to directly observe the putative linear episome is another possible line of inquiry.

Overall, sequence-based analysis arguably shifts the balance of probabilities in favour of episomal maintenance of transgenes in A. castellanii rather than strictly chromosomal integration, for several reasons. The first is that the evidence for the latter is unconvincing. The discrepancy between the read support of any given putative integration and the overall genome coverage demands explanation. Depending on the specific integration and the transformed strain analyzed, this could be a 25- to 100-fold difference; while putative plasmid integrations were supported by only one or two reads, the genome coverage of the clones ranged from 50X to 100X. The explanation held throughout much of this investigation was that the suspected polyploidy of A. castellanii, estimated to be ~ 25n by Byers [20], could account for this discrepancy if an integration took place in only one copy of a chromosome. However, finding a single supporting read for an integration in a sequence dataset of 100X genome coverage makes even this scenario unlikely.

This explanation (i.e., transgene integration into one or a few chromosomal copies) is further weakened by the ultra-high abundance of plasmid sequence in both Illumina and nanopore sequence data: the coverage of a single copy of the plasmid was about 250 to 300 times higher than the genome coverage in clones LT6 and LT9 (Table 3). Even giving the integration hypothesis maximum ‘benefit of the doubt’ does not account for this finding; assuming each of the 280 putative integrations in LT6 had an array of 11 plasmid units (the longest array found in any of the data sets), this explains 3 080 total copies of the basic plasmid unit, which is only 62 times higher than the overall genome coverage in the corresponding read set, falling short of the observed 250- to 300-fold excess.
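The arithmetic behind this ‘benefit of the doubt’ estimate is reproduced in the sketch below. The 280 integrations and 11 units per array come from the text; the genome coverage of about 50X is an assumption back-calculated from the quoted 62-fold figure (and consistent with the 50X to 100X range mentioned earlier), and the function names are hypothetical.

```python
def copies_explained(n_integrations, units_per_array):
    """Plasmid unit copies accounted for if every integration carries a full array."""
    return n_integrations * units_per_array

def fold_over_genome(plasmid_copies, genome_coverage):
    """Plasmid unit coverage expressed as a multiple of genome coverage."""
    return plasmid_copies / genome_coverage

ASSUMED_GENOME_COVERAGE = 50  # assumption, see lead-in

copies = copies_explained(280, 11)
fold = fold_over_genome(copies, ASSUMED_GENOME_COVERAGE)
print(copies, round(fold))
# Compare against the lower bound of the observed excess (250-fold).
print(fold < 250)
```

Even with every putative integration credited with the longest array observed, the integration hypothesis accounts for roughly a quarter of the observed plasmid excess at best.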

Consideration of read chimerism further weakens this hypothesis. Based on the background rate of chimeric reads estimated in the LT6 nanopore sequencing run, chimerism could explain the appearance of up to 336 reads corresponding to artefactual junctions between plasmid and genomic sequence, which exceeds the observed number. It should be noted that the criteria used to identify chimeric reads were relatively stringent; one could propose situations of true chimerism that would be rejected by the criteria implemented. This means that the estimated rate of chimeric reads, if inaccurate, would likely be an underestimate.

The presence of telomere-flanked plasmid arrays in the nanopore reads obtained from multiple transformed clones provides evidence for the existence of a transgene that could be replicated and maintained as an episome. This hypothesis shares some of the weaknesses of the integration hypothesis, but to a lesser extent. It is difficult to explain our finding of three or fewer reads in each clone where a plasmid array is flanked by telomeres on both ends. Read length limitations may restrict how often reads with two telomeres are observed, given that the size of the putative episome is predicted to be between 30 and 60 Kbp, which is larger than the average read length of the nanopore sequencing runs performed for this study. Detection of telomere-bearing reads may also have been inhibited by a known tendency for telomeric repeats to be frequently mis-called in nanopore sequencing [34]. Since an exact match was used to screen for telomeric repeats in our analysis, these sequencing errors may have resulted in false negatives. However, we required just two consecutive telomeric repeats to be present in order to retain a read for further screening, which presumably would have allowed us to capture all but the most extreme cases of mis-calling.
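The exact-match screen described here can be sketched as follows. The repeat unit is passed in as a parameter; `TTAGGG` appears in the example purely as a placeholder and is not asserted to be the A. castellanii repeat. The 200 bp end window and the function names are likewise hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_telomere(end_seq, repeat, min_repeats=2):
    """Exact-match test for consecutive telomeric repeats on either strand."""
    target = repeat * min_repeats
    return target in end_seq or reverse_complement(target) in end_seq

def telomere_ends(read, repeat, window=200, min_repeats=2):
    """Count how many ends of a read (0, 1 or 2) carry telomeric repeats."""
    window = min(window, len(read) // 2)  # keep the two end windows disjoint
    if window == 0:
        return 0
    head, tail = read[:window], read[-window:]
    return int(has_telomere(head, repeat, min_repeats)) + int(
        has_telomere(tail, repeat, min_repeats))

REPEAT = "TTAGGG"  # placeholder repeat unit, see lead-in
both = "CCCTAA" * 3 + "ACGT" * 100 + "TTAGGG" * 3
miscalled = "ACGT" * 100 + "TTAGGGTTAGCG"
print(telomere_ends(both, REPEAT), telomere_ends(miscalled, REPEAT))
```

The second example shows the false-negative risk discussed above: a single miscalled base in the second repeat is enough for an exact-match screen to miss that read end.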

Even when accounting for reads with telomeric repeats on only one end, the counts are quite low compared to total plasmid sequence abundance. However, the issue of read chimerism is less problematic for this hypothesis. For chimerism to account for all plasmid-telomere junctions observed in each clone, each chromosome would be expected to have telomeres in excess of 100 Kbp on each end. This is based on using the estimated rate of chimerism in each clone to extrapolate how much total telomeric sequence would be needed for all observed plasmid-telomere junctions to be chimeric, then dividing this by the number of chromosome ends. For comparison, mammalian telomeres are typically from 10 to 30 Kbp [35], and yeast telomeres are only a few hundred base pairs in length [36]. This would also make each of A. castellanii’s telomeres at least the same length as the total predicted non-telomeric sequence of chromosome 35 [19]. It is thus more difficult to reject the existence of a linear episome composed of plasmid concatemers and flanked by telomeres than it is to reject most of the putative chromosomal integrations inferred from the sequence data.

All this said, there is a single chromosomal integration of transforming DNA in Clone 1 that clearly cannot be rejected. This is the only case where long reads fully span a putative integration site and is also supported by several times more reads than the others. It is also the only example where molecular support could be obtained (i.e., PCR). It is thus highly likely that the putative transgene integration into A. castellanii chromosome 5 of transformant line ‘Clone 1’ is genuine. However, this instance is also unlike any of the other plasmid sequences found in the four analyzed clonal lines in that only a 365-bp fragment of the plasmid is integrated. This sequence comes from the EGFP gene and does not contain either of the termini required for proper expression, so it seems unlikely to have any biological influence on the cell, including not contributing to resistance against the selecting antibiotic in culture. Extrachromosomal copies of the transforming plasmid thus must be responsible for antibiotic resistance in this clone, as is expected in the others. This integrated DNA would also not have been detected by the probe against the neoR gene, but this clone was not probed experimentally anyway.

A strikingly similar situation was observed in the fungal pathogen Histoplasma capsulatum by Woods and Goldman [6, 37]. Transforming plasmids in H. capsulatum were either chromosomally integrated or modified into a linear plasmid closely resembling the ones observed here in A. castellanii, including addition of telomeric repeats followed by maintenance of the episome at high copy number. Both of these potential destinations for the transgene DNA showed tandem amplification of the transforming plasmid. Transformation with circular or linear plasmid does not appear to influence transgene fate. Although only one chromosomal integration was observed here for A. castellanii, our overall findings are nearly identical to those in H. capsulatum [6, 37].

Similar outcomes have been described in the fungal genetics literature, and in other protists. Examples include the fungi Cryptococcus neoformans [5, 38] and Fusarium oxysporum [7] and the ciliate Paramecium tetraurelia [39]. Of these examples, none exhibit tandem duplication of the transforming plasmid as clearly as A. castellanii or H. capsulatum, with F. oxysporum potentially having partial duplication and rearrangement of the transforming DNA while the transforming plasmid remains unit-sized in the other two examples. All three do appear to add telomeric repeats to the transforming DNA, apparently without any particular motif required to template the addition. All these organisms must either have sufficiently permissive replication systems such that no specific replication origin is needed or the telomeres themselves can serve as a replication origin.

Among the fungal systems discussed here, all three exhibited the ability to maintain transforming DNA through both chromosomal integration and the formation of autonomous linear episomes. In each of the three, experiments demonstrate that transforming DNA can be maintained through both means simultaneously. These findings support the inference that A. castellanii ‘Clone 1’ in our study has generated a linear episome from the transforming DNA while also containing one bona fide chromosomal integration. Whether this integration occurred independently of linear episome formation or whether a linear episome served as a substrate for this integration is unknown.

With respect to the mechanism responsible for generating tandem arrays of the plasmid sequence, evidence indicates that the transforming plasmid was concertedly duplicated in some way. It is unlikely that random ligation of several plasmids gave rise to these arrays, based on the non-random orientation of the plasmid units within a given array. The restriction endonuclease used to linearize the plasmid creates blunt ends (at the cut site TTA^TAA), so no overhangs were created that could have influenced the potential alignment of free-floating plasmid molecules.
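The orientation argument can be quantified with a back-of-the-envelope sketch. It assumes, as a simplification not tested in this study, that random blunt-end ligation would orient each plasmid unit independently with equal probability in either direction; the function name is hypothetical.

```python
def prob_all_same_orientation(n_units):
    """Chance that n independently oriented units all point the same way.

    Either of the two orientations may be shared, hence the factor of 2.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return 2 * 0.5 ** n_units

for n in (5, 10):
    print(n, prob_all_same_orientation(n))
```

Under this simple model, a uniformly oriented array of 5 units would arise by chance about 6% of the time and one of 10 units about 0.2% of the time, so finding uniform orientation as the typical case across many arrays is hard to reconcile with random ligation.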

Intriguingly, the telomeres do not appear to join the plasmid array at the position where the original transforming plasmid was linearized. The nucleotide position on the plasmid where telomeres were joined was used as a proxy to estimate the diversity of episomes within each transformed clone. In each respective clone, there appears to be one dominant form of the episome, with a frequency of 20–50% depending on the clone, followed by one to a few minor forms with frequencies between 5% and 20%, and the remainder of the observed episome-like reads are effectively ‘singletons’ in our data. This would suggest each clone has a few different versions of the episome that are replicated somewhat more efficiently than the rest, with a large diversity of rare versions. It is unclear whether this diversity of forms arose from a single episome due to instability, or whether multiple episomes formed at the time of transformation due to the large quantity of transforming DNA, with some being held at higher copy number than others. It seems plausible that the latter scenario is true and the difference in frequency of the different forms may be caused by differences in replication efficiency due to unknown factors.
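A tally of this kind can be sketched as below. It assumes each episome-like read has already been reduced to the plasmid coordinate at which the telomere joins; the function name, the example coordinates, and the exact-position grouping are hypothetical, while the 20% and 5% cut-offs mirror the frequency bands described in the text.

```python
from collections import Counter

def classify_forms(junction_positions, dominant_min=0.20, minor_min=0.05):
    """Label each plasmid-telomere junction position by its frequency.

    Returns {position: (frequency, label)} with labels 'dominant', 'minor',
    'rare', or 'singleton' (a position seen in exactly one read).
    """
    counts = Counter(junction_positions)
    total = sum(counts.values())
    forms = {}
    for pos, n in counts.most_common():
        freq = n / total
        if n == 1:
            label = "singleton"
        elif freq >= dominant_min:
            label = "dominant"
        elif freq >= minor_min:
            label = "minor"
        else:
            label = "rare"
        forms[pos] = (freq, label)
    return forms

# Invented coordinates for illustration only.
positions = [100] * 8 + [2500] * 3 + [310, 4020, 999, 1234, 42, 77, 5000, 3]
for pos, (freq, label) in classify_forms(positions).items():
    print(pos, round(freq, 2), label)
```

With real nanopore data, junction coordinates that differ by a few bases because of alignment noise would probably need to be merged before counting.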

At least two hypotheses can explain why there are so many observed positions where telomeres join the plasmid. Perhaps the instability known to be intrinsic to tandemly duplicated sequences caused sufficient rearrangement to create this variation in plasmid-telomere junction sites [40, 41]. Examination of the episome-representing reads provides support for this hypothesis; the arrays are often composed of neatly duplicated plasmid copies but there are cases where there is significant fragmentation of the plasmid units within these episomes, presumably due to recombination. Another possible explanation is that degradation of the ends of arrays, whether passively or by enzyme-mediated processes, caused the ends to vary until telomeres were added to protect the plasmid arrays and preserve the structure that existed at that moment in time. These two possibilities are not mutually exclusive; both could contribute to the observed variation in plasmid-telomere junctions in our A. castellanii transformants. Having observed no common sequence motifs at the plasmid-telomere junctions, the precise mechanism(s) underlying de novo telomere addition are unclear.

If the hypothesis favoured here is true — i.e., that A. castellanii maintains transformed DNA as an autonomous linear plasmid — this would have evolutionary implications as well as implications for molecular biology experimentation in this system. From a high-level evolutionary perspective, our findings suggest a broader distribution of episomal transgene maintenance across eukaryotes than previously thought. With the exception of the alveolate ciliate P. tetraurelia, the formation of telomere-bearing linear episomes has only been observed in fungi. While Amoebozoa is the closest supergroup to Obazoa, which contains fungi, this should still prompt molecular biologists working on protists from other parts of the eukaryote tree to be alert to similar phenomena. The observation of this type of molecular biology in the two major groups of Amorphea (i.e., Amoebozoa plus Obazoa), as well as in an alveolate, thought to be on the opposite side of the eukaryote root [42, 43], could indicate that it is widespread among eukaryotes.

There are several possible implications of our findings when looking at the molecular level. In fungi, autonomously replicating plasmids provide great utility for genetic experiments by virtue of high transformation efficiency and high copy number [7, 38, 44, 45]; transforming DNA that cannot replicate autonomously has to be chromosomally integrated, which is comparatively much less efficient. With autonomously replicating plasmids now described in A. castellanii, the applications of these entities in fungi could serve as inspiration to develop genetic tools for highly efficient transformation of A. castellanii and to study its biology more intensively. The same concept could possibly be applied to other nascent protist model systems where transformation is difficult or not yet achieved.

Episomal transgene maintenance in A. castellanii may play a role in fine-scale genome evolution, specifically lateral gene transfer (LGT). Many transformation protocols serve to facilitate access of the transforming DNA to the nucleus, but do not themselves provide an obvious mechanism by which the DNA can be maintained once it has arrived there. Provided that exogenous DNA acquired in a natural environment gains access to the nucleus, it may then be subjected to the same conditions and processes as transforming DNA in the laboratory. The spontaneous generation of autonomously replicating plasmids described in this study could apply to environmentally acquired exogenous DNA, allowing it to persist longer in the nucleus. This would greatly expand the window of opportunity for the requisite steps of LGT, such as chromosomal integration and gene expression, to occur.

One can even imagine some of the steps involved in expressing the newly acquired genes taking place prior to integration, such as the acquisition or invention of promoters and other regulatory elements. Aside from providing a standing source of foreign DNA for integration, autonomous plasmids could be a substrate for innovation. They may begin expressing genes that provide a selective advantage to the recipient cell before chromosomal integration, encouraging maintenance of the new genetic material until it can be incorporated more permanently. This hypothetical model could help tip the scales in what is sometimes thought to be a highly improbable series of events.

Conclusions

We have shown how long-read, single molecule sequencing can serve as a high throughput readout for molecular biological experimentation. While still not trivial, a wide range of experimental possibilities can be explored with genome-wide sequence read sets much more quickly than with traditional molecular biology experiments. However, it also leaves room for ambiguities that must be explored and resolved through experimentation.

Moving from methodology to biology, our results show that A. castellanii appears to duplicate circular and linear plasmid DNA into a tandem array and add telomeres to the ends such that it can be autonomously replicated in the nucleus. This genetic capability has not been observed in A. castellanii before. Its existence would suggest that the organism’s telomerase is capable of acting on substrates without existing telomeric repeats, and that its replication machinery is able to replicate non-chromosomal DNA, potentially by recognizing the newly added telomeres. We cannot rule out the possibility of chromosomally integrating exogenous DNA; indeed, our data suggest that A. castellanii has the genetic flexibility to use autonomously replicating elements and chromosomal integration to retain foreign DNA, possibly simultaneously. These observations can be exploited by molecular biologists to improve the efficiency and flexibility of genetic manipulation in this organism. At the same time, these capabilities hint at the existence of a molecular gateway for lateral gene transfer in nature, as has been suggested to occur in A. castellanii and other Amoebozoa based on the presence of bacterial and viral genes in their genome [19, 46, 47, 48]. The formation and maintenance of linear episomes from foreign DNA could extend the window for integration to occur.

Our findings also contribute to a broader evolutionary discussion on the ubiquity of such genetic mechanisms and their role in genome evolution across eukaryotes. The distribution of similar processes across major eukaryote groups could reflect a widespread or even universal mechanism for sampling extrachromosomal DNA in the nucleus and potentially retaining it more permanently. Such a system could be useful for retaining endogenous DNA that has been expelled from its chromosomal home and would otherwise no longer be replicated. This could also serve as a mechanism for facilitating eukaryote-eukaryote LGT across the eukaryotic tree via the creation, maintenance and transfer of episomes. Beyond virus-mediated transfer, the mechanisms underpinning LGT in eukaryotes are largely a black box [49], and elucidation of previously unrecognized genetic capabilities such as episomal maintenance could go a long way toward explaining how LGT contributes to the evolution, diversification, and ecological adaptation of eukaryotic cells in nature.

Data availability

The sequencing datasets analyzed in this study are all part of the same BioProject at NCBI under the accession number PRJNA487265. The read sets can be found individually under the following accession numbers from SRA on NCBI: mixed transformants: SRX4625410, Clone 1: SRX27968418, SRX7813525, Clone 5: SRX27968416, Clone 8: SRX27968417, Clone LT6: SRX27968413, SRX27968419, Clone LT8: SRX27968414, Clone LT9: SRX27968415, SRX27968420. Plasmids can be made available by the authors upon request. Additional files generated over the course of the analysis are deposited at https://doi.org/10.5281/zenodo.15019835. Bioinformatics scripts used to perform analyses are available at https://github.com/morgancolp/Acastellanii_transgene_analysis.

References

  1. Gaikani HK, Stolar M, Kriti D, Nislow C, Giaever G. From beer to breadboards: yeast as a force for biological innovation. Genome Biol. 2024;25:10.

  2. Fus-Kujawa A, Prus P, Bajdak-Rusinek K, Teper P, Gawron K, Kowalczuk A, et al. An overview of methods and tools for transfection of eukaryotic cells in vitro. Front Bioeng Biotechnol. 2021;9:701031.

  3. Faktorová D, Nisbet RER, Fernández Robledo JA, Casacuberta E, Sudek L, Allen AE, et al. Genetic tool development in marine protists: emerging model organisms for experimental cell biology. Nat Methods. 2020;17:481–94.

  4. Mileshina D, Koulintchenko M, Konstantinov Y, Dietrich A. Transfection of plant mitochondria and in organello gene integration. Nucleic Acids Res. 2011;39:e115–115.

  5. Varma A, Edman JC, Kwon-Chung KJ. Molecular and genetic analysis of URA5 transformants of Cryptococcus neoformans. Infect Immun. 1992;60:1101–8.

  6. Woods JP, Goldman WE. In vivo generation of linear plasmids with addition of telomeric sequences by Histoplasma capsulatum. Mol Microbiol. 1992;6:3603–10.

  7. Powell WA, Kistler HC. In vivo rearrangement of foreign DNA by Fusarium oxysporum produces linear self-replicating plasmids. J Bacteriol. 1990;172:3163–71.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Hadi MZ, McMullen MD, Finer JJ. Transformation of 12 different plasmids into soybean via particle bombardment. Plant Cell Rep. 1996;15:500–5.

  9. Takano M, Egawa H, Ikeda J-E, Wakasa K. The structures of integration sites in transgenic rice. Plant J. 1997;11:353–61.

  10. Neff RJ. Purification, axenic cultivation, and description of a soil amoeba, Acanthamoeba sp. J Protozool. 1957;4:176–82.

  11. Page FC. Re-definition of the genus Acanthamoeba with descriptions of three species. J Protozool. 1967;14:709–24.

  12. Leger MM, Gawryluk RMR, Gray MW, Roger AJ. Evidence for a hydrogenosomal-type anaerobic ATP generation pathway in Acanthamoeba castellanii. PLoS ONE. 2013;8:e69532.

  13. Hall J, Voelz H. Bacterial endosymbionts of Acanthamoeba sp. J Parasitol. 1985;71:89.

  14. Horn M. Bacterial endosymbionts of free-living amoebae. J Eukaryot Microbiol. 2004;51:509–14.

  15. Schmitz-Esser S, Toenshoff ER, Haider S, Heinz E, Hoenninger VM, Wagner M, et al. Diversity of bacterial endosymbionts of environmental Acanthamoeba isolates. Appl Environ Microbiol. 2008;74:5822–31.

  16. Iovieno A, Ledee DR, Miller D, Alfonso EC. Detection of bacterial endosymbionts in clinical Acanthamoeba isolates. Ophthalmology. 2010;117:445–52.e3.

  17. Lorenzo-Morales J, Khan NA, Walochnik J. An update on Acanthamoeba keratitis: diagnosis, pathogenesis and treatment. Parasite. 2015;22:10.

  18. Clarke M, Lohan AJ, Liu B, Lagkouvardos I, Roy S, Zafar N, et al. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 2013;14:R11.

  19. Matthey-Doret C, Colp MJ, Escoll P, Thierry A, Moreau P, Curtis B, et al. Chromosome-scale assemblies of Acanthamoeba castellanii genomes provide insights into Legionella pneumophila infection-related chromatin reorganization. Genome Res. 2022;32:1698–710.

  20. Byers TJ. Molecular biology of DNA in Acanthamoeba, Amoeba, Entamoeba, and Naegleria. 1986. pp. 311–41.

  21. Matsunaga S, Endo T, Yagita K, Hirukawa Y, Tomino S, Matsugo S, et al. Chromosome size polymorphisms in the genus Acanthamoeba electrokaryotype by pulsed-field gel electrophoresis. Protist. 1998;149:323–40.

  22. Hofstatter PG, Brown MW, Lahr DJG. Comparative genomics supports sex and meiosis in diverse Amoebozoa. Genome Biol Evol. 2018;10:3118–28.

  23. Peng Z, Omaruddin R, Bateman E. Stable transfection of Acanthamoeba castellanii. Biochim Biophys Acta Mol Cell Res. 2005;1743:93–100.

  24. Bateman E. Expression plasmids and production of EGFP in stably transfected Acanthamoeba. Protein Expr Purif. 2010;70:95–100.

  25. Colp M. Transformation of Acanthamoeba castellanii with Qiagen SuperFect reagent. protocols.io. 2018. https://doi.org/10.17504/protocols.io.s4regv6

  26. Colp M, Blais C, Archibald JM. Single cell isolation and monoclonal culture establishment of Acanthamoeba castellanii using migration on agar plates. protocols.io. 2020. https://doi.org/10.17504/protocols.io.bc3eiyje

  27. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.

  28. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–6.

  29. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92.

  30. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.

  31. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.

  32. Seibel NM, Eljouni J, Nalaskowski MM, Hampe W. Nuclear localization of enhanced green fluorescent protein homomultimers. Anal Biochem. 2007;368:95–9.

  33. Luther DC, Jeon T, Goswami R, Nagaraj H, Kim D, Lee Y-W, et al. Protein delivery: if your GFP (or other small protein) is in the cytosol, it will also be in the nucleus. Bioconjug Chem. 2021;32:891–6.

  34. Tan K-T, Slevin MK, Meyerson M, Li H. Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres. Genome Biol. 2022;23:180.

  35. Gomes NMV, Ryder OA, Houck ML, Charter SJ, Walker W, Forsyth NR, et al. Comparative biology of mammalian telomeres: hypotheses on ancestral states and the roles of telomeres in longevity determination. Aging Cell. 2011;10:761–8.

  36. McEachern MJ, Blackburn EH. A conserved sequence motif within the exceptionally diverse telomeric sequences of budding yeasts. Proc Natl Acad Sci. 1994;91:3453–7.

  37. Woods JP, Goldman WE. Autonomous replication of foreign DNA in Histoplasma capsulatum: role of native telomeric sequences. J Bacteriol. 1993;175:636–41.

  38. Edman JC. Isolation of telomerelike sequences from Cryptococcus neoformans and their use in high-efficiency transformation. Mol Cell Biol. 1992;12:2777–83.

  39. Gilley D, Preer JR, Aufderheide KJ, Polisky B. Autonomous replication and addition of telomerelike sequences to DNA microinjected into Paramecium tetraurelia macronuclei. Mol Cell Biol. 1988;8:4765–72.

  40. Rambosek JA, Kinsey JA. An unstable mutant gene of the am locus of Neurospora results from a small duplication. Gene. 1984;27:101–7.

  41. Petes TD, Hill CW. Recombination between repeated genes in microorganisms. Annu Rev Genet. 1988;22:147–68.

  42. Derelle R, Torruella G, Klimeš V, Brinkmann H, Kim E, Vlček Č, et al. Bacterial proteins pinpoint a single eukaryotic root. Proc Natl Acad Sci. 2015;112.

  43. Williamson K, Eme L, Baños H, McCarthy CGP, Susko E, Kamikawa R, et al. A robustly rooted tree of eukaryotes reveals their excavate ancestry. Nature. 2025. https://doi.org/10.1038/s41586-025-08709-5.

  44. Tsukuda T, Carleton S, Fotheringham S, Holloman WK. Isolation and characterization of an autonomously replicating sequence from Ustilago maydis. Mol Cell Biol. 1988;8:3703–9.

  45. Struhl K, Stinchcomb DT, Scherer S, Davis RW. High-frequency transformation of yeast: autonomous replication of hybrid DNA molecules. Proc Natl Acad Sci. 1979;76:1035–9.

  46. Grant JR, Katz LA. Phylogenomic study indicates widespread lateral gene transfer in Entamoeba and suggests a past intimate relationship with parabasalids. Genome Biol Evol. 2014;6:2350–60.

  47. Nývltová E, Stairs CW, Hrdý I, Rídl J, Mach J, Pačes J, et al. Lateral gene transfer and gene duplication played a key role in the evolution of Mastigamoeba balamuthi hydrogenosomes. Mol Biol Evol. 2015;32:1039–55.

  48. Žárský V, Klimeš V, Pačes J, Vlček Č, Hradilová M, Beneš V, et al. The Mastigamoeba balamuthi genome and the nature of the free-living ancestor of Entamoeba. Mol Biol Evol. 2021;38:2240–59.

  49. Sibbald SJ, Eme L, Archibald JM, Roger AJ. Lateral gene transfer mechanisms and pan-genomes in eukaryotes. Trends Parasitol. 2020;36:927–41.

Acknowledgements

The authors thank Marlena Dlutek for assistance with experiments and logistics. Plasmids used in the study were received as a gift from Michelle M. Leger and Andrew J. Roger.

Funding

This research was supported by a grant from the Gordon and Betty Moore Foundation (GBMF5782). M.J.C. was supported by graduate student scholarships from the Natural Sciences and Engineering Research Council of Canada and Dalhousie University.

Author information

Contributions

M.J.C. and J.M.A. designed the study. M.J.C. extracted genomic DNA and performed library preparation and nanopore sequencing. M.J.C. and C.B. collected experimental data. M.J.C. and B.A.C. conducted bioinformatic analysis. M.J.C. drafted the manuscript with revision and editing by J.M.A. All authors reviewed and provided feedback on the final draft.

Corresponding authors

Correspondence to Morgan J. Colp or John M. Archibald.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Colp, M.J., Blais, C., Curtis, B.A. et al. The fate of artificial transgenes in Acanthamoeba castellanii. BMC Genomics 26, 368 (2025). https://doi.org/10.1186/s12864-025-11552-7

  • DOI: https://doi.org/10.1186/s12864-025-11552-7

Keywords