Species-specific RNA barcoding technology for rapid and accurate identification of four types of influenza virus
BMC Genomics volume 26, Article number: 409 (2025)
Abstract
Background
The influenza virus (IV) is responsible for seasonal flu epidemics. Constant mutation of the virus results in new strains and widespread reinfections across the globe, bringing great challenges to disease prevention and control. Research has demonstrated that barcoding technology efficiently and cost-effectively differentiates closely related species on a large scale. We screened and validated species-specific RNA barcode segments based on the genetic relationships of four types of IVs, facilitating their precise identification in high-throughput sequencing viral samples.
Results
Through the analysis of single nucleotide polymorphism, population genetic characteristics, and phylogenetic relationships in the training set, 7 IAV, 29 IBV, 40 ICV, and 5 IDV barcode segments were selected. In the testing set, the nucleotide-level recall rate for all barcode segments reached 96.86%, the average nucleotide-level specificity was approximately 55.27%, the precision rate was 100%, and the false omission rate was 0%, demonstrating high accuracy, specificity, and generalization capabilities for species identification. Ultimately, all four types of IVs were visualized in a combination of one-dimensional and two-dimensional codes and stored in an online database named Influenza Virus Barcode Database (FluBarDB, http://virusbarcodedatabase.top/database/index.html).
Conclusion
This study validates the effective application of RNA barcoding technology in the detection of IVs and establishes criteria and procedures for selecting species-specific molecular markers. These advancements enhance the understanding of the genetic and epidemiological characteristics of IVs and enable rapid responses to viral genetic mutations.
Background
The IVs represent a highly variable pathogen, classified within the class Insthoviricetes, order Articulavirales, and family Orthomyxoviridae [1]. They are categorized into four types, influenza A virus (IAV), influenza B virus (IBV), influenza C virus (ICV), and influenza D virus (IDV), among which IAV has garnered significant attention in the public health sector due to its wide host range and evolving genetic characteristics [2,3,4]. All four types of IVs are single-stranded negative-sense RNA viruses with segmented genomes, which infect hosts through the binding of hemagglutinin (HA) proteins to sialic acid receptors on the host cell membrane [5]. The IVs are widely distributed across the world, causing seasonal epidemics or sporadic outbreaks annually on a global scale. According to the WHO report, there are approximately 1 billion cases of influenza globally each year, with 3–5 million being severe cases [6]. Additionally, 290,000 to 650,000 deaths are attributed to influenza annually, posing a serious threat to public health [7,8,9].
The continuous reassortment and evolution of IV gene segments have led to the frequent emergence of new viral subtypes, posing a great challenge to detection efforts [10]. Conventional techniques such as reverse transcription polymerase chain reaction (RT-PCR) and enzyme-linked immunosorbent assays (ELISA) are commonly used for viral identification [11]. While these methods offer high sensitivity and specificity, they often require sophisticated equipment, extensive labor, and considerable financial investment, making their deployment difficult in resource-limited settings [12]. For example, RT-PCR, widely regarded as the gold standard for viral detection, demands specialized thermal cyclers, precise reagents, and trained personnel to ensure accurate results [13]. A single RT-PCR assay incurs higher per-test costs due to equipment, reagents, and labor, expenses which can accumulate significantly when testing large populations. ELISA is another common method for detecting viral antigens. Although its per-test cost is lower compared to RT-PCR, it requires specific antibodies and enzymatic reagents, which makes large-scale testing both time-consuming and resource-intensive [14]. These traditional approaches also require cold chain logistics for reagent storage and careful handling, further increasing the operational complexity and costs, particularly in low- and middle-income countries where laboratory infrastructure may be limited [15].
Barcoding technology, since its introduction by Paul Hebert in 2003, has played a pivotal role in biodiversity research [14]. Based on next-generation sequencing (NGS) technology, this technique screens unique and stable molecular genetic markers within DNA or RNA sequences (e.g., the mitochondrial cytochrome C oxidase subunit I gene) to achieve efficient and accurate species identification [16]. Comparative studies have shown that, while the initial investment in sequencing platforms required for RNA barcoding technology can be substantial, the per-sample cost decreases significantly as the number of samples increases [16]. For instance, NGS costs are significantly lower than those of RT-PCR when processing hundreds or thousands of samples simultaneously [12]. Furthermore, a study comparing the labor costs for ELISA and NGS-based viral detection revealed that ELISA requires nearly double the technician time due to manual processing steps, whereas NGS can be automated, drastically reducing the need for specialized personnel [11]. Additionally, RNA barcoding technology does not require the cold chain infrastructure that ELISA and RT-PCR often depend on, making it a more practical solution in remote areas with limited access to sophisticated laboratory environments [15]. Therefore, barcoding not only simplifies the workflow but also offers a more accessible and scalable method for virus detection in diverse geographic and economic settings. In recent years, the application scope of barcoding technology has continually expanded, and numerous research findings have confirmed its potential in virus identification [17,18,19]. Lam et al. successfully identified SARS-CoV-2-related coronaviruses in pangolins by extracting species-specific markers such as virus isolate (GX/P2V), reads, and contigs [17]. Langat and colleagues utilized metabarcoding to profile RNA viruses in Ceratopogonidae species, identifying the insect host species associated with these viruses [18]. You et al. have made substantial progress in the identification of SARS-CoV-2 from HCoVs and SARSr-CoV-2 lineages using RNA barcoding technology [19].
Building upon the foundation of previous research on the RNA barcoding technology of SARS-CoV-2, we have constructed a customized framework for IVs [19]. Our research aimed to screen species-specific barcode segments for four types of IVs, and to assess the segments’ accuracy, reliability, and generalization ability in identifying IVs from unidentified biospecimens. Based on the genetic similarities and differences among IVs, we constructed a training set (TRS) including four types of IVs (main taxa). Through genetic polymorphism analysis, the establishment of a genetic distance (GEDI) matrix, and phylogenetic analysis, we uncovered the genetic relationships among the four types of IVs. Subsequently, we utilized single nucleotide polymorphism (SNP) sites within IV genomes and BLAST to obtain and screen high-quality barcode segments. Rigorously tested across multiple testing sets (TESs), these barcode segments have demonstrated exceptional performance in viral identification. The IV barcode segments and their associated information were visualized as one-dimensional (1D) and two-dimensional (2D) codes and subsequently uploaded to the Influenza Virus Barcode Database (FluBarDB, http://virusbarcodedatabase.top/database/index.html) for public access and inquiry. Researchers can easily obtain the barcode segments’ information by scanning the 2D code with mobile electronic devices or by visiting the online database. The screening and identification of barcode segments deepen researchers’ understanding of the genetic characteristics of IVs, providing new tools and strategies for virus identification and public health monitoring. The flow chart is described in Fig. 1.
Methods
Dataset construction
All complete genome sequences of IAV-IDV (the four basic taxa) were collected from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) database to construct TRS (Supplementary Table S1, sheet 1–4 for TRS details including accession and version numbers) [20]. The assembly of segmented sequences for IAV and IBV followed the order (Table 1) (PB2, PB1, PA, HA, NP, NA, M, NS); whereas for ICV and IDV, the order was (PB2, PB1, P3, HEF, NP, M, NS). A total of 69 strains were obtained, comprising 527 segmented sequences (the basic information was presented in Table 1). These strains were collected from diverse global regions (e.g., Brisbane, California, Moscow and Hong Kong), encompassing a wide spectrum of the human population, underscoring the strength and global relevance of our study. To enhance alignment accuracy and ensure software compatibility, all degenerate bases (denoted in nucleotide sequences as RYMKSWHBVDN) in TRS were removed using Python scripts (Python scripts provided in Supplementary File S1) [21, 22].
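The degenerate-base removal step can be illustrated with a minimal Python sketch. This is not the script from Supplementary File S1; the function names `strip_degenerate` and `clean_fasta` are hypothetical, and the sketch assumes that the degenerate characters themselves are deleted while headers and standard bases are left unchanged.

```python
# Hypothetical helpers, not the authors' Supplementary File S1 script.
# Degenerate (ambiguity) codes listed in the text: R Y M K S W H B V D N.
DEGENERATE_BASES = set("RYMKSWHBVDN")


def strip_degenerate(seq: str) -> str:
    """Return seq with every degenerate base deleted (case-insensitive)."""
    return "".join(base for base in seq if base.upper() not in DEGENERATE_BASES)


def clean_fasta(text: str) -> str:
    """Clean the sequence lines of FASTA text; header lines (">") are kept as-is."""
    cleaned = []
    for line in text.splitlines():
        cleaned.append(line if line.startswith(">") else strip_degenerate(line))
    return "\n".join(cleaned)


if __name__ == "__main__":
    example = ">strain_1\nATGRCCNNTAY\n"
    print(clean_fasta(example))
```

Deleting characters shifts downstream coordinates, so in practice the cleaning would be applied before alignment rather than after it.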
To analyze the genetic diversity of four types of IVs and preliminarily test the identification capability of barcode segments, we constructed a dataset named TES-1. TES-1 contained all species within Insthoviricetes class, encompassing 2 families, 6 genera, and 18 species, with a total of 123 segmented sequences (detailed annotation information was available in Supplementary Table S2, sheet 1). The data acquisition was guided by the NCBI-taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy/?term) [23] and the taxonomic classification of species referred to the International Committee on Taxonomy of Viruses (ICTV, https://ictv.global/taxonomy) database [24].
To assess the accuracy and reliability of barcode segment identification, we constructed two additional independent TESs (TES-2 and TES-3). TES-2 encompassed segmented sequences retrieved from the NCBI as of August 29, 2024, including 401,002, 114,849, 738, and 364 sequences for IAV, IBV, ICV, and IDV, respectively. Comprehensive information regarding the time span of collection and search details was documented in Supplementary Table S2, sheet 2; sequences included in the TRS were explicitly excluded. TES-3 was composed of four distinct TESs, each formed by merging TES-1 with one of the four taxa in TRS separately, designed to assess the specificity of barcode segments for various types of IVs (Supplementary Table S2, sheet 3).
Bioinformatics analysis of TRS
TRS sequences were subjected to global alignment using MAFFT (https://mafft.cbrc.jp/alignment/server/) [25]. Subsequently, SNP characteristics and GEDI matrices of TRS and TES-1 were analyzed using Molecular Evolutionary Genetics Analysis (MEGA) v11 [26, 27]. The DnaSP [28] [with minimum window length “ ≥ 20”, conservation threshold (THR) = 100] and Python codes (Supplementary File S1) were used to describe conserved regions within coding sequences (CDSs) in TRS and TES-1. Analyses of gene flow and genetic differentiation assisted in understanding the population genetic differences, evolutionary processes, and speciation mechanisms among IVs and Insthoviricetes species [28].
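To make the conserved-region criterion concrete, the following Python sketch scans a multiple alignment for runs of columns that are identical across all sequences (a 100% conservation threshold) and at least 20 columns long. It is a simplified stand-in and does not reproduce the DnaSP algorithm or its P value calculation; the function name `conserved_regions` and the choice to exclude gap columns are assumptions.

```python
def conserved_regions(alignment, min_length=20):
    """Return half-open (start, end) column ranges where every aligned
    sequence carries the same non-gap base and the run is long enough.

    Simplified illustration only; DnaSP's own procedure is not reproduced.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None
    # Iterating one column past the end closes a run that reaches the last column.
    for col in range(length + 1):
        identical = (
            col < length
            and alignment[0][col] != "-"
            and all(seq[col] == alignment[0][col] for seq in alignment)
        )
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_length:
                regions.append((run_start, col))
            run_start = None
    return regions


if __name__ == "__main__":
    toy = ["ACGTACGTAA", "ACGTACCTAA", "ACGTACGTAA"]
    # A small min_length is used here only because the toy alignment is short.
    print(conserved_regions(toy, min_length=3))
```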
The TRS and TES-1 datasets were combined to construct a phylogenetic tree. The reliability of the constructed phylogenetic tree was evaluated in DAMBE [29]. We selected the optimal substitution model and constructed phylogenetic trees based on neighbor-joining (NJ) [30], maximum likelihood (ML) [31], and unweighted pair-group method with arithmetic means (UPGMA) [32] algorithms in MEGA [26]. The phylogenetic tree was visualized using iTOL (https://itol.embl.de/) [33], with tree confidence levels set to greater than 90%.
Barcode segments screening and visualization
SNP sites were identified for all taxa within the TRS in DnaSP. Sequences were then partitioned at these SNP locations to yield the initial segments, with any segments containing gaps being discarded [28, 34]. Subsequently, these initial segments were aligned using the BLAST tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi) [23, 35]. The BLAST parameters, including “Program selection parameters” (target percent identity set at 95% for highly similar sequences) and “Max target sequences” (maximum displayed sequences set at 5000), were set to the highest THRs to maximize BLAST results reliability. When all alignment results belonged to the same IV type with 100% “Percent Identity”, the segment was considered accurate and reliable for identifying that IV type [19]. The distribution of barcode segments for different types of IVs within CDSs was visualized using Proksee (https://proksee.ca/).
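The partitioning step described above can be sketched in Python as follows. The sketch assumes that an SNP site is any alignment column whose characters are not all identical, that the SNP column itself is excluded from the resulting segments, and that the first sequence serves as the reference; these choices and the function names are illustrative assumptions rather than the published procedure.

```python
def snp_columns(alignment):
    """Columns at which the aligned sequences do not all carry the same character."""
    return [i for i, column in enumerate(zip(*alignment)) if len(set(column)) > 1]


def initial_segments(alignment, min_length=1):
    """Split the first sequence at SNP columns and return (start, segment) pairs.

    Segments that contain a gap character or are shorter than min_length
    are discarded. The SNP column itself is not part of any segment.
    """
    reference = alignment[0]
    boundaries = [-1] + snp_columns(alignment) + [len(reference)]
    segments = []
    for left, right in zip(boundaries, boundaries[1:]):
        segment = reference[left + 1:right]
        if len(segment) >= min_length and "-" not in segment:
            segments.append((left + 1, segment))
    return segments


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGAACGT"]
    print(snp_columns(toy))
    print(initial_segments(toy))
```

Each surviving segment would then be submitted to BLAST, as described in the text, before being accepted as a barcode segment.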
In certain situations, although segments exhibited a uniform “Percent Identity” value of 100% in BLAST, the presence of sequences annotated as “partial genome/CDS” caused the “Coverage Rate” to fall below 100%. A literature review indicated that the total BLAST score in alignment results was linearly correlated with the matching length, yet it is influenced by factors such as matched bases, mismatches, and the number of insertions and deletions [19]. The “Conserved regions” function of DnaSP evaluated the conservation of barcode segments in the TRS by comparing conserved regions against randomly generated sequences, calculating a corresponding P value [28]. By integrating BLAST alignment results and DnaSP algorithms, we introduced a novel metric termed barcode segment weight (BSW) score to evaluate barcode segment identification efficiency, defined as BSW = lg(max total BLAST score/P value).
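As a worked illustration of the BSW definition, the short Python sketch below computes the score from a maximum total BLAST score and a DnaSP P value. It assumes that "lg" denotes the base-10 logarithm; the function name `bsw_score` and the input values are hypothetical.

```python
import math


def bsw_score(max_total_blast_score: float, p_value: float) -> float:
    """BSW = lg(max total BLAST score / P value), with lg taken as log base 10."""
    if max_total_blast_score <= 0 or p_value <= 0:
        raise ValueError("both the BLAST score and the P value must be positive")
    return math.log10(max_total_blast_score / p_value)


if __name__ == "__main__":
    # Hypothetical inputs: a total score of 1000 and a P value of 0.01.
    print(round(bsw_score(1000, 0.01), 2))
```

Under this definition, a higher BLAST score or a smaller P value both raise the BSW score, so a segment is rewarded for matching well and for lying in a significantly conserved region.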
Subsequently, all barcode segments were visualized as a combination of 1D and 2D codes. The 1D codes, generated via Python (Supplementary File S1), represented AT(U) and GC base pairs as short and long comb teeth, respectively. The corresponding 2D codes were created through an online platform (https://cli.im/) with a dynamic live code format and a 30% error correction rate, allowing easy access by scanning with mobile devices. Besides, we employed the Primer3Plus (https://www.primer3plus.com/) [36] and the NCBI Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi) [37] to design the primers of barcode segments and verify their reliability.
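The 1D encoding rule (short teeth for A, T, or U; long teeth for G or C) can be sketched in plain Python. This is not the script from Supplementary File S1: the relative tooth heights, the text rendering, and the function names are assumptions chosen only to show the mapping.

```python
SHORT, LONG = 1, 2  # relative tooth heights; the real proportions are an assumption


def comb_heights(segment: str):
    """Map each base to a tooth height: A/T/U -> SHORT, G/C -> LONG."""
    heights = []
    for base in segment.upper():
        if base in "ATU":
            heights.append(SHORT)
        elif base in "GC":
            heights.append(LONG)
        else:
            raise ValueError(f"unexpected base {base!r}")
    return heights


def render_text(segment: str) -> str:
    """Two-row text rendering: the top row is filled only under long (G/C) teeth."""
    heights = comb_heights(segment)
    top = "".join("|" if h == LONG else " " for h in heights)
    bottom = "|" * len(heights)
    return top + "\n" + bottom


if __name__ == "__main__":
    print(render_text("ATGGCAUC"))
```

Because the encoding distinguishes only AT(U) from GC, it conveys the base-pair class pattern of a segment rather than its exact sequence; the full sequence is reached through the accompanying 2D code.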
Two workflows were established to facilitate practical barcode technology application in laboratories (Supplementary Fig. S1). The first workflow involved experimental validation: viral RNA was extracted from samples using standard methods (e.g., QIAamp Viral RNA Mini Kit) and reverse transcribed into cDNA (e.g., SuperScript IV Reverse Transcriptase, Invitrogen) [34]. PCR amplification was then conducted using barcode-specific primers, and IV genome presence was confirmed by agarose gel electrophoresis based on expected band sizes. In the second workflow, sequences from unknown viral samples could be directly aligned with barcode segments in FluBarDB (http://virusbarcodedatabase.top/database/blast.html) using online BLAST tools to identify IV types.
The accuracy and reliability testing
To quantitatively and comprehensively assess the identification capability of barcode segments, we introduced two metrics: recall rate and specificity [19]. TES-2 evaluated the barcode segments’ recall rate (Fig. 2A), with the algorithms for the recall rate as follows: average nucleotide-level recall rate = the cumulative frequency of identical nucleotide sites at corresponding positions between all species sequences and barcode segments after alignment/barcode segment length/the number of species; average species-level recall rate = sum of nucleotide-level recall rates for each species/number of species. The average nucleotide-level recall rate reflected the ability of barcode segments to accurately identify target species at the nucleotide level, while the average species-level recall rate evaluated the discriminatory capabilities of barcode segments at the species level. In TES-2, the nucleotide-level recall rate for each species was set to 100% if it exceeded a predetermined THR of X%, otherwise it was set to 0%.
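The recall calculations can be sketched in Python as below, assuming each test sequence has already been aligned to the barcode segment so that both strings have equal length. The function names are hypothetical, the threshold is passed as a fraction (for example 0.90), and "exceeded" is read as strictly greater than the threshold.

```python
def nucleotide_recall(barcode: str, aligned: str) -> float:
    """Fraction of positions at which the aligned sequence matches the barcode."""
    if len(barcode) != len(aligned):
        raise ValueError("sequence must be aligned to the barcode length")
    matches = sum(1 for b, s in zip(barcode, aligned) if b == s)
    return matches / len(barcode)


def average_nucleotide_recall(barcode: str, sequences) -> float:
    """Identical sites summed over all sequences / barcode length / number of sequences."""
    return sum(nucleotide_recall(barcode, s) for s in sequences) / len(sequences)


def average_species_recall(barcode: str, sequences, threshold: float) -> float:
    """Per-sequence recall is set to 1 if it exceeds the threshold, else 0, then averaged."""
    hits = sum(1 for s in sequences if nucleotide_recall(barcode, s) > threshold)
    return hits / len(sequences)


if __name__ == "__main__":
    barcode = "ACGTACGTAC"
    tests = ["ACGTACGTAC", "ACGTACGTAA", "ACGTAC--AC"]
    print(average_nucleotide_recall(barcode, tests))
    print(average_species_recall(barcode, tests, threshold=0.85))
```

In this toy example the gap-containing sequence lowers the nucleotide-level average and fails the 0.85 threshold, which mirrors the note in the Results that long gaps can interfere with recall.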
Meanwhile, TES-3 was employed to assess the specificity of barcode segments of IVs (Fig. 2B), utilizing the following calculations: average nucleotide-level specificity = the cumulative frequency of different nucleotide sites at corresponding positions between all species sequences and barcode segments after alignment/barcode segment length/the number of species; average species-level specificity = sum of nucleotide-level specificity for each species/number of species. In TES-3, a species’ nucleotide-level specificity was set at 100% if it differed by X bases from the barcode segments, and 0% if fewer than X bases differed. Additionally, gaps (“-”) in TES-3 were treated as distinct bases entirely different from the standard nucleotides A, G, C, and T.
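A parallel Python sketch for the specificity calculations is given below. It again assumes pre-aligned, equal-length strings and hypothetical function names; a gap character simply compares as different from any standard base, which implements the rule stated above, and the species-level rule is read as "at least X differing bases".

```python
def differing_sites(barcode: str, aligned: str) -> int:
    """Count positions that differ from the barcode; a gap ('-') counts as a difference."""
    if len(barcode) != len(aligned):
        raise ValueError("sequence must be aligned to the barcode length")
    return sum(1 for b, s in zip(barcode, aligned) if b != s)


def average_nucleotide_specificity(barcode: str, sequences) -> float:
    """Differing sites summed over all sequences / barcode length / number of sequences."""
    total = sum(differing_sites(barcode, s) for s in sequences)
    return total / len(barcode) / len(sequences)


def average_species_specificity(barcode: str, sequences, min_diff: int) -> float:
    """Per-sequence specificity is 1 if at least min_diff bases differ, else 0, then averaged."""
    hits = sum(1 for s in sequences if differing_sites(barcode, s) >= min_diff)
    return hits / len(sequences)


if __name__ == "__main__":
    barcode = "ACGTACGTAC"
    off_target = ["TTTTACGTAC", "AC----GTAA"]
    print(average_nucleotide_specificity(barcode, off_target))
    print(average_species_specificity(barcode, off_target, min_diff=4))
```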
Assessment of generalization ability
Sequence similarities within TRSs might lead to overfitting when differentiating among the four IV types. Hence, we needed additional tests to assess the identification capacity of barcode segments against viruses of unknown homology or phylogenetically distant strains from IVs [19, 38]. We employed BLAST functionalities from GISAID’s EpiCoV (SARS-CoV-2 database, TES-4-1, https://gisaid.org/) [39], BV-BRC (bacterial and viral database, TES-4-2, https://www.bv-brc.org/app/Homology) [40], NGDC (Coronaviridae family, Poxviridae family, Monkeypox virus genomes database, and NCBI RefSeq representation genomes, TES-4-3, https://ngdc.cncb.ac.cn/blast/blastn) [41], and RVDB (nucleotide database, TES-4-4, https://rvdb.dbi.udel.edu/blast) [42]. The BLAST parameters for these databases were uniformly set to “Optimize for highly similar sequences”.
Introduction of FluBarDB
FluBarDB is a barcode segment database for IVs developed using a web-based architecture (Table 2) [43]. It is accessible at http://virusbarcodedatabase.top/database/index.html and offers comprehensive information on IV barcode segments along with analytical tools (Fig. 3). Users can download segments and their corresponding 1D/2D codes, preprocess sequences, and extract genetic information. The BLAST online tool enables sequence alignment with the IV barcode database using a customizable “Percent Identity” parameter, while the visualization tool displays sequence data in adjustable 1D/2D codes. Additionally, the batch analysis function calculates the recall rate and specificity of barcode segments. FluBarDB also provides genome annotations, lineage information, and real-time news and literature updates, ensuring users have timely access to the latest research findings and public health information. Detailed documentation is available at http://virusbarcodedatabase.top/database/data/help.docx.
Results
Single nucleotide polymorphism analysis
Analysis of SNP sites in TRS and TES-1 datasets revealed significant patterns of genetic diversity among Insthoviricetes species. Except for ICV, which had an average AT base pair content (BPC) of up to 62.7%, the average AT BPC for the remaining IVs ranged from 56 to 60% (Fig. 4; Supplementary Table S3, sheet 1). This indicated that ICV has experienced stronger selective pressure, leading to adaptive mutations. Furthermore, the proportion of bases at different codon positions was similar among the different IVs (with a lower frequency of G base in ICV) suggesting a relatively conserved codon usage pattern [21]. However, this phenomenon also posed challenges for viral classification and molecular identification.
The base substitution patterns in the four types of IVs were predominantly transitions (SI) rather than transversions (SV), with IBV and IDV being particularly notable (R value: 5.7, 7.4) (Fig. 5, Supplementary Table S3, sheet 2). The SI + SV counts within the TRS, Orthomyxoviridae, and Articulavirales taxa (4714; 4881; 4872) were higher than those observed in the individual IVs (IAV-IDV: 2203, 597, 435, 290). Therefore, the nucleotide substitution saturation in the four types of IVs was low, with minimal evolutionary noise. Moreover, DAMBE results (Table 3) demonstrated that the Iss values for the TRS, the Orthomyxoviridae taxon and Articulavirales taxon were significantly lower than the Iss.cSym values, indicating that these groups exhibited low substitution saturation, which allowed for the construction of phylogenetic trees within the Insthoviricetes class. Additionally, the proportion of identical pairs in the four types of IVs was high (> 80%, Supplementary Table S3, sheet 2), providing ample sequence space for the extraction of barcode segments.
Fig. 5 Nucleotide pair frequencies of TRS and TES-1. A common logarithmic transformation is applied because the SI and SV values of viral strains differ significantly. The brown dashed diagonal line (x = y) divides the coordinate system into upper and lower regions. An R value [the ratio of ln(SI) and ln(SV)] anchor point above the line (R value > 1) suggests that the species’ base substitution form is biased toward SI; otherwise, the form is biased toward SV (R value < 1). The degree of bias increases as the vertical distance between the anchor point and the diagonal increases. SI, transitional pairs; SV, transversional pairs
Population genetic characteristics
Within TRS, the smallest interspecies GEDI (between taxon 3 and taxon 4: 0.420) was 2.76 times the largest intraspecies GEDI (taxon 1: 0.152) (Fig. 6, Supplementary Table S3, sheet 3), indicating fewer genetic differences within each IV type than between IV types. For TES-1, the genetic differences among various strains (GEDI: 1.059) were more pronounced, resulting in clearer identification outcomes for the barcode segments. In gene flow tests, the haplotype diversity (Hd) for all strains exceeded 0.98 (except for ICV: 0.9) (Fig. 6, Supplementary Table S3, sheet 4). The total nucleotide diversity (π) value for TES-1 was 0.5277, whereas the π values for the IVs did not exceed 0.1322. This revealed that species within the Insthoviricetes class had a high level of genetic differentiation, with severe divergence in variation direction. ICV and IDV, in particular, exhibited no significant gene flow (π: 0.0266; 0.0215), indicating lower evolutionary rates and highly differentiated genomic regions with numerous variation sites. The distribution characteristics of conserved regions (Fig. 7, Supplementary Table S3, sheet 5) indicated that IAV and IDV had fewer conserved regions (7, 5) with high average GC BPC (46.67%, 46.47%), while IBV and ICV possessed numerous conserved regions (29, 42) with low average GC BPC (37.40%, 37.05%) and significant length variations. Therefore, we speculated that IAV and IDV exhibited a higher conservation, while mutations in IBV and ICV were constrained by multiple factors.
Phylogenetic results indicated that the UPGMA tree offered the best species classification outcome compared with the NJ and ML trees (Fig. 8, Supplementary Fig. S2 and S3) [32]. This may be attributed to UPGMA’s assumption of a constant evolutionary rate, which better reflects the uniform mutation patterns observed among these IVs. All types of IVs occupied the outermost branches of the tree, indicating their relatively recent emergence compared to other Insthoviricetes species, with distinct interspecies differentiation. IAV exhibited the highest intraspecific differentiation (branch length: 0.8–1.3) and was located in the middle of the phylogenetic tree, indicating the highest level of genetic material exchange, evolutionary adaptability, and pathogenic potential. IAV and IBV showed a closer relationship compared to ICV and IDV, suggesting that IAV and IBV had a more recent common ancestor evolutionarily [1]. Branches associated with IDV displayed longer branch lengths, suggesting that IDV has undergone more variation events during its evolution.
Considering the gene flow results, we observed a gradual decrease in π values from IAV to IDV (0.1322, 0.0407, 0.0266, 0.0215), indicating a progressively lower degree of potential gene exchange between IVs (Supplementary Table S3, sheet 4). Therefore, we hypothesize that in the absence of significant recombination events, IVs are unlikely to form new epidemic branches in the near future. Any evolutionary events leading to the emergence of novel IV types would likely require a prolonged period [6].
Visualization of barcode segments
We identified 5494, 1911, 985, and 1218 SNP sites for IAV, IBV, ICV, and IDV, respectively, within the TRS (Supplementary Table S3, sheet 6) through DnaSP. Low-quality segments screened based on SNP sites were manually removed. After BLAST (P value < 0.05), we ultimately selected 7 IAV (2 PB2, 1 PB1, 1 NP, 3 M), 29 IBV (3 PB2, 4 PB1, 4 PA, 2 HA, 6 NP, 2 NA, 4 M, 4 NS), 40 ICV (11 PB2, 8 PB1, 6 P3, 3 HEF, 8 NP, 4 NS) and 5 IDV (1 PB2, 3 PB1, 1 NS) (Figs. 9 and 10, detailed BLAST results in Supplementary Table S4) barcode segments, with the number of barcode segments roughly equivalent to the number of conserved regions. The mean BSW scores for IAV and ICV were the highest (5.67; 5.08), whereas those for IBV and IDV were relatively lower (4.50, 4.89). Therefore, barcode segments of IAV and ICV exhibited greater identification capability in complex environments compared to those of IBV and IDV. Additionally, BSW scores of barcode segments for IAV and ICV approximated a normal distribution, whereas those for IBV and IDV showed a similar negative skewness, with a tail extending toward lower values (Fig. 10). The identification stability of IAV and ICV barcode segments was notably superior, particularly in the context of metagenomics or high-throughput data analysis.
Fig. 10 The box plots and scatter plots of BSW values for barcode segments in CDSs. The curve on the right side of the scatter plot represents the fitted normal distribution curve of the BSW values for segments. The central line inside the box represents the median BSW value of the barcode segments within CDSs
Species-specific barcode segments were designed as composite 1D and 2D codes (Fig. 11). Scanning test results showed that the visual 2D code was convenient to use and BLAST results displayed clearly and intuitively. Primer testing results showed that all predicted amplification products belonged to the same IV type targeted by the barcode segments. Detailed information of primers is available in Supplementary Table S5.
Barcode segments testing
Results of TESs indicated that all barcode segments generally possessed high accuracy, specificity, and generalization capabilities for species identification. In TES-2, the average nucleotide-level recall rate for all barcode segments reached 96.86% (Fig. 12A), with the highest for IBV even surpassing 99.39% (TES-2-2 in Supplementary Table S2, sheet 2). For species-level tests, aside from two barcode segments with a recall rate of 0 at THR > 0.90, 25.93% of barcode segments still maintained a recall rate of 100% even at the extreme condition of THR > 0.99. Therefore, barcode segments with nucleotide-level similarity exceeding 90% (THR > 0.90) to a test sample (e.g., novel or unrecorded IV variants not included in TRS) were considered accurate for species identification. This capability ensured reliable identification with recall rates close to 100%, even for sequences containing partial gaps due to poor sequencing quality (note that long gaps might interfere with the high recall rate of barcode segments, Supplementary Table S2, sheet 2).
In TES-3, the average nucleotide-level specificity for all barcode segments was 55.27% (Fig. 12B), and the lowest specificity of IDV still exceeded 51.90% (TES-3-4, Supplementary Table S2, sheet 3). This finding indicated that the differences in nucleotide composition or the order of base sequences between different types of IVs exceeded 50.00%. At the species level, even when THR was set to “ ≥ 4 bp”, all barcode segments accurately identified the corresponding species. With the setting of “ ≥ 10 bp” in THR, 87.65% of barcode segments retained over 92.54% specificity in identification. Therefore, results from TES-3 confirmed nucleotide differences between IVs and their related species, and demonstrated that the barcode segments had the ability to differentiate species closely related to IVs by appropriately adjusting THRs.
To avoid overfitting, TES-4 focused on exploring the generalization capability of barcode segments to identify IV sequences from external databases (Supplementary Table S2, sheet 4–7). Results from TES-4-1 (GISAID) and TES-4-3 (NGDC) showed that all IV barcode segments had a 0% false omission rate (FOR) in identifying SARS-CoV-2, Coronaviridae species, Poxviridae species, and Monkeypox virus (i.e., “No results”). Moreover, results from TES-4-2 (BV-BRC), TES-4-3 (NGDC), and TES-4-4 (RVDB) demonstrated that despite a few segments not achieving 100% coverage, the precision rate for all IV barcode segments was 100%.
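For reference, the two metrics reported for TES-4 follow the standard confusion-matrix definitions: precision = TP / (TP + FP) and FOR = FN / (FN + TN). The Python sketch below applies them to hypothetical counts; how BLAST hits were mapped to true and false positives or negatives in the study is not detailed here, so the counts and function names are assumptions for illustration only.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): share of reported hits that belong to the target type."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when there are no reported hits")
    return tp / (tp + fp)


def false_omission_rate(fn: int, tn: int) -> float:
    """FOR = FN / (FN + TN): share of non-hits that actually belong to the target type."""
    if fn + tn == 0:
        raise ValueError("FOR is undefined when there are no non-hits")
    return fn / (fn + tn)


if __name__ == "__main__":
    # Hypothetical counts: every hit is from the target IV type, and a
    # non-influenza database returns no hits and contains no target sequences.
    print(precision(tp=120, fp=0))
    print(false_omission_rate(fn=0, tn=500))
```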
Discussion
Over the past two decades, barcoding technology has been validated for species classification and diversity assessment across the realms of animals [44,45,46], plants [34], and microorganisms [19, 47]. Particularly in studies of Theaceae [34], Orchidaceae [48], and SARS-CoV-2 [19], the synergistic approach of molecular biology and bioinformatics has successfully screened species-specific barcode segments. This research screened 81 barcode segments from the complete genomes of four types of IVs. Results from GEDI, phylogenetic analysis and TESs demonstrated that all type-specific IV barcode segments could identify their corresponding virus types in complex environments without being affected by intra-species mutations. The research innovatively utilized the combined 1D and 2D codes as a medium for disseminating barcode segments. The dynamic 2D code representation facilitated real-time updates of information and aggregated user behavior patterns (such as clickstream data and usage frequency) on the backend, thereby markedly enhancing the interactivity of barcoding technology. All data tracking was performed using anonymized data in strict compliance with ethical standards and data protection regulations, ensuring that no personally identifiable information was collected. In addition, the optimized visualization design of barcode segments enhanced reading efficiency, comprehensibility, recognition accuracy, and portability, empowering new users to swiftly grasp and apply this technology. The FluBarDB platform was used to store and analyze barcode segments, filling a critical gap in IV barcode resources, enhancing the efficiency of bioinformatics analysis, and promoting the application of barcoding technology in virology research and public health monitoring.
On the basis of previous research [19, 34], we proposed three innovative requirements for constructing TRS as follows: 1. Simplified construction: TRS could be constructed using reference and published sequences, encompassing globally prevalent and highly virulent IV subtypes like H1N1, H2N2, H3N2, H5N1, and H7N9 (with detailed accession and version numbers provided in Supplementary Table S1, sheet 1–4) to enhance the species diversity of TRS [49,50,51]. Since the TRS was designed to screen for highly conserved barcode segments within species, there was no need to remove homologous sequences during dataset construction. 2. Batch processing capability: TRS featured a modular internal structure, allowing for scalable batch modifications to adapt to changing data processing and analysis requirements. 3. Broad applicability: To overcome compatibility issues encountered by some programs, three strategies were developed to increase the applicability of TRS: 1) Adjusting the output file format to match NCBI sequence standards (e.g., base U to base T); 2) Removing all degenerate bases from TRS to lessen genetic noise and improve alignment accuracy; 3) Differentiating genetic analyses into interspecies and intraspecies levels, particularly for viruses with numerous subtypes or lineages [19].
The analysis of SNPs and population genetic characteristics revealed significant genetic diversity and evolutionary information about IVs. The average GC BPC of four types of IVs was found to resemble that of other species within the Insthoviricetes class, indicating genetic associations among them [19]. An assessment of base substitutions exposed the potential variability of the viruses, suggesting the potential applicability of barcoding technology for identifying other species with similar genetic characteristics [52]. A large number of SNP sites existed within IVs, akin to those in the Betacoronavirus genus and HCoVs strains, aiding in the selection of barcode segments with significant discriminatory capability [19]. The detection of conserved regions confirmed the effectiveness of barcode segments in identifying new variants of IVs [53]. Gene flow played a crucial role in maintaining genetic diversity, usually occurring within the same ecosystem or among geographically adjacent populations [54]. The findings showed that the transmission of IAV was largely unrestricted by time and space, having evolved into a population with a complex internal phylogeny, while IBV, ICV, and IDV faced higher evolutionary selective pressures. Although no statistically significant correlation was observed between the genetic diversity of IVs and the number of conserved regions identified in gene flow and GEDI analyses, these findings still offered valuable insights into the genetic dynamics and evolutionary strategies of these viruses.
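As a rough illustration of the quantities discussed above, the following Python sketch computes GC content for a sequence and locates polymorphic (SNP) columns and conserved stretches in a multiple sequence alignment. It is a simplified stand-in for the analyses performed with dedicated tools; the function names and the rule of ignoring gaps and ambiguous symbols are assumptions made for this example.

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def polymorphic_sites(alignment: list[str]) -> list[int]:
    """0-based alignment columns that contain more than one distinct base.

    Only A/C/G/T are compared; gaps and ambiguous symbols are ignored,
    which is a simplification chosen for this sketch.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(length):
        bases = {row[col].upper() for row in alignment} & set("ACGT")
        if len(bases) > 1:
            sites.append(col)
    return sites


def conserved_regions(alignment: list[str], min_len: int) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges of at least min_len with no SNP column."""
    if not alignment:
        return []
    length = len(alignment[0])
    variable = set(polymorphic_sites(alignment))
    regions, start = [], 0
    # Iterate one past the end so the final run is closed as well.
    for col in range(length + 1):
        if col == length or col in variable:
            if col - start >= min_len:
                regions.append((start, col))
            start = col + 1
    return regions
```

With the toy alignment `["ACGTACGT", "ACGTTCGT", "ACCTACGT"]`, `polymorphic_sites` returns `[2, 4]` and `conserved_regions(..., 2)` returns `[(0, 2), (5, 8)]`. Candidate barcode segments would be drawn from such conserved stretches within a type, while SNP columns between types supply the discriminating signal.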
Strains within the Insthoviricetes class were renowned for extensive antigenic variability and broad infection capabilities [1], and the phylogenetic analysis facilitated a deeper understanding of their critical characteristics (e.g., the evolution, spread, and drug resistance of IVs) [55]. The phylogenetic results suggested that the excessively long branches produced by a few low-quality sequences had minimal impact on species classification. SNP sites had a species classification function similar to internal nodes in the tree; thus, selecting barcode segments based on SNP sites provided accurate identification results for both closely related IVs and more distant viruses. The high genetic variability of IVs was driven by the high mutation rate and gene reassortment phenomena (i.e., gene flow) within their RNA genomes. In contrast, the neutral evolution model focused on genetic variations caused by random processes, such as genetic drift, rather than natural selection [6]. In studies of rapidly evolving pathogens like IVs, selective pressure was often found to be more significant than neutral processes [19]. Thus, this study emphasized the role of gene flow tests in IV genetic diversity but overlooked the impact of neutral evolution on polymorphism and divergence, as well as the evolutionary trends associated with time and geography [19, 56, 57].
This study enhanced the robustness of barcode segment identification by developing multiple algorithms. In view of genetic variances among the four types of IVs, we categorized these viruses as distinct taxa within TRS to reduce conserved sequence loss from alignment algorithms. Our lab created BSW values using NCBI BLAST and DnaSP results, applying a weighted logarithmic approach to assess the precision and stability of barcode segments [28, 35]. The consistency of distribution between barcode segments and conserved CDS regions of IVs validated the utility of barcode segments as molecular markers [58, 59]. The RNA polymerase complex (including PB2 and PB1 proteins) was crucial for replication and transcription in IVs [60, 61]. The NP protein encapsulated and safeguarded viral RNA, aiding replication [62]. M proteins, with M1 acting as the core structural component and M2 functioning as an ion channel, contributed to viral disassembly [63]. NS1 and NS2 proteins formed the NS complex and modulated host immune response and viral complex export [61]. HA and NA proteins were surface glycoproteins unique to IAV and IBV, essential for viral entry and release, and interacted strongly with host cells [64]. The surface glycoprotein unique to ICV and IDV was the HEF protein, which functioned similarly to the HA and NA proteins of IAV and IBV and was responsible for viral attachment, entry, and fusion within infected cells [65]. Barcode segments screened from conserved regions in the above CDSs could track viral immune evasion and related phenomena [64, 65]. The primary variant sites in types A/B and C/D IVs clustered around the HA/NA and HEF, aiding in monitoring and understanding IV epidemiology for informed preventive and vaccine strategies. Barcode segments of IAV and IDV, situated outside high-variation CDS regions, remained less prone to mutation-induced invalidation.
Building on the SARS-CoV-2 barcode segments testing algorithm proposed by You et al., this study refined the algorithms for recall rate and specificity of identification of barcode segments at both the species and nucleotide levels [19]. Additionally, the precision rate and FOR were introduced as metrics to assess the generalizability of barcode segments. Compared to the previously reported SARS-CoV-2 barcode segments (average recall rate: 94.00%, specificity: 25.00%), the IV barcode segments identified in this study demonstrated improved performance, with average nucleotide-level recall rates of 100.00% and specificity of 55.27% [19]. The barcoding technique utilized in this study excelled in both data processing efficiency and identification accuracy, making it highly suitable for resource-limited settings and emergency public health responses. TESs included a wide range of viral strains from comprehensive databases, covering not only members of the Insthoviricetes class but also pathogens of significant international concern (e.g., Monkeypox virus and SARS-CoV-2), a scope rarely addressed in previous studies [19, 66]. The data collection period spanned from the 1990s to 2024, covering major epidemic events (SARS: 2002–2003; H1N1: 2009–2010; Zika virus outbreak: 2015–2016; COVID-19: late 2019 to present) [19, 67,68,69]. This further confirmed that potential recombinant events had minimal impact on the stability and reliability of barcode segments.
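The four metrics named above can be written down from a confusion matrix. The Python sketch below uses the standard textbook definitions of recall, specificity, precision, and false omission rate (FOR); the study's own species-level and nucleotide-level formulations are not reproduced in this section, so the `Confusion` class, the function names, and the example counts are illustrative assumptions rather than the authors' algorithm or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of true/false positives and negatives for one barcode segment."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def recall(c: Confusion) -> float:
    """TP / (TP + FN): share of true targets that the barcode hits."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """TN / (TN + FP): share of non-targets that the barcode correctly misses."""
    return _ratio(c.tn, c.tn + c.fp)


def precision(c: Confusion) -> float:
    """TP / (TP + FP): share of barcode hits that are true targets."""
    return _ratio(c.tp, c.tp + c.fp)


def false_omission_rate(c: Confusion) -> float:
    """FN / (FN + TN): share of barcode misses that were actually targets."""
    return _ratio(c.fn, c.fn + c.tn)


# Made-up counts, purely to show the calls.
example = Confusion(tp=90, fp=0, tn=50, fn=10)
print(recall(example), specificity(example),
      precision(example), false_omission_rate(example))
```

Precision and FOR complement recall and specificity because they condition on the test outcome rather than on the true class, which is why they are useful for judging how a barcode generalizes to unseen samples.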
Conclusion
The study utilized comprehensive genome sequencing, genetic diversity analysis, and a BSW scoring system to screen out 81 barcode segments from four types of IVs and assessed the accuracy and reliability of these segments. Results successfully demonstrated the effectiveness of RNA barcoding technology in identifying and distinguishing between four types of IVs. Visualization of barcode segments using 1D and 2D codes optimized reading efficiency, clarity, and user interaction experience. The establishment of the online barcode database (FluBarDB) significantly facilitated global collaboration and knowledge sharing, marking a significant advancement in molecular identification within virology and genetics. This work enhanced our understanding of the genetics and evolution of IVs and set new benchmarks for molecular identification technology. However, a limitation of this study is the incomplete exploration of host factors influencing barcode segment performance. Future research will address this issue by investigating host-virus interactions and their impact on barcode efficacy and exploring additional bioinformatics tools to achieve wider applications in pathogen monitoring and biodiversity studies.
Data availability
All sequence data used in this study are publicly accessible from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/). The accession and version numbers of sequences in TRS (complete genome sequences of IAV-IDV) on NCBI can be found in Supplementary Table S1, sheet 1–4. The search details of sequences in TES-2 (all genome sequences of IAV-IDV) on NCBI can be found in Supplementary Table S2, sheet 2.
The accession and version numbers of sequences in TES-1 (species within Insthoviricetes class) on NCBI-taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy/?term) can be found in Supplementary Table S2, sheet 1.
The taxonomic classification of the species within Insthoviricetes class referred to the International Committee on Taxonomy of Viruses (ICTV, https://ictv.global/taxonomy) database.
Detailed information of barcode segments can be viewed on our own Influenza Virus Barcode Database (FluBarDB, http://virusbarcodedatabase.top/database/dataset.html).
All source code can be found in Supplementary File S1 and has been deposited at https://github.com/Dyy0426/FluBarcoding.git.
References
Uyeki TM, Hui DS, Zambon M, Wentworth DE, Monto AS. Influenza. Lancet. 2022;400(10353):693–706.
Bouvier NM, Palese P. The biology of influenza viruses. Vaccine. 2008;26(Suppl 4):D49-53.
Yoo SJ, Kwon T, Lyoo YS. Challenges of influenza A viruses in humans and animals and current animal vaccines as an effective control measure. Clin Exp Vaccine Res. 2018;7(1):1–15.
Yu X, Wang C, Chen T, Zhang W, Yu H, Shu Y, et al. Excess pneumonia and influenza mortality attributable to seasonal influenza in subtropical Shanghai, China. BMC Infect Dis. 2017;17(1):756.
Fodor E. The RNA polymerase of influenza A virus: mechanisms of viral transcription and replication. Acta Virol. 2013;57(2):113–22.
World Health Organization. Global influenza strategy 2019–2030. https://www.who.int/publications/i/item/9789241515320. (15 March 2019, date last accessed).
McGinnis J, Laplante J, Shudt M, George KS. Next generation sequencing for whole genome analysis and surveillance of influenza A viruses. J Clin Virol. 2016;79:44–50.
Trimarco JD, Heaton NS. From high-throughput to therapeutic: host-directed interventions against influenza viruses. Curr Opin Virol. 2022;53:101198.
Zhu Z, Fodor E, Keown JR. A structural understanding of influenza virus genome replication. Trends Microbiol. 2023;31(3):308–19.
McDonald SM, Nelson MI, Turner PE, Patton JT. Reassortment in segmented RNA viruses: mechanisms and outcomes. Nat Rev Microbiol. 2016;14(7):448–60.
Carter LJ, Garner LV, Smoot JW, Li Y, Zhou Q, Saveson CJ, et al. Assay techniques and test development for COVID-19 diagnosis. ACS Cent Sci. 2020;6(5):591–605.
Venter M, Richter K. Towards effective diagnostic assays for COVID-19: a review. J Clin Pathol. 2020;73(7):370–7.
Notomi T, Mori Y, Tomita N, Kanda H. Loop-mediated isothermal amplification (LAMP): principle, features, and future prospects. J Microbiol. 2015;53(1):1–5.
Dziąbowska K, Czaczyk E, Nidzworski D. Detection methods of human and animal influenza virus-current trends. Biosensors (Basel). 2018;8(4):94.
Yozwiak NL, Skewes-Cox P, Stenglein MD, Balmaseda A, Harris E, DeRisi JL. Virus identification in unknown tropical febrile illness cases using deep sequencing. PLoS Negl Trop Dis. 2012;6(2):e1485.
Miller S, Chiu C. The role of metagenomics and next-generation sequencing in infectious disease diagnosis. Clin Chem. 2021;68(1):115–24.
Lam TT, Jia N, Zhang YW, Shum MH, Jiang JF, Zhu HC, et al. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Nature. 2020;583(7815):282–5.
Langat SK, Eyase F, Bulimo W, Lutomiah J, Oyola SO, Imbuga M, et al. Profiling of RNA viruses in biting midges (Ceratopogonidae) and related Diptera from Kenya using metagenomics and metabarcoding analysis. mSphere. 2021;6(5):e0055121.
You C, Jiang S, Ding Y, Ye S, Zou X, Zhang H, et al. RNA barcode segments for SARS-CoV-2 identification from HCoVs and SARSr-CoV-2 lineages. Virol Sin. 2024;39(1):156–68.
Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, et al. The influenza virus resource at the national center for biotechnology information. J Virol. 2008;82(2):596–601.
Grantham R, Gautier C, Gouy M, Mercier R, Pavé A. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 1980;8(1):r49-62.
Linhart C, Shamir R. The degenerate primer design problem: theory and applications. J Comput Biol. 2005;12(4):431–56.
Schoch CL, Ciufo S, Domrachev M, Hotton CL, Kannan S, Khovanskaya R, et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford). 2020;2020:baaa062.
Kuhn JH, Abe J, Adkins S, Alkhovsky SV, Avšič-Županc T, Ayllón MA, et al. Annual (2023) taxonomic update of RNA-directed RNA polymerase-encoding negative-sense RNA viruses (realm Riboviria: kingdom Orthornavirae: phylum Negarnaviricota). J Gen Virol. 2023;104(8):001864.
Rozewicki J, Li S, Amada KM, Standley DM, Katoh K. MAFFT-DASH: integrated protein sequence and structural alignment. Nucleic Acids Res. 2019;47(W1):W5-10.
Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 2021;38(7):3022–7.
Liu L, Yu H, Wang D. Genomic and biological characteristics of an alphabaculovirus isolated from Trabala vishnou gigantina. Virus Res. 2022;308:198630.
Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302.
Xia X. DAMBE7: new and improved tools for data analysis in molecular biology and evolution. Mol Biol Evol. 2018;35(6):1550–2.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406–25.
King KM, Van Doorslaer K. Building (viral) phylogenetic trees using a maximum likelihood approach. Curr Protoc Microbiol. 2018;51(1):e63.
Jiao X, Flouri T, Rannala B, Yang Z. The impact of cross-species gene flow on species tree estimation. Syst Biol. 2020;69(5):830–47.
Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49(W1):W293–6.
Jiang S, Chen F, Qin P, Xie H, Peng G, Li Y, et al. The specific DNA barcodes based on chloroplast genes for species identification of Theaceae plants. Physiol Mol Biol Plants. 2022;28(4):837–48.
Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36:W5-9.
Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, et al. Primer3–new capabilities and interfaces. Nucleic Acids Res. 2012;40(15):e115.
Sayers EW, Beck J, Bolton EE, Bourexis D, Brister JR, Canese K, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2021;49(D1):D10–7.
Shariat SF, Lotan Y, Vickers A, Karakiewicz PI, Schmitz-Dräger BJ, Goebell PJ, et al. Statistical consideration for clinical biomarker research in bladder cancer. Urol Oncol. 2010;28(4):389–400.
Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 2017;22(13):30494.
Olson RD, Assaf R, Brettin T, Conrad N, Cucinell C, Davis JJ, et al. Introducing the bacterial and viral bioinformatics resource center (BV-BRC): a resource combining PATRIC IRD and ViPR. Nucleic Acids Res. 2023;51(D1):D678–89.
CNCB-NGDC members and partners. Database resources of the national genomics data center, China national center for bioinformation in 2023. Nucleic Acids Res. 2023;51(D1):D18-28.
Zhao P, Zhou S, Xu P, Su H, Han Y, Dong J, et al. RVdb: a comprehensive resource and analysis platform for rhinovirus research. Nucleic Acids Res. 2024;52(D1):D770–6.
Agosto-Arroyo E, Coshatt GM, Winokur TS, Harada S, Park SL. Alchemy: a web 2.0 real-time quality assurance platform for human immunodeficiency virus, hepatitis C virus, and BK virus quantitation assays. J Pathol Inform. 2017;8:18.
Li H, Xiao W, Tong T, Li Y, Zhang M, Lin X, et al. The specific DNA barcodes based on chloroplast genes for species identification of Orchidaceae plants. Sci Rep. 2021;11(1):1424.
Beebe NW. DNA barcoding mosquitoes: advice for potential prospectors. Parasitology. 2018;145(5):622–33.
Kress WJ, García-Robledo C, Uriarte M, Erickson DL. DNA barcodes for ecology, evolution, and conservation. Trends Ecol Evol. 2015;30(1):25–35.
Vu D, Groenewald M, de Vries M, Gehrmann T, Stielow B, Eberhardt U, et al. Large-scale generation and analysis of filamentous fungal DNA barcodes boosts coverage for kingdom fungi and reveals thresholds for fungal species and higher taxon delimitation. Stud Mycol. 2019;92:135–54.
Kim HM, Oh SH, Bhandari GS, Kim CS, Park CW. DNA barcoding of Orchidaceae in Korea. Mol Ecol Resour. 2014;14(3):499–507.
Ampomah PB, Lim LHK. Influenza A virus-induced apoptosis and virus propagation. Apoptosis. 2020;25(1–2):1–11.
Jiao P, Song Y, Huang J, Xiang C, Cui J, Wu S, et al. H7N9 avian influenza virus is efficiently transmissible and induces an antibody response in chickens. Front Immunol. 2018;9:789.
Sun H, Liu J, Xiao Y, Duan Y, Yang J, Chen Y, et al. Pathogenicity of novel reassortant Eurasian avian-like H1N1 influenza virus in pigs. Virology. 2021;561:28–35.
Gogoi B, Wann SB, Saikia SP. DNA barcodes for delineating Clerodendrum species of North East India. Sci Rep. 2020;10(1):13490.
Taubenberger JK, Kash JC. Influenza virus evolution, host adaptation, and pandemic formation. Cell Host Microbe. 2010;7(6):440–51.
Bahl J, Vijaykrishna D, Holmes EC, Smith GJ, Guan Y. Gene flow and competitive exclusion of avian influenza A virus in natural reservoir hosts. Virology. 2009;390(2):289–97.
Hayati M, Sobkowiak B, Stockdale JE, Colijn C. Phylogenetic identification of influenza virus candidates for seasonal vaccines. Sci Adv. 2023;9(44):eabp9185.
Jiang S, Zhao G, Ding Y, Ye S, Li Z, You C, et al. Deciphering dengue: novel RNA barcoding segments for enhanced serotype-specific identification and global surveillance of dengue viruses. Front Microbiol. 2024;15:1474406.
Welch JJ, Eyre-Walker A, Waxman D. Divergence and polymorphism under the nearly neutral theory of molecular evolution. J Mol Evol. 2008;67(4):418–26.
Blois S, Goetz BM, Bull JJ, Sullivan CS. Interpreting and de-noising genetically engineered barcodes in a DNA virus. PLoS Comput Biol. 2022;18(11):e1010131.
Wu NC, Wilson IA. Influenza hemagglutinin structures and antibody recognition. Cold Spring Harb Perspect Med. 2020;10(8):a038778.
Böttcher-Friebertshäuser E, Garten W, Matrosovich M, Klenk HD. The hemagglutinin: a determinant of pathogenicity. Curr Top Microbiol Immunol. 2014;385:3–34.
Chauhan RP, Gordon ML. An overview of influenza A virus genes, protein functions, and replication cycle highlighting important updates. Virus Genes. 2022;58(4):255–69.
Ren C, Chen T, Zhang S, Gao Q, Zou J, Li P, et al. PLK3 facilitates replication of swine influenza virus by phosphorylating viral NP protein. Emerg Microbes Infect. 2023;12(2):2275606.
Calderon BM, Danzy S, Delima GK, Jacobs NT, Ganti K, Hockman MR, et al. Dysregulation of M segment gene expression contributes to influenza A virus host restriction. PLoS Pathog. 2019;15(8):e1007892.
Byrd-Leotis L, Cummings RD, Steinhauer DA. The interplay between the host receptor and influenza virus hemagglutinin and neuraminidase. Int J Mol Sci. 2017;18(7):1541.
Wang M, Veit M. Hemagglutinin-esterase-fusion (HEF) protein of influenza C virus. Protein Cell. 2016;7(1):28–45.
Altindis M, Puca E, Shapo L. Diagnosis of monkeypox virus - an overview. Travel Med Infect Dis. 2022;50:102459.
Gallaher WR. Towards a sane and rational approach to management of influenza H1N1 2009. Virol J. 2009;6:51.
Nicholls J, Dong XP, Jiang G, Peiris M. SARS: clinical virology and pathogenesis. Respirology. 2003;8 Suppl:S6-8.
Plourde AR, Bloch EM. A literature review of Zika virus. Emerg Infect Dis. 2016;22(7):1185–92.
Acknowledgements
We thank our teacher and the members of our research group for their help and support. Their expertise and insights have been instrumental in shaping our work.
Funding
This research was supported by grants from Key Research & Development Project of Nanhua Biomedical Co., Ltd (No. H202191490139), National Natural Science Foundation of China (No. 32372124), China Postdoctoral Science Foundation (Nos. 2021M701160 and 2022M721101) and the Undergraduate Innovation and Entrepreneurship Training Program (XCX2024138).
Author information
Contributions
S.J. collected and analyzed the data, constructed the database, and drafted the manuscript; Y.D. analyzed the data, constructed the database, and drafted the manuscript; G.Z., S.Y., and S.L. curated data and performed the formal analysis; Y.Y., Z.L., and X.Z. improved the manuscript; C.Y. administered and supervised the project and provided partial funding support; X.G. conceived and designed the experiment, acquired funding, and revised the manuscript. All authors contributed to, read, and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jiang, S., Ding, Y., Zhao, G. et al. Species-specific RNA barcoding technology for rapid and accurate identification of four types of influenza virus. BMC Genomics 26, 409 (2025). https://doi.org/10.1186/s12864-025-11602-0
DOI: https://doi.org/10.1186/s12864-025-11602-0