Genomic evidence of the blood virome and bacteriome provides insights into prevalence, evolution, and susceptibility-related genes across Eurasian pigs
BMC Genomics volume 26, Article number: 413 (2025)
Abstract
Background
Infectious diseases are among the primary constraints on pig production, and the globalization of the pig industry has contributed to the emergence and spread of pathogens. However, comprehensive genomic surveillance on the Eurasian scale is lacking, so the prevalence and evolution of pathogenic viruses and bacteria in pigs remain unknown.
Results
In this study, we propose a protocol that identifies viral and bacterial sequences and accurately estimates their abundance from whole-genome sequencing data of blood samples. Through whole-genome analysis of 685 Eurasian pigs, we constructed the blood virome and bacteriome landscape. In total, 15 pathogenic bacteria, 12 pathogenic viruses, and porcine endogenous retrovirus were identified. We divided the 685 Eurasian pigs into three subgroups and found significant differences in viral and bacterial composition, prevalence, and abundance among subgroups. In addition, we performed a quantitative polymerase chain reaction experiment to quantify the copy number of porcine endogenous retrovirus and confirm the reliability of the proposed protocol. Furthermore, we constructed a phylogenetic tree of porcine parvovirus 6; the results suggest that large-scale transportation across China provides viral connectivity between geographically distinct localities, potentially facilitating the spread of viruses. We also identified the ADAM28 and ADAMDEC1 genes, which may be related to porcine lymphotropic herpesvirus, and the ATF4 gene, which may be associated with porcine cytomegalovirus.
Conclusions
Our study provides new insights into the genomic investigation and epidemiology of viruses and bacteria, in turn helping to prevent viral and bacterial infectious diseases in pigs.
Background
Pigs sustainably supply staple protein to humans and play a crucial role in global food security, agricultural economies, and human biomedical model research [1, 2]. In the past few years, the transportation of pigs across geographical regions has led to the global spread of infectious diseases, severely challenging the economic and environmental value of pigs. The constant threat of endemic and emerging diseases affecting swine, which in some instances also impact human health, highlights the potential vulnerability of pork production around the world [3].
The geographical and genetic diversity of the pig virome and bacteriome could be exploited for public health screening and prevention, and pig blood samples are suitable material for such studies. A thorough study showed that many viruses are present in peripheral blood and that members of the Herpesviridae and Anelloviridae families can be identified in humans without obvious disease [4]. Similarly, some viruses and bacteria, such as porcine cytomegalovirus [5] and Pasteurella multocida [6], have been reported to cause inapparent infections without obvious clinical symptoms in pigs. Therefore, even in seemingly healthy pig populations, pathogenic viruses and bacteria may be present in the blood, especially given the widespread disease resistance of Chinese indigenous pigs. However, when these pathogens reach high abundance and interact synergistically with other abiotic and biotic stressors, they may cause morphological, physiological, or behavioral changes, increase mortality, and ultimately result in the collapse of pig populations [7, 8]. Hence, investigating the composition of the pig virome and bacteriome is essential for infectious disease surveillance and control [9]. However, comprehensive genomic surveillance on the Eurasian scale is lacking, so the prevalence and evolution of pathogenic viruses and bacteria in pigs remain unknown.
High-throughput, publicly accessible next-generation sequencing data from pig blood samples make it possible to investigate the blood virome and bacteriome globally. After sequencing reads are mapped to the pig reference genome, a significant proportion remains uncharacterized. Viral and bacterial sequences are by-products of pig genome sequencing [10]. Typical polymerase chain reaction (PCR) and enzyme-linked immunosorbent assay (ELISA) methods focus exclusively on a few specific pathogens, whereas blood-based whole-genome sequencing (WGS) data make it possible to identify and quantify all pathogenic DNA viruses and bacteria simultaneously.
Here, we propose a sequence identification protocol to diagnose infections and estimate abundance. By screening natural populations based on deep whole-genome sequencing (WGS) data from blood samples of 685 Eurasian pigs, our study establishes the blood virome and bacteriome landscape and provides critical information for viral and bacterial genomic investigations, epidemiology, and the prevention of viral and bacterial infections.
Results
Viral and pathogenic bacterial sequences in the pig blood samples
Whole-genome resequencing data of 685 domestic Eurasian pigs from 28 pig breeds were collected in this study. Chinese indigenous pigs were sampled from eleven administrative divisions in China, including ten provinces and an autonomous region (Fig. 1a). The latitude, longitude, and weather information for the sampling locations of Chinese indigenous pigs were collected (See Additional file 1, Table S1). After quality control, sequencing data of the 685 pigs were aligned against the Sscrofa11.1 reference genome, and the average sequencing coverage depth was 20.8× (See Additional file 1, Table S2). From the unmapped reads, we identified the pathogenic viral and bacterial sequences of pig origin and estimated their abundance through a rigorous protocol (Fig. 1b). A total of 15 pathogenic bacteria and 13 viruses, including non-pathogenic porcine endogenous retrovirus (PERV) and 12 pathogenic viruses, were identified among the 685 individuals (See Additional file 1, Table S3).
Study design. (a) Geographic distribution of collected Chinese indigenous pig samples. Red dots represent pigs in southern and southwestern China (SSC), while green dots represent pigs in central and eastern China (CEC). (b) Pipeline of viral and bacterial reads identification based on the WGS data in our study. (c) Neighbor-joining phylogenetic tree constructed based on WGS data among all the pig breeds
Population genetic structure analysis and subgroup division
After single nucleotide polymorphism (SNP) calling and subsequent stringent quality control based on the mapped reads, a total of 46,404,497 SNPs were used to infer the genetic structure of the population in this study. To fully explore the relationships within the Eurasian population, we constructed a neighbor-joining (NJ) phylogenetic tree (Fig. 1c); the results showed that individuals of the same breed generally clustered together. We also found a clear divergence between European (EUR) breeds and Chinese indigenous breeds. Furthermore, the genetic clustering results showed that Chinese domestic pigs were divided into two subgroups: breeds mainly distributed in central and eastern China (CEC), and breeds mainly distributed in southern and southwestern China (SSC).
Viral and bacterial composition and prevalence of Eurasian pigs
We discovered that 80.4% (551/685) of the individuals have the coexistence of pathogenic viruses and bacteria (Fig. 2a-b). Moreover, 46.3% (317/685) of all individuals were found to be infected with multiple pathogenic viruses, with two to five different viruses present, while a total of 81.5% (558/685) of the individuals were found to be infected with multiple pathogenic bacteria, with two to ten different bacteria present. Our study revealed that pathogenic virus-bacteria coinfection and virus co-circulation are common occurrences in pig colonies.
Identification of viruses and bacteria across Eurasian pigs. (a) Numbers of pathogenic viruses identified in each individual, colored by virus species. (b) Number of pathogenic bacteria identified in each individual, colored by bacteria species. BMX: Bama Xiang pig; DNSE: Diannan Small-ear pig; DS: Dongshan pig; EHL: Erhualian pig; HJB: Hanjiang Black pig; JQH: Jiangquhai pig; JXB: Jiaxing Black pig; JH: Jinhua pig; KL: Kele pig; LT: Lantang pig; LBW: Large Black-white pig; LP: Leping pig; LGSS: Liangguang Small-spotted pig; MS: Meishan pig; MI: Mi pig; PT: Putian pig; SZL: Shaziling pig; SXS: Shengxian Spotted pig; TC: Tongcheng pig; XIANG: Xiang pig; BS: Berkshire; DU: Duroc; HS: Hampshire; IB: Iberian; LR: Landrace; ML: Mangalica; PI: Pietrain; YS: Yorkshire
For the pathogenic viruses and bacteria with a prevalence greater than 10%, we further analyzed their prevalence across different subgroups (Fig. 3a). The results showed that Chinese indigenous pigs were primarily affected by porcine lymphotropic herpesvirus (PLHV) and porcine parvovirus (PPV), while European pigs were mainly influenced by porcine cytomegalovirus (PCMV) and porcine torque teno virus (PTTV). Interestingly, the species of Mycoplasma infections differ between Asian and European pig breeds. All Mycoplasma detected in Chinese indigenous pigs were Mycoplasma suis (M. suis), while all detected in European pigs were Mycoplasma parvum strain Indiana (M. parvum).
Prevalence and abundance of viruses and bacteria across Eurasian pigs. (a) Prevalence of the pathogenic viruses and bacteria across three subgroups. (b-e) Kruskal-Wallis rank sum test for abundance of viruses and bacteria among three subgroups. (b) PLHV1. (c) PLHV3. (d) S. aureus. (e) S. epidermidis
Furthermore, for viruses or bacteria infecting more than 40 individuals in each subgroup, we performed the Kruskal-Wallis rank sum test to analyze the abundance of infected individuals across three subgroups (Fig. 3b-e). The results revealed significant differences in the abundance of PLHV1 and S. aureus, indicating that different subgroups display varying susceptibility to viruses and bacteria.
Additionally, in Chinese indigenous pigs, the number of viral and bacterial species identified in the SSC group was significantly higher than that in the CEC group (p = 3.6e-09, See Additional file 2, Figure S1). Notably, the SSC region exhibited a higher annual average temperature (19.73℃ vs. 18.48℃) and a higher annual average vapor pressure (18.30 hPa vs. 17.78 hPa) than the CEC region (See Additional file 1, Table S1). These environmental conditions, characterized by elevated temperature and humidity, likely facilitate the proliferation and transmission of viruses and bacteria. Our results highlight the critical need for enhanced measures to prevent and control infectious diseases in regions with high temperature and high humidity.
Characterization and copy number quantification of PERV
PERV is integrated into the genome of all pigs, is stably inherited through the germ line, and can productively infect a wide spectrum of human and other mammalian cells [11]. Recently, PERV has received widespread attention due to its potential risk of cross-species transmission in xenotransplantation [12,13,14]. In this study, we detected PERV reads in all 685 pigs. To further confirm the reliability of the estimated PERV abundance, we performed quantitative polymerase chain reaction (qPCR) to assess the presence and expression status of PERV in eight breeds of Chinese domestic pigs. We tested a total of 20 randomly selected individuals, with two to three individuals per breed. The qPCR results demonstrated that PERV was present in all samples (Fig. 4a).
Identification and copy number quantification of PERV. (a) PCR analysis of eight Chinese domestic pig breeds. Lane M: DL 2000 DNA Marker. Lane 1–2: Large Black-white pig. Lane 3–5: Lantang pig. Lane 6–8: Diannan Small-ear pig. Lane 9–10: Putian pig. Lane 11–12: Luchuan pig. Lane 13–14: Leping pig. Lane 15–17: Jinhua pig. Lane 18–20: Meishan pig. (b) Copy number quantification of PERV of eight pig breeds. (c) Abundance of PERV of eight pig breeds
The copy number of PERV was calculated through qPCR. To establish a standard method, the TFRC gene, a single copy housekeeping gene, was used for calibration. Quantification of the total copy number of the initial DNA was performed using the standard curve [11] for the PERV pol gene (GenBank ID: Y17013.1) and TFRC gene (gene ID: 397062) (See Additional file 1, Table S4). The qPCR results (Fig. 4b) were consistent with the estimated abundances based on the aligned reads (Fig. 4c), demonstrating the accuracy of the detection and abundance estimation techniques proposed herein.
Phylogenetic inference of PPV6
Given the high abundance of reads aligning to porcine parvovirus 6 (PPV6) in the SSC pigs, we calculated the genome coverage of PPV6 for each individual. Seven pig breeds exhibited genome coverage exceeding 99%, all of which were located in the SSC region (See Additional file 2, Figure S2A). Moreover, we assembled the PPV6 genomes from these breeds and constructed the phylogenetic tree according to the maximum likelihood method. The phylogenetic tree (See Additional file 2, Figure S2B) revealed that PPV6 sequences from geographically adjacent regions were similar, suggesting potential virus transmission between neighboring provinces.
Susceptibility-related genes of viruses
A genome-wide association study (GWAS) was used to identify susceptibility-related genes associated with three viruses exhibiting the highest infection rates in this study: PLHV1, PLHV3, and PCMV. GWAS analysis revealed a total of 430 SNPs associated with the susceptibility of these three viruses (See Additional file 1, Table S5 and Fig. 5).
Three candidate genes, ADAM28, ADAMDEC1, and ATF4, were detected in these regions. Of these, ADAM28 and ADAMDEC1 were identified in the GWAS of PLHV1, while ATF4 was identified in the GWAS of PCMV. Furthermore, we found that there was strong linkage disequilibrium between the significant SNPs identified by GWAS and SNPs in the candidate genes (See Additional file 2, Figure S3). Previous research indicated that ADAM28 and ADAMDEC1, members of the ADAM family of membrane-bound or secreted proteases, are related to human lymphotropic gamma-herpesvirus [15, 16]. Similarly, ATF4 was considered as a candidate gene affecting human cytomegalovirus and murine cytomegalovirus [17, 18]. These findings provide valuable insights into the genetic basis of susceptibility in pigs.
Discussion
Infectious diseases result in direct losses to livestock production through mortality, loss of productivity, trade restrictions, reduced market value, and often food insecurity [3]. Given the importance of infectious diseases to animal health and their impact on the stability and productivity of the global pig industry, a comprehensive investigation of the blood virome and bacteriome of pigs worldwide is of paramount importance.
While conventional approaches such as PCR or ELISA are typically employed to identify specific pathogens of interest, the read identification pipeline proposed in this study allows more comprehensive detection of viruses and bacteria, coupled with abundance estimation. In addition, because most microbial analyses use 16S rRNA gene-based amplification, viral content has rarely been captured in large-scale microbiome studies [10]. Hence, our study calls attention to the opportunity of using WGS data to identify pathogenic viruses and bacteria. In this study, we present a large-scale genomic survey and provide a blood virome and bacteriome landscape based on sequencing data across Eurasian pigs. In particular, our study provides evidence that pathogenic virus-bacteria coinfection and virus co-circulation are ubiquitous in pig colonies, highlighting the serious challenges posed by viral and bacterial infectious diseases. Given their impact on public health, zoonotic pathogens should be of particular concern.
We systematically dissected the genetic relationships among Chinese domestic pigs and between Chinese and European pigs. By integrating genomic data with geographic distribution patterns, we divided 685 Eurasian pigs into three subgroups. Importantly, these subgroups exhibited significant divergence in viral and bacterial composition, prevalence, and abundance, indicating that host breeds might influence the viral and bacterial community structure. These findings will aid in the prevention and control of viral and bacterial infectious diseases in pigs in China and worldwide.
Analyzing viral variation and evolution is crucial for inferring transmission pathways and implementing effective virus prevention strategies. In this study, we constructed the phylogenetic tree for PPV6 and the results indicated the spread events and evolutionary history of PPV6 between neighboring provinces in southern China, which may be due to cross-province transportation [19]. Our study suggested that the large-scale transportation across China provides viral connectivity between geographically distinct localities, potentially facilitating the spread of viruses.
We used GWAS to identify candidate loci that contribute to the susceptibility-related traits of domestic pigs. Annotation of these loci revealed a correlation between the ATF4 gene and PCMV. Previous studies demonstrated that human cytomegalovirus protein pUL38 induces ATF4 expression, inhibits persistent c-Jun N-terminal kinase phosphorylation, and suppresses endoplasmic reticulum stress-induced cell death [17, 18, 20].
In addition, the ADAM28 and ADAMDEC1 genes were identified as significant candidate genes associated with PLHV in this study. Previous studies indicated that the ADAM family is related to viral entry and cell fusion [21, 22]. Moreover, these two genes have been reported to correlate with human lymphotropic gamma-herpesvirus, specifically Epstein-Barr Virus (EBV). Previous research has indicated that the Epstein-Barr nuclear antigen 3 (EBNA3) family of proteins, comprising EBNA3A, EBNA3B, and EBNA3C, plays pivotal roles in EBV-induced transcriptional regulation, transformation, and apoptosis resistance [23]. Furthermore, EBNA3A and EBNA3C are known to repress ADAM28 and ADAMDEC1 (members of a disintegrin and metalloprotease family) by binding to an intergenic site [23]. A study using EBNA3C conditional lymphoblastoid cell lines reported that the repression of the ADAM28-ADAMDEC1 locus was dependent on the ability of EBNA3C to bind to RBP-Jκ [24]. In summary, these studies have demonstrated that ADAM28, ADAMDEC1, and ATF4 are disease susceptibility-related genes in mice and humans, which is consistent with our finding in pigs and suggests that homologous genes in closely related species have convergent functions.
However, the main limitation of this study is that the proposed method for estimating viral and bacterial abundance requires high sequencing depth. Insufficient sequencing depth may fail to detect viral and bacterial sequences. It should be noted that only DNA viruses and bacteria were detected in this study, while RNA viruses could not be identified. In further studies, the investigation of RNA viruses will be conducted based on large-scale transcriptomic data. This approach would facilitate a better understanding of the prevalence and evolution of viruses in Eurasian pigs.
Conclusions
In this study, we screened natural populations comprising 685 Eurasian pigs and established the blood virome and bacteriome landscape. We proposed a protocol that identifies viral and bacterial sequences and accurately estimates their abundance from the WGS data of blood samples. A total of 12 pathogenic viruses, 15 pathogenic bacteria, and porcine endogenous retrovirus were identified. Our results demonstrated significant differences in viral and bacterial composition, prevalence, and abundance between subgroups, indicating that host breeds might influence the viral and bacterial community structure. Finally, we identified the ADAM28 and ADAMDEC1 genes, which may be related to PLHV, and the ATF4 gene, which may be associated with PCMV. On the whole, our study provides critical information for viral and bacterial genomic investigation and epidemiology, in turn helping to prevent viral and bacterial infectious diseases in pigs.
Methods
Study data resources
The deep WGS data of 685 domestic Eurasian pigs from 28 diverse breeds were collected in this study. Of these, data for 428 Chinese indigenous pigs came from our previous studies [25, 26], while data for 257 European pigs were downloaded from the public database. All sequencing data were generated from blood samples. We collected the latitude and longitude of the sampling locations for Chinese indigenous pigs, and the climate data were obtained from the Climatic Research Unit gridded Time Series v4.07 [27] (See Additional file 1, Table S1).
Sequencing reads mapping
For each sample, the paired-end reads were filtered by TrimGalore (v0.6.1) [28] to remove adapter sequences and low-quality reads. Clean sequencing reads were subsequently mapped to the pig reference genome Sscrofa11.1 using Bowtie2 (v2.4.1) [29]. Read pairs with both of the reads not mapped to Sscrofa11.1 were extracted using SAMtools (v1.9) [30].
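As a hedged illustration of the read-pair selection described above: in the SAM format, flag bit 0x4 marks an unmapped read and 0x8 marks a read whose mate is unmapped, so a pair in which neither read mapped carries both bits (this corresponds to a `samtools view -f 12` style filter). The sketch below is a minimal Python illustration, not the pipeline's actual code; the function name and example flags are hypothetical.

```python
# Flag bits as defined in the SAM format specification.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def both_mates_unmapped(flag: int) -> bool:
    """Return True when the read is paired and neither mate aligned."""
    required = PAIRED | READ_UNMAPPED | MATE_UNMAPPED
    return (flag & required) == required


# Hypothetical flags: 77 and 141 are the two mates of a fully unmapped pair,
# 99 is a properly mapped read, 73 and 133 are pairs with one mate mapped.
example_flags = [77, 141, 99, 73, 133]
kept = [f for f in example_flags if both_mates_unmapped(f)]
```

Only pairs passing this check would move on to viral and bacterial classification; pairs with one mapped mate stay with the host data.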
Identification of viral and bacterial sequences
To identify viral and bacterial sequences more accurately, we downloaded a database composed of bacterial and viral reference genomes (date of download: December 30th, 2023) using Kraken2 (v2.1.2) [31]. First, high-quality unmapped reads were classified against this database using Kraken2, and viral and bacterial reads were identified and extracted according to their taxonomy IDs. To further ensure reliable sequence identification, read pairs assigned to specific bacteria and viruses at the species level were extracted with KrakenTools (v1.2) [32]. The candidate reads were then searched against our viral and bacterial database using BLAST (v2.10.0) [33]. Viral and bacterial reads were counted only if they met two requirements: (1) reads had an e-value < 1e-5 and alignment length ≥ 80 bp; (2) both reads of a pair aligned to only one species. Reads were then annotated at the species level. Identified viruses and bacteria of non-pig origin were considered false positives caused by interference from the environment and laboratory components. Additionally, we focused only on one non-pathogenic virus (PERV), along with the pathogenic viruses and bacteria, in this study.
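The two counting requirements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a hypothetical list of already-parsed BLAST hits `(pair_id, mate, species, alignment_length, e_value)`, and "aligned to only one species" is interpreted as all passing hits of both mates pointing to the same single species. The function name and example records are invented for the example.

```python
from collections import defaultdict

E_VALUE_MAX = 1e-5   # hits must have an e-value strictly below this
MIN_ALN_LEN = 80     # and an alignment length of at least 80 bp


def assign_pairs(hits):
    """Map pair_id -> species for pairs that satisfy both requirements.

    hits: iterable of (pair_id, mate, species, aln_len, evalue), mate in {1, 2}.
    """
    species_by_mate = defaultdict(lambda: {1: set(), 2: set()})
    for pair_id, mate, species, aln_len, evalue in hits:
        if evalue < E_VALUE_MAX and aln_len >= MIN_ALN_LEN:
            species_by_mate[pair_id][mate].add(species)

    assigned = {}
    for pair_id, mates in species_by_mate.items():
        # Both mates need a passing hit, and all hits must agree on one species.
        if mates[1] and mates[2] and len(mates[1] | mates[2]) == 1:
            assigned[pair_id] = next(iter(mates[1]))
    return assigned


# Hypothetical hits: only pair p1 passes every requirement.
hits = [
    ("p1", 1, "PLHV1", 150, 1e-30), ("p1", 2, "PLHV1", 140, 1e-25),
    ("p2", 1, "PCMV", 150, 1e-30), ("p2", 2, "PLHV3", 150, 1e-30),  # discordant
    ("p3", 1, "PPV6", 60, 1e-30), ("p3", 2, "PPV6", 150, 1e-30),    # too short
    ("p4", 1, "PCMV", 120, 1e-3), ("p4", 2, "PCMV", 120, 1e-20),    # weak e-value
]
result = assign_pairs(hits)
```

Requiring agreement between mates trades sensitivity for specificity, which suits low-abundance targets where a single spurious hit could otherwise register as an infection.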
Estimating abundance and statistical analysis
Since bacteria and viruses are haploid, viral and bacterial abundance was estimated by the following Eq [9].
Differences in viral and bacterial abundance among subgroups were assessed using the Kruskal-Wallis rank sum test. To determine which pairwise comparisons were significant, the Dunn test with Bonferroni correction was performed.
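For readers unfamiliar with the test, the following is a minimal pure-Python sketch of the Kruskal-Wallis H statistic with tie correction. It is not the software used in the study (a statistics package such as SciPy's `scipy.stats.kruskal` would normally be used). The p-value relies on the chi-square approximation, and the closed form `exp(-H/2)` holds only for exactly three groups (two degrees of freedom), matching the three subgroups compared here. The example values are hypothetical, and the Dunn post hoc test is not shown.

```python
import math


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three samples of numeric values."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value assumes three groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    rank_sum_term = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # tie correction (fails if all values equal)
    return h, math.exp(-h / 2)        # chi-square survival function, 2 d.f.


# Hypothetical abundances for three subgroups.
h_stat, p_value = kruskal_wallis_three_groups([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because the test works on ranks, it makes no normality assumption, which is useful for abundance values that are typically strongly skewed.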
SNP calling and population genetic analysis
For all the high-quality mapped reads, the Genome Analysis Toolkit (GATK, v4.0.12.0) [34] and SAMtools (v1.9) were used to remove duplicated reads and sort the alignment results. Then, to obtain high-quality SNPs, we adopted the GATK HaplotypeCaller best practice. SNPs were filtered using the VariantFiltration module in GATK, according to the following criteria: (1) approximate read depth > 10×; (2) variant confidence/unfiltered depth of non-reference samples (QD) > 2.0; (3) RMS mapping quality (MQ) > 40.0; (4) Phred-scaled P value using Fisher’s exact test to detect strand bias (FS) < 60.0; (5) Z-score from the Wilcoxon rank sum test of Alt vs. Ref read MQs (MQRankSum) > −12.5; (6) Z-score from the Wilcoxon rank sum test of Alt vs. Ref read position bias (ReadPosRankSum) > −8.0; (7) strand bias estimated by the symmetric odds ratio test (SOR) < 3.0; and (8) no more than three SNPs clustered in a 10-bp window.
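Criteria (1) to (7) amount to a per-site predicate over standard VCF INFO annotations. The sketch below expresses them in Python for illustration only; it is not the GATK command used in the study. It assumes the annotation keys `DP`, `QD`, `MQ`, `FS`, `MQRankSum`, `ReadPosRankSum`, and `SOR`, assumes that an annotation missing from a record does not fail the SNP, and omits the cluster criterion (8). The example records are hypothetical.

```python
# Annotation name -> condition a retained SNP must satisfy (criteria 1-7).
CRITERIA = {
    "DP": lambda v: v > 10,
    "QD": lambda v: v > 2.0,
    "MQ": lambda v: v > 40.0,
    "FS": lambda v: v < 60.0,
    "MQRankSum": lambda v: v > -12.5,
    "ReadPosRankSum": lambda v: v > -8.0,
    "SOR": lambda v: v < 3.0,
}


def passes_hard_filters(info):
    """info: dict of INFO annotations for one SNP.

    Assumption: annotations absent from the record are skipped, not failed.
    """
    return all(
        check(info[name]) for name, check in CRITERIA.items() if name in info
    )


# Hypothetical records.
good = {"DP": 25, "QD": 12.1, "MQ": 60.0, "FS": 1.2,
        "MQRankSum": 0.0, "ReadPosRankSum": 0.3, "SOR": 0.7}
strand_biased = dict(good, FS=75.0)
no_rank_sums = {k: v for k, v in good.items() if not k.endswith("RankSum")}

results = [passes_hard_filters(r) for r in (good, strand_biased, no_rank_sums)]
```

The rank-sum annotations are only defined at sites with both reference and alternate reads, which is why the sketch tolerates their absence.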
Based on the whole-genome SNPs, pairwise genetic distances were calculated from the identical-by-state (IBS) relationship matrix using Plink (v1.9) [35]. The NJ phylogenetic tree was built from the IBS relationship matrix using PHYLIP (v3.698) [36]. Then, the NJ tree was visualized with Figtree (v1.4.4) (https://github.com/rambaut/figtree).
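To make the distance concrete, here is a small Python sketch of an IBS-based distance between two individuals, assuming the conventional allele-sharing definition: at each SNP two genotypes coded 0/1/2 share `2 - |g1 - g2|` alleles, and the distance is one minus the shared proportion. Plink's exact treatment of missing genotypes and weighting may differ; the function name and genotype vectors are hypothetical.

```python
def ibs_distance(g1, g2):
    """1 - IBS similarity for genotype vectors coded 0/1/2 (None = missing)."""
    shared = total = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(a - b)
        total += 2
    if total == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return 1 - shared / total


# Hypothetical genotypes at five SNPs; the last SNP is missing in the first pig.
pig_a = [0, 1, 2, 2, None]
pig_b = [0, 2, 0, 2, 1]
distance = ibs_distance(pig_a, pig_b)
```

A matrix of such pairwise distances is the kind of input a neighbor-joining program takes.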
DNA extraction, PCR and qPCR
The genomic DNA of the Chinese domestic pigs was isolated according to the instructions of the TIANamp Genomic DNA Kit (TIANGEN, Beijing, China) and PCR was performed with primers specific for gag, pol, and env genes (See Additional file 1, Table S6) [37, 38]. PCR was conducted as described below: 1 µl of each primer (10 µM), 1 µl of genomic DNA (about 80 ng of DNA), 12.5 µl 2×EasyTaq PCR SuperMix (Trans) and 4.5 µl ddH2O. Thermocycling was done for 30 cycles at 58℃ annealing temperature and one minute extension time.
In parallel, to investigate the PERV copy numbers in eight Chinese domestic pig breeds, we used a Real-Time SYBR qPCR Mix Kit (TAKARA) with PERV-pol specific primer pairs. To calculate the PERV copy number, the TFRC gene was amplified simultaneously with pol. The copy number of PERV was calculated using Bio-Rad software (v4.1).
Assembly and phylogenetic analysis of PPV6
Read pairs aligned to PPV6 were extracted and genome coverage was calculated by BWA (v0.7.18) [39] and SAMtools (v1.9). To ensure the reliability of the assembly results, two filtering criteria were applied: (1) Individuals with genome coverage greater than 99% were retained. (2) For each breed, the individual with the most aligned reads was selected. Then, de novo assembly of PPV6 was performed using SPAdes (v3.14.1) [40].
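The two selection criteria can be illustrated with a short Python sketch. It assumes genome coverage means the fraction of reference positions covered by at least one read, and it uses hypothetical per-sample records; the function names, sample IDs, and breed labels are not from the study.

```python
def breadth_of_coverage(depths):
    """Fraction of reference positions covered by at least one read."""
    return sum(1 for d in depths if d > 0) / len(depths)


def select_for_assembly(samples, min_coverage=0.99):
    """Pick one sample ID per breed.

    samples: list of dicts with keys 'id', 'breed', 'coverage', 'reads'.
    Criterion 1: coverage strictly greater than min_coverage.
    Criterion 2: within each breed, the sample with the most aligned reads.
    """
    best = {}
    for s in samples:
        if s["coverage"] <= min_coverage:
            continue
        current = best.get(s["breed"])
        if current is None or s["reads"] > current["reads"]:
            best[s["breed"]] = s
    return {breed: s["id"] for breed, s in best.items()}


# Hypothetical samples: breed B has no individual above the coverage cutoff.
samples = [
    {"id": "pig_01", "breed": "A", "coverage": 0.995, "reads": 1200},
    {"id": "pig_02", "breed": "A", "coverage": 0.999, "reads": 3400},
    {"id": "pig_03", "breed": "B", "coverage": 0.950, "reads": 9000},
    {"id": "pig_04", "breed": "C", "coverage": 0.992, "reads": 800},
]
selected = select_for_assembly(samples)
```

Applying the coverage filter before the read-count ranking means a deeply sequenced but incompletely covered sample, such as `pig_03` here, never enters assembly.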
Based on the assembled PPV6 sequences of Chinese indigenous pig breeds and PPV6 reference genome (GCF_013087245.1), multiple sequence alignment was performed using MUSCLE (v3.8.31) [41]. Then, phylogenetic trees were inferred using the maximum likelihood method implemented by IQ-TREE (v1.6.12) [42].
GWAS for the abundance of the viruses
PLHV1, PLHV3, and PCMV, the three most prevalent pathogenic viruses, were selected for identifying susceptibility-related genes. For each virus, we separately extracted the SNPs of infected individuals. SNPs were removed if they met any of the following criteria: (1) SNP call rate < 0.95; (2) minor allele frequency (MAF) < 0.01; (3) Hardy–Weinberg equilibrium p-value < \(10^{-6}\); or (4) location on a sex chromosome.
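As a rough illustration of criteria (1), (2), and (4), the Python sketch below computes the call rate and minor allele frequency of one SNP from genotypes coded as allele counts (0/1/2, `None` for a missing call) and decides whether to keep it. The Hardy–Weinberg test of criterion (3) is deliberately left out, and the chromosome labels `"X"` and `"Y"`, function names, and example genotypes are assumptions made for the example.

```python
def call_rate_and_maf(genotypes):
    """Return (call rate, minor allele frequency) for one SNP."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return call_rate, min(freq, 1 - freq)


def keep_snp(genotypes, chrom, min_call_rate=0.95, min_maf=0.01):
    """Apply the call-rate, MAF, and sex-chromosome criteria (no HWE test)."""
    call_rate, maf = call_rate_and_maf(genotypes)
    return (
        chrom not in {"X", "Y"}
        and call_rate >= min_call_rate
        and maf >= min_maf
    )


# Hypothetical SNPs genotyped in ten individuals.
common = [0, 1, 2, 1, 0, 0, 1, 0, 0, 0]
poorly_called = [0] * 8 + [None, None]
monomorphic = [0] * 10

decisions = [
    keep_snp(common, "1"),          # kept
    keep_snp(common, "X"),          # removed: sex chromosome
    keep_snp(poorly_called, "1"),   # removed: call rate 0.8
    keep_snp(monomorphic, "1"),     # removed: MAF 0
]
```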
To explore the genetic susceptibility variants for virus infection, GWAS of the abundance of the three viruses were performed separately using the single-trait linear mixed model in GEMMA (v0.98.5) [43]. The equation for the GWAS model was as follows:
\[\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Y}\boldsymbol{\gamma} + \mathbf{Z}\boldsymbol{\alpha} + \mathbf{e}\]
where \(\mathbf{y}\) was the vector of the abundance of viruses (log10-transformed); \(\mathbf{X}\) was the incidence matrix for non-genetic fixed effects, and \(\boldsymbol{\beta}\) was the vector of non-genetic fixed effects, including overall mean, breed, and the first ten principal components; \(\mathbf{Y}\) was the vector of SNP genotype indicators, coded as 0, 1, and 2, corresponding to the three genotypes AA, AB, and BB with B being the minor allele; \(\boldsymbol{\gamma}\) was the vector of the effect of the marker; \(\mathbf{Z}\) was the incidence matrix for a vector of polygenic effects, and \(\boldsymbol{\alpha}\) was the vector of residual polygenic effects with an assumed distribution \(\boldsymbol{\alpha} \sim N(\mathbf{0},\mathbf{G}\sigma_a^2)\), where \(\sigma_a^2\) was the additive genetic variance and \(\mathbf{G}\) was the marker-inferred kinship matrix; and \(\mathbf{e}\) was the vector of random residual errors with an assumed distribution \(\mathbf{e} \sim N(\mathbf{0},\mathbf{I}\sigma_e^2)\), where \(\mathbf{I}\) was the identity matrix and \(\sigma_e^2\) was the residual variance. The Wald statistic was used to test the significance of each SNP. To limit false positives arising from multiple comparisons, the threshold P-value after the Bonferroni correction was set to 1/N [44], where N is the number of independent SNPs. Independent SNPs were obtained with Plink (v1.9) by pruning all autosomal SNPs using the indep-pairwise option with a window size of 50 SNPs, a step of 5 SNPs, and an r² threshold of 0.1.
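The thresholding step can be illustrated with a few lines of Python. The sketch follows the 1/N rule stated above; the number of independent SNPs, the SNP names, and the P-values are hypothetical, since the actual values are not given in this section.

```python
import math


def significance_threshold(n_independent_snps):
    """Threshold P-value of 1/N, with N the number of independent SNPs."""
    return 1.0 / n_independent_snps


def significant_snps(p_values, n_independent_snps):
    """Return the SNP names whose P-value falls below the 1/N threshold."""
    threshold = significance_threshold(n_independent_snps)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical inputs: 200,000 independent SNPs gives a threshold of 5e-6.
n_independent = 200_000
p_values = {"snp_a": 1e-7, "snp_b": 4e-6, "snp_c": 6e-6}

hits = significant_snps(p_values, n_independent)
# Height of the threshold line on a -log10(P) Manhattan plot.
threshold_line = -math.log10(significance_threshold(n_independent))
```

Counting only LD-pruned SNPs in N keeps the correction from being overly conservative, because tightly linked SNPs do not represent independent tests.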
The P-value of results was visualized by Manhattan plots and quantile-quantile (Q-Q) plots using the CMplot package [45] in R. To avoid missing true hints of linkage, we separately extracted the 100 kb regions upstream and downstream of each significant SNP and considered them as candidate regions related to susceptibility. Genes in these candidate regions were annotated through the Ensembl release 110 database. Moreover, we investigated the linkage disequilibrium blocks between the candidate genes and significant SNPs using Haploview (v4.1) [46].
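The candidate-region step is an interval overlap query, sketched below in Python for illustration. The gene names and coordinates are placeholders rather than Ensembl annotations, and a gene is treated as a candidate if any part of it overlaps the window of 100 kb on either side of a significant SNP; that overlap rule is an assumption.

```python
FLANK = 100_000  # 100 kb upstream and downstream of each significant SNP


def genes_near_snps(snps, genes, flank=FLANK):
    """Return sorted names of genes overlapping any SNP-centred window.

    snps:  list of (chrom, position)
    genes: list of (name, chrom, start, end), 1-based inclusive coordinates
    """
    found = set()
    for chrom, pos in snps:
        lo, hi = max(1, pos - flank), pos + flank
        for name, g_chrom, start, end in genes:
            if g_chrom == chrom and start <= hi and end >= lo:
                found.add(name)
    return sorted(found)


# Placeholder annotation and one hypothetical significant SNP.
genes = [
    ("GENE_A", "14", 1_000_000, 1_050_000),  # overlaps the window
    ("GENE_B", "14", 1_300_000, 1_350_000),  # beyond the downstream flank
    ("GENE_C", "5", 1_000_000, 1_020_000),   # different chromosome
]
candidates = genes_near_snps([("14", 1_120_000)], genes)
```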
Data availability
The whole-genome sequencing data of Chinese indigenous pigs have been made publicly available in our previously published study. The whole-genome sequencing data of European pigs were downloaded from a public database (See Additional file 1, Table S2). The authors declare that this study was based on publicly available data and did not involve animal experiments.
Abbreviations
- EBNA3: Epstein-Barr nuclear antigen 3
- EBV: Epstein-Barr Virus
- ELISA: Enzyme-linked immunosorbent assay
- GWAS: Genome-wide association study
- NJ: Neighbor-joining
- PCMV: Porcine cytomegalovirus
- PCR: Polymerase chain reaction
- PERV: Porcine endogenous retrovirus
- PLHV: Porcine lymphotropic herpesvirus
- PPV: Porcine parvovirus
- PTTV: Porcine torque teno virus
- qPCR: Quantitative Polymerase Chain Reaction
- SNP: Single nucleotide polymorphisms
- WGS: Whole-genome sequencing
References
Lunney JK, Van Goor A, Walker KE, Hailstock T, Franklin J, Dai C. Importance of the pig as a human biomedical model. Sci Transl Med. 2021;13:eabd5758.
Wilkinson S, Lu ZH, Megens H-J, Archibald AL, Haley C, Jackson IJ, et al. Signatures of diversifying selection in European pig breeds. PLoS Genet. 2013;9:e1003453.
VanderWaal K, Deen J. Global trends in infectious diseases of swine. Proc Natl Acad Sci. 2018;115:11495–500.
Rascovan N, Duraisamy R, Desnues C. Metagenomics and the human Virome in asymptomatic individuals. Annu Rev Microbiol. 2016;70:125–41.
Edington N, Plowright W, Watt R. Generalized Porcine cytomegalic inclusion disease: distribution of cytomegalic cells and virus. J Comp Pathol. 1976;86:191–202.
Zimmerman JJ. Diseases of swine. Wiley; 2012.
Beaurepaire A, Piot N, Doublet V, Antunez K, Campbell E, Chantawannakul P, et al. Diversity and global distribution of viruses of the Western honey bee, Apis mellifera. Insects. 2020;11:239.
Li N, Li C, Hu T, Li J, Zhou H, Ji J, et al. Nationwide genomic surveillance reveals the prevalence and evolution of honeybee viruses in China. Microbiome. 2023;11:6.
Guo J, Huang X, Zhang C, Huang P, Li Y, Wen F, et al. The blood Virome of 10,585 individuals from the ChinaMAP. Cell Discovery. 2022;8:113.
Moustafa A, Xie C, Kirkness E, Biggs W, Wong E, Turpaz Y, et al. The blood DNA Virome in 8,000 humans. PLoS Pathog. 2017;13:e1006292.
Xiang S, Ma Y, Yan Q, Lv M, Zhao X, Yin H, et al. Construction and characterization of an infectious replication competent clone of Porcine endogenous retrovirus from Chinese miniature pigs. Virol J. 2013;10:1–9.
Goerlich CE, Singh AK, Griffith BP, Mohiuddin MM. The immunobiology and clinical use of genetically engineered Porcine hearts for cardiac xenotransplantation. Nat Cardiovasc Res. 2022;1:715–26.
Niu D, Wei H-J, Lin L, George H, Wang T, Lee I-H, et al. Inactivation of Porcine endogenous retrovirus in pigs using CRISPR-Cas9. Science. 2017;357:1303–7.
Denner J. Virus safety of xenotransplantation. Viruses. 2022;14:1926.
Styles CT, Paschos K, White RE, Farrell PJ. The cooperative functions of the EBNA3 proteins are central to EBV persistence and latency. Pathogens. 2018;7:31.
Kalchschmidt JS. Molecular insights into host gene regulation by Epstein-Barr virus nuclear antigen EBNA3C. Imperial College London; 2016.
Isler JA, Skalet AH, Alwine JC. Human cytomegalovirus infection activates and regulates the unfolded protein response. J Virol. 2005;79:6890–9.
Qian Z, Xuan B, Chapa TJ, Gualberto N, Yu D. Murine cytomegalovirus targets transcription factor ATF4 to exploit the unfolded-protein response. J Virol. 2012;86:6712–23.
Drew T. The emergence and evolution of swine viral diseases: to what extent have husbandry systems and global trade contributed to their distribution and diversity? Revue scientifique et Technique-OIE. 2011;30:95.
Xuan B, Qian Z, Torigoi E, Yu D. Human cytomegalovirus protein pUL38 induces ATF4 expression, inhibits persistent JNK phosphorylation, and suppresses endoplasmic reticulum stress-induced cell death. J Virol. 2009;83:3463–74.
Hernandez L, Hoffman L, Wolfsberg T, White J. Virus-cell and cell-cell fusion. Annu Rev Cell Dev Biol. 1996;12:627–61.
Jocher G, Grass V, Tschirner SK, Riepler L, Breimann S, Kaya T, et al. ADAM10 and ADAM17 promote SARS-CoV-2 cell entry and spike protein-mediated lung cell fusion. EMBO Rep. 2022;23:e54305.
Khasnis SJ. Investigating the regulation of B cell growth and survival genes by Epstein-Barr virus. University of Sussex; 2019.
Kalchschmidt JS, Gillman AC, Paschos K, Bazot Q, Kempkes B, Allday MJ. EBNA3C directs recruitment of RBPJ (CBF1) to chromatin during the process of gene repression in EBV infected B cells. PLoS Pathog. 2016;12:e1005383.
Du H, Zhou L, Liu Z, Zhuo Y, Zhang M, Huang Q, et al. The 1000 Chinese indigenous pig genomes project provides insights into the genomic architecture of pigs. Nat Commun. 2024;15:10137.
Du H, Diao C, Zhuo Y, Zheng X, Hu Z, Lu S, et al. Assembly of novel sequences for Chinese domestic pigs reveals new genes and regulatory variants providing new insights into their diversity. Genomics. 2024;116:110782.
Harris I, Osborn TJ, Jones P, Lister D. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci Data. 2020;7:109.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–2.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:1–13.
Lu J, Rincon N, Wood DE, Breitwieser FP, Pockrandt C, Langmead B, et al. Metagenome analysis using the Kraken software suite. Nat Protoc. 2022;17:2815–39.
Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36:W5–9.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Felsenstein J. PHYLIP (phylogeny inference package), version 3.5 c. Produced and distributed by author/Department of Genetics, University of Washington. 1993.
Liu G, Li Z, Pan M, Ge M, Wang Y, Gao Y. Genetic prevalence of porcine endogenous retrovirus in Chinese experimental miniature pigs. Transplantation Proceedings. Elsevier; 2011.
Ferreira ID, Do Rosário VE, Cravo PV. Real-time quantitative PCR with SYBR green I detection for estimating copy numbers of nine drug resistance candidate genes in Plasmodium falciparum. Malar J. 2006;5:1–6.
Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–60.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–77.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74.
Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–4.
Du H, Liu Z, Lu S-Y, Jiang L, Zhou L, Liu J-F. Genomic evidence for human-mediated introgressive hybridization and selection in the developed breed. BMC Genomics. 2024;25:331.
Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, et al. rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genomics Proteom Bioinf. 2021;19:619–28.
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5.
Funding
This work was financially supported by the Biological Breeding-National Science and Technology Major Project (2023ZD0407003), the National Natural Science Foundation of China (32272844 and 32302708), the Science and Technology Program of Guizhou Province [Qian Kehe Support (2022) Key 032], the Earmarked Fund for China Agriculture Research System (Grant No. CARS-pig-35), the 2115 Talent Development Program of China Agricultural University, and the Chinese Universities Scientific Fund (2023TC196). We thank the High-performance Computing Platform of China Agricultural University for computing support.
Author information
Contributions
J-F.L. conceived and designed the study, directed the project, provided the data and computational resources, supervised bioinformatic and statistical analyses, and revised the manuscript. Z.L. designed the analytical strategy, performed analysis processes, and wrote the manuscript. S-Y.L. participated in PCR and qPCR experiments. S-J.M. provided support in downloading climate data. L.Z. and W-Y.L. provided substantial comments and revised the manuscript. H.D. supervised bioinformatic and statistical analyses and revised the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Ethics statement
The authors declare that this study was based on publicly available data and did not involve animal experiments.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, Z., Lu, SY., Ma, SJ. et al. Genomic evidence of the blood virome and bacteriome provides insights into prevalence, evolution, and susceptibility-related genes across Eurasian pigs. BMC Genomics 26, 413 (2025). https://doi.org/10.1186/s12864-025-11623-9
DOI: https://doi.org/10.1186/s12864-025-11623-9