In-silico analysis of deleterious non-synonymous SNPs in the human AVPR1a gene linked to autism

Jibon, Md. Delowar Kobir; Islam, Md. Asadul; Hosen, Md. Eram; Faruqe, Md. Omar; Zaman, Rashed; Acharjee, Uzzal Kumar; Sikdar, Biswanath; Tiruneh, Yewulsew Kebede; Khalekuzzaman, Md.; Jawi, Motasim; Zaki, Magdi E. A.

doi:10.1186/s12864-025-11655-1

Research
Open access
Published: 15 May 2025

BMC Genomics volume 26, Article number: 492 (2025)

Abstract

Single nucleotide polymorphisms are the most prevalent type of DNA variation occurring at a single nucleotide within the genomic sequence. The AVPR1a gene exhibits genetic polymorphism and is linked to neurological and developmental problems, including autism spectrum disorder. Because it is difficult to study all non-synonymous single nucleotide polymorphisms (nsSNPs) of the AVPR1a gene in the general population, our goal was to use a computational approach to identify the most detrimental nsSNPs of the AVPR1a gene. We employed several bioinformatics tools, such as SNPnexus, PROVEAN, PANTHER, PhD-SNP, SNPs&GO, and I-Mutant2.0, to detect the 23 most detrimental mutants (R85H, D202N, E54G, H92P, D148Y, C203G, V297M, D148V, S182N, Q108L, R149C, G212V, M145T, G212S, Y140S, F207V, Q108H, W219G, R284W, L93F, P156R, F136C, P107L). We then used other bioinformatics tools to perform domain and conservation analysis. We analyzed the consequences of high-risk nsSNPs on active sites and post-translational modification (PTM) sites, and their functional effects on protein stability. 3D modeling, structure validation, protein-ligand binding affinity prediction, and protein-protein docking were conducted to examine five significant substitutions (R284W, Y140S, P107L, R149C, and F207V) and to explore the modifications induced by these mutants. These nsSNPs can potentially be the focus of future investigations into various illnesses caused by AVPR1a malfunction. Employing in-silico methodologies to evaluate AVPR1a gene variants will facilitate the coordination of extensive investigations and the formulation of specific therapeutic approaches for diseases associated with these variations.

Introduction

The human genome is approximately 99.9% identical across individuals, with individual genetic variation making up the remaining 0.1%. These genetic differences arise from random mutations [1]. The most ubiquitous kind of genetic variation in humans is represented by single-nucleotide polymorphisms (SNPs), an invaluable resource for deciphering complicated genetic features [2]. Missense mutations, also known as non-synonymous single nucleotide polymorphisms (nsSNPs), have the potential to induce phenotypic diversity in humans through modifications in protein expression [3]. Prior research suggests that nsSNPs contribute to around 50% of the mutations linked to different genetic disorders [4]. Substituting amino acids in conserved regions can affect the structure, stability, and function of proteins. By altering protein function, nsSNPs can in turn elevate susceptibility to human diseases [5]. Autism spectrum disorder (ASD) is a severe neuropsychiatric illness that has strong hereditary underpinnings. Nevertheless, the genetic variables that contribute to autism are quite diverse, with several loci fulfilling distinct functions in various individuals [6].

Autism is a neurodevelopmental condition caused by several genes, with more than 90% of cases being influenced by genetics [7]. Arginine vasopressin (AVP) is an endogenous ligand that spontaneously binds to and stimulates AVPR1a receptors in both the peripheral and central nervous systems. AVPR1a, or arginine vasopressin receptor 1a, has a profound influence on behaviors such as forming pair bonds, providing parental care, displaying aggression, and managing stress [8,9,10,11]. This receptor plays a crucial role in brain signaling. Pharmacological approaches and the examination of various animal models have demonstrated the benefits of understanding the role of AVPR1a in behavior [12, 13]. AVP receptors have seven transmembrane domains and are categorized as G-protein-coupled receptors. At least three types of vasopressin receptors (V1R/V1a, V2R, and V3R/V1b) have been found in humans. AVPR1a, located on chromosome 12q14-15, is especially relevant to human behavioral research because the specific patterns of V1a receptor gene expression in the brain play a significant role in the observed variations in social and reproductive behavior within and between species. The vole model has demonstrated this relationship [14,15,16].

Preclinical research has demonstrated that arginine vasopressin (AVP) enhances some social behaviors, such as association and connection, by interacting with the V1a receptor (AVPR1a) in the brain. The effects of AVP on behavior and the location of the V1a receptor in the brain differ significantly among mammalian species [17]. This suggests that the AVPR1a gene is a probable candidate for susceptibility to autism [18]. Previous studies investigating familial ties have demonstrated a strong association between the AVPR1a gene and autism [19]. The presence of two microsatellite polymorphisms, RS1 and RS3, in the vicinity of the promoter region of AVPR1a, which codes for the receptor subtype primarily responsible for regulating behavior, has been linked to autism and behavioral traits [20, 21]. The severity of autistic traits can be significantly influenced by a single nucleotide polymorphism (SNP) of the AVPR1a gene [22]. The AVPR1a gene encodes the vasopressin V1a receptor, one of the primary receptors for AVP. A low AVP concentration in cerebrospinal fluid (CSF) is an indicator of social impairment in monkeys with low social behavior and in autistic children [23]. An extensive association study was conducted involving three microsatellites and twelve tag SNPs situated within and near the AVPR1a gene in 205 Finnish families. This was followed by an assessment of the gene’s promoter, which revealed a significant correlation with autism [24]. A study in the Korean population used a family-based association test (FBAT) to evaluate the relationship between autism spectrum disorder and changes in the AVPR1a promoter region. The results suggest that alterations in the AVPR1a promoter region may have a role in the development of ASD and the regulation of AVPR1a expression [25].
Here, we used several computational approaches to identify deleterious non-synonymous polymorphisms in the human AVPR1a gene.

Materials and methods

The overall workflow of this project is shown in Fig. 1.

Retrieval of SNPs

A total of 402 nsSNPs associated with the human AVPR1a gene were retrieved from the dbSNP database (https://www.ncbi.nlm.nih.gov/). We collected information on the SNPs, including SNP ID, protein accession number, location, residue alteration, and global minor allele frequency (MAF) [26]. The AVPR1a sequence was sourced from UniProt (https://www.uniprot.org). This study investigated the harmful effects of missense SNPs in the AVPR1a gene.

GeneMANIA to understand AVPR1a interactions with other genes

GeneMANIA (https://genemania.org/) was used to investigate the relationship between the AVPR1a gene and other genes based on pathways, expression, localization, genetics, and protein interaction. This tool confirms the connective network between the AVPR1a gene and other genes [27].

Screening of deleterious nsSNPs

We employed two bioinformatics tools to evaluate the likely impact of the genetic variants extracted from the dbSNP database. The first, SNPnexus (https://www.snp-nexus.org), includes Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping (PolyPhen) [28]. SIFT predicts harmful nsSNPs by examining protein homology sequences and natural nsSNP alignments. A score below 0.05 indicates that SIFT considers the nsSNP to have a deleterious effect on protein function [29]. PolyPhen-2 predicts the functional impact of amino acid substitutions on protein structure and function using sequence-based characterization [30]. PolyPhen generates a position-specific independent count (PSIC) score for each amino acid variant. Differences in PSIC scores for variants indicate their direct functional impact [31, 32]. The second tool, PROVEAN, predicts the functional impact of variants; a score at or below the threshold of −2.5 indicates a deleterious nsSNP [33].
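
The score cut-offs described above can be illustrated with a short sketch. It assumes the conventional reading of the two thresholds (SIFT below 0.05; PROVEAN at or below −2.5) and requires both tools to agree, which is an illustrative choice rather than the exact filtering rule of this study. The variant labels and scores are placeholders, not results.

```python
from dataclasses import dataclass

SIFT_CUTOFF = 0.05      # SIFT: scores below this are treated as deleterious
PROVEAN_CUTOFF = -2.5   # PROVEAN: scores at or below this are treated as deleterious


@dataclass(frozen=True)
class Prediction:
    variant: str
    sift: float
    provean: float


def is_deleterious(p: Prediction) -> bool:
    """Return True only when both tools call the substitution damaging."""
    return p.sift < SIFT_CUTOFF and p.provean <= PROVEAN_CUTOFF


# Placeholder variants with made-up scores (assumption: for illustration only).
examples = [
    Prediction("X1Y", sift=0.01, provean=-4.2),  # both tools: deleterious
    Prediction("X2Y", sift=0.30, provean=-3.0),  # SIFT: tolerated
    Prediction("X3Y", sift=0.02, provean=-1.1),  # PROVEAN: neutral
]

flagged = [p.variant for p in examples if is_deleterious(p)]
print(flagged)
```

Requiring agreement between tools reduces false positives at the cost of sensitivity; a union of the two calls would be the more permissive alternative.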

Confirmatory analysis of the deleterious nsSNPs

We cross-checked our screened nsSNPs with three further bioinformatics tools to reconfirm their severity and deleterious nature. The biological and evolutionary information for every protein-coding gene is compiled in PANTHER (http://pantherdb.org) [34]. PolyPhen-2 (PPh-2; http://genetics.bwh.harvard.edu/pph2) predicts how point mutations affect protein structure and function [35]. MutPred2 (http://mutpred.mutdb.org/) uses molecular and biological data to assess the possible structural consequences of nsSNPs arising from alterations in proteins [36].

Screening of disease-associated SNPs

To examine the association of the screened nsSNPs with disease, we used PhD-SNP, SNPs&GO, and Meta-SNP. To categorize an SNP’s effect as either disease-related or neutral, the PhD-SNP tool (https://snps.biofold.org/phdsnp/phd-snp.html) generates an accuracy index score from 36,000 benign and harmful SNVs. It was developed and verified using the ClinVar dataset [37]. SNPs&GO (https://snps-and-go.biocomp.unibo.it/snps-and-go) assesses changes in amino acids at a particular location in a protein [38]. SNPs&GO and PhD-SNP are machine learning approaches that leverage comparative conservation scores derived from multiple sequence alignments [39]. Meta-SNP (https://snps.biofold.org/meta-snp) takes the outputs of individual predictors as input and predicts a mutation to be disease-associated if its score exceeds a threshold of 0.5 [40].

Functional effects of SNPs on protein stability

To determine the changes in protein stability, we used three different tools: MUpro, I-Mutant 2.0, and INPS3D. Protein stability assessment is commonly conducted using the MUpro server (http://mupro.proteomics.ics.uci.edu). This web server is built on two machine learning techniques: Support Vector Machines (SVM) and Neural Networks. These techniques assess how single-site amino acid changes affect the stability of proteins and report the result as an increase or a decrease, denoted by positive or negative scores, respectively [41]. The I-Mutant 2.0 web server (https://folding.biofold.org/i-mutant/i-mutant2.0.html) employs a neural network technique to predict potential changes in protein stability after mutations. Each prediction is accompanied by a reliability index (RI) from 0 to 10, with 10 denoting the highest reliability. The server also gives a free energy change value (ΔΔG) that shows whether stability will rise or fall. A decrease in protein stability is indicated by a ΔΔG value less than 0, whereas an increase in protein stability is suggested by a value greater than 0 [42]. Protein stability can be predicted for both wild-type and mutant variants using a recently developed tool named INPS3D. The INPS-MD (Impact of Non-synonymous mutations on Protein Stability-Multi Dimension) web server (https://inpsmd.biocomp.unibo.it/inpsSuite/default/index3D) was utilized for this purpose. This tool takes into account several variables, including the molecular weights and hydrophobicities of the native and mutated amino acids, the alignment score difference, the likelihood of the original residue undergoing mutation, the relative solvent accessibility (RSA) of the original amino acid, and the local energy difference between the wild-type and altered protein structures [43].
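
The ΔΔG sign convention described above can be sketched as follows. The majority vote across tools is an illustrative assumption and is not stated in the study. The sketch also assumes that all tools report ΔΔG with the same sign convention (negative meaning destabilizing). The numeric values are placeholders.

```python
from collections import Counter


def stability_effect(ddg: float) -> str:
    """Map a predicted free energy change (ΔΔG) to a qualitative label."""
    if ddg < 0:
        return "decrease"
    if ddg > 0:
        return "increase"
    return "neutral"


def consensus(ddg_by_tool: dict[str, float]) -> str:
    """Return the label supported by a strict majority of tools, if any."""
    votes = Counter(stability_effect(v) for v in ddg_by_tool.values())
    label, count = votes.most_common(1)[0]
    return label if count > len(ddg_by_tool) / 2 else "no consensus"


# Placeholder ΔΔG values (assumption: not taken from the study).
example = {"MUpro": -1.2, "I-Mutant 2.0": -0.4, "INPS3D": 0.3}
print(consensus(example))  # decrease
```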

Domain analysis of AVPR1a

We utilized a widely used computational tool, InterPro (https://www.ebi.ac.uk/interpro/), to identify the functional domains of our desired protein (AVPR1a) [44]. This application uses a database of protein families, domains, and functional sites to find motifs and domains of proteins and, in turn, determine their functional characterization [45].

Conservation analysis

To evaluate the amino acid conservation pattern within the protein sequence, we used the PredictProtein server (https://predictprotein.org). The single-letter amino acid sequence of the AVPR1a protein was submitted for evaluation. More than thirty tools are integrated into this service, including ConSurf and other techniques for finding functional regions. Evolutionary conservation was analyzed using Bayesian empirical inference [46].

Predictions of ligand binding sites

The meta-server program COACH (http://zhanglab.ccmb.med.umich.edu/COACH/) was used to predict protein-ligand binding sites. It uses two comparison techniques, TM-SITE and S-SITE, to find ligand binding templates in the BioLiP protein function database, drawing on binding-specific sub-structure and sequence feature correlations. To predict ligand binding sites (LBS), COACH employs a consensus approach that combines the predictions from several algorithms, including TM-SITE, S-SITE, COFACTOR, FINDSITE, and ConCavity. The COACH server reports the top ten models together with cluster size, PDB hits, ligand names, consensus binding residues, and downloadable complex structures. Each model is given a C-score, which ranges from 0 to 1 and indicates the expected reliability; higher scores correspond to higher reliability [47].

Prediction of post translational modification (PTM) site

The widely used neural network-based program NetPhos 3.1 (https://services.healthtech.dtu.dk/services/NetPhos-3.1/) was used to estimate the probable phosphorylation sites of the AVPR1a protein. A score above the 0.5 threshold suggests that a site is probably phosphorylated [48]. To predict probable MHC-binding sites, we employed GPS-MBA 1.0 (https://mba.biocuckoo.org/) [49]. To identify potential SUMOylation and ubiquitylation sites, we used GPS-SUMO (https://sumosp.biocuckoo.org/) and GPS-Uber (http://gpsuber.biocuckoo.cn/wsresult.php) [50, 51].

3D modeling

The native structure of the AVPR1a protein was downloaded from the AlphaFold protein structure database (AlphaFold DB, https://alphafold.ebi.ac.uk/) [52]. The mutant protein structures were predicted with AlphaFold2 [53], after the protein sequence was modified according to each amino acid substitution. To minimize steric clashes, obtain precise side-chain positions, and eliminate stereochemical violations without compromising accuracy, we used the gradient-descent relaxation in the Amber force field provided through AlphaFold2 to obtain relaxed structures [54]. The ModRefiner tool (http://zhanglab.ccmb.med.umich.edu/ModRefiner) was then used to refine the predicted structures [55].
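
Preparing a mutant sequence from a substitution label such as R284W can be sketched as below. The helper name `apply_substitution` and the toy sequence are hypothetical; the sketch assumes 1-based residue numbering and checks that the wild-type residue in the label matches the sequence before substituting.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a single-residue substitution written as <wild><position><mutant>."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation label: {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    if sequence[position - 1] != wild:
        raise ValueError(
            f"Expected {wild} at position {position}, found {sequence[position - 1]}"
        )
    # Positions are 1-based in the label and 0-based in the Python string.
    return sequence[: position - 1] + mutant + sequence[position:]


# Toy 10-residue sequence (assumption: not the AVPR1a sequence).
toy = "MRLSAGPDAG"
print(apply_substitution(toy, "R2H"))  # MHLSAGPDAG
```

Checking the wild-type residue guards against numbering mismatches between the variant annotation and the reference sequence used for modeling.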

Structural validation and RMSD calculation

The selected structural model was validated using the widely accepted server SAVES v6.0 (https://saves.mbi.ucla.edu). This site offers tools such as PROCHECK and ERRAT to assess the overall quality of the 3D model [56]. Furthermore, the Ramachandran plot produced by PROCHECK was used to evaluate the model’s quality [57]. Verify3D evaluates the compatibility of a protein’s tertiary structure with its primary structure [58]. We used PyMOL (https://pymol.org/2) to compute the root-mean-square deviation (RMSD) by superimposing the native and mutant protein structures; the RMSD represents the difference between the two compared models, and a higher value indicates a greater deviation. The template modeling score (TM-score) was then calculated by comparing the wild-type protein structure with the mutant protein structures using TM-align (https://zhanglab.ccmb.med.umich.edu/TM-align) [60]. On a scale ranging from 0 to 1, the TM-score evaluates the structural similarity of two models; a score of 1 denotes identical structures, while lower values indicate increasing dissimilarity [59].
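
The RMSD itself is a simple quantity once two structures have been superimposed and their atoms paired. The sketch below computes it for two equal-length coordinate lists. It assumes superposition and atom matching have already been performed (as PyMOL does internally) and is not a reimplementation of PyMOL's alignment procedure. The coordinates are toy values.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superimposed atoms."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("Coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(a, b):
        squared += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared / len(a))


# Toy coordinates in Å (assumption: illustrative only).
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(native, mutant), 4))  # 1.4142
```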

Protein-ligand interaction analysis

We docked all chosen ligands with AVPR1a using the PyRx program (https://pyrx.sourceforge.io) [61]. Virtual ligand screening was carried out with the Lamarckian genetic algorithm (LGA) in PyRx, which combines AutoDock and AutoDock Vina [62]. AutoDock tools were used to convert the PDB files to PDBQT format and to determine binding affinities. The grid box was centered at coordinates X: −7.4813, Y: 4.9867, Z: 10.6823, with dimensions set to X: 109.4936, Y: 98.4684, and Z: 130.9739 Å [63]. The binding affinities of the ligands to the receptor were computed in kcal/mol; more negative values indicate stronger binding of the ligand to the target receptor [64]. Discovery Studio (https://discover.3ds.com/discovery-studio-visualizer-download) was used to visualize 2D and 3D interactions between ligands and proteins. It depicted the position and size of binding sites, nonbonding interactions, and the bond angles and lengths of a docked ligand [65].
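
For readers who wish to reproduce the search box outside the PyRx interface, the reported grid parameters can be written as an AutoDock Vina style configuration fragment. The sketch below only formats the six box parameters (`center_x` to `size_z`). The helper name `vina_box_config` is hypothetical, and receptor and ligand file entries are deliberately omitted because they are not given in the text.

```python
from typing import Tuple

Triple = Tuple[float, float, float]


def vina_box_config(center: Triple, size: Triple) -> str:
    """Format a docking search box as Vina-style 'key = value' lines."""
    axes = ("x", "y", "z")
    lines = [f"center_{axis} = {value}" for axis, value in zip(axes, center)]
    lines += [f"size_{axis} = {value}" for axis, value in zip(axes, size)]
    return "\n".join(lines)


# Grid box values as reported in the text above (sizes in Å).
center = (-7.4813, 4.9867, 10.6823)
size = (109.4936, 98.4684, 130.9739)

cfg = vina_box_config(center, size)
print(cfg)
```

A box of this size covers essentially the whole receptor, which corresponds to a blind docking setup rather than a search restricted to a predefined pocket.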

Analyze docking results of protein-protein complex by ClusPro

We utilized the ClusPro web server (https://cluspro.org) to conduct protein-protein docking analysis. This tool is extensively employed for studying protein-protein docking interactions. ClusPro offers various sophisticated options to tailor the search procedure, such as removing unstructured protein regions, applying attraction or repulsion forces, considering pairwise distance constraints, producing homo-multimers, incorporating data from small-angle X-ray scattering (SAXS), and locating heparin-binding sites. Based on the type of protein, six different energy functions are accessible. Ten models, each with a center of densely packed clusters of low-energy docked structures, are produced by docking with each set of energy parameters [66].

Molecular dynamics (MD) simulation

The protein–ligand complexes were subjected to MD simulations using GROMACS [67] and the WebGro server (https://simlab.uams.edu/). The ligand topology files were generated using the PRODRG Server [68], with a triclinic simulation box employed for system setup. The complexes were solvated using the SPC water model, and the system was neutralized by adding 0.15 M NaCl. The simulations were performed using the GROMOS96 43a1 force field. An initial energy minimization was carried out with 5000 steps of the steepest descent algorithm. Subsequently, the system was equilibrated under NVT and NPT ensembles with standard parameters, maintaining a temperature of 300 K and a pressure of 1.0 bar. Using the Leap-frog MD integrator, the MD trajectories were generated over a 200 ns timescale, with trajectory snapshots taken every 0.1 ns, yielding 2000 frames for analysis. The trajectory snapshots were subsequently analyzed to determine Rg, RMSD, RMSF, and SASA [67, 69].
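
Two quantities from the protocol above can be checked with a short sketch: the number of saved frames implied by the run length and sampling interval, and the mass-weighted radius of gyration (Rg) of a single frame. The function names are hypothetical and the coordinates are toy values. In practice these analyses are produced by the GROMACS tools rather than by hand-written code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def n_frames(total_ns: float, interval_ns: float) -> int:
    """Number of snapshots saved over a run at a fixed sampling interval."""
    return round(total_ns / interval_ns)


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration for one frame."""
    if len(coords) != len(masses) or len(coords) == 0:
        raise ValueError("coords and masses must be non-empty and of equal length")
    total_mass = sum(masses)
    # Center of mass, one component per axis.
    com = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


print(n_frames(200, 0.1))  # 2000

# Toy two-atom system with unit masses (assumption: illustrative only).
toy_coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(radius_of_gyration(toy_coords, [1.0, 1.0]))  # 1.0
```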

Results

Download datasets of interest

The SNPs of the AVPR1a gene were acquired from the dbSNP database, which is widely considered the most actively utilized and comprehensive database currently accessible. According to the NCBI dbSNP database, the human AVPR1a gene contains a total of 4190 single nucleotide polymorphisms (SNPs). Of these, 402 were non-synonymous SNPs (nsSNPs) (Table S1), 177 were synonymous SNPs, 1625 were located in the 3’ UTR, 168 in the 5’ UTR, and 893 in intronic regions. The remaining SNPs were classified into various other categories (Fig. 2). Only the nsSNPs were selected for this investigation.
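
The category counts above can be tallied to obtain the number of SNPs in the remaining categories and the share of each class. The sketch uses only the figures reported in the text; the dictionary keys are labels chosen here for illustration.

```python
TOTAL_SNPS = 4190

# Counts as reported in the text above.
counts = {
    "nsSNP": 402,
    "synonymous": 177,
    "3' UTR": 1625,
    "5' UTR": 168,
    "intronic": 893,
}

# SNPs not covered by the five named categories.
other = TOTAL_SNPS - sum(counts.values())
print(other)  # 925

shares = {name: 100 * n / TOTAL_SNPS for name, n in {**counts, "other": other}.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```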

GeneMANIA to understand AVPR1a interactions with other genes

GeneMANIA efficiently analyzes the other genes related to the AVPR1a gene. The graphical representation of the analysis is illustrated in Fig. 3. These findings suggest that the AVPR1a gene may have a functional connection to the co-expressed genes and could be involved in common biological pathways. So, if any mutation occurs in the AVPR1a gene, it may also affect the overall gene network interactions among all the related genes.