Deep learning tools predict variants in disordered regions with lower sensitivity

Luppino, Federica; Lenz, Swantje; Chow, Chi Fung Willis; Toth-Petroczy, Agnes

doi:10.1186/s12864-025-11534-9

Research
Open access
Published: 12 April 2025

BMC Genomics volume 26, Article number: 367 (2025)

Abstract

Background

The recent AI breakthrough of AlphaFold2 has revolutionized 3D protein structural modeling, proving crucial for protein design and variant effects prediction. However, intrinsically disordered regions—known for their lack of well-defined structure and lower sequence conservation—often yield low-confidence models. The latest Variant Effect Predictor (VEP), AlphaMissense, leverages AlphaFold2 models, achieving over 90% sensitivity and specificity in predicting variant effects. However, the effectiveness of tools for variants in disordered regions, which account for 30% of the human proteome, remains unclear.

Results

In this study, we found that predicting pathogenicity for variants in disordered regions is less accurate than in ordered regions, particularly for mutations at the first N-Methionine site. Investigations into the efficacy of variant effect predictors on intrinsically disordered regions (IDRs) indicated that mutations in IDRs are predicted with lower sensitivity and the gap between sensitivity and specificity is largest in disordered regions, especially for AlphaMissense and VARITY.

Conclusions

The prevalence of IDRs within the human proteome, coupled with the increasing repertoire of biological functions they are known to perform, necessitated an investigation into the efficacy of state-of-the-art VEPs on such regions. This analysis revealed their consistently reduced sensitivity and differing prediction performance profile to ordered regions, indicating that new IDR-specific features and paradigms are needed to accurately classify disease mutations within those regions.

Background

The recent AI breakthroughs of AlphaFold2 and Generative Pretrained Transformers (GPTs) have revolutionized biomedical research. AlphaFold2 (AF2) can now generate 3D structural models for any protein of interest. Nevertheless, there are regions in proteins that do not fold into well-defined structures (i.e. disordered regions), and thus are predicted with low confidence by AF2. Disordered regions are prevalent in the protein universe, with ~60% of proteins in Swiss-Prot containing Intrinsically Disordered Regions (IDRs) at least 10 residues long [1]. In humans, ~30% of the proteome contains disordered regions [2, 3]. Many mutations in disordered regions are associated with diseases such as breast and ovarian cancer, cardiovascular and neurodegenerative diseases, and many others [4]. An estimated 10–20% of disease mutations occur in such regions [5, 6], underscoring the importance of assessing the impact of variants in disordered regions.

Variant Effect Predictors (VEPs) are machine learning and, more recently, deep learning models that predict the pathogenicity of genetic variants. For example, they can predict the effect of missense variants (which cause an amino acid change in the protein sequence), classifying them as either pathogenic (causal for the disease phenotype) or benign (not causal for the disease). Pathogenicity assessment by VEPs is included as evidence in the American College of Medical Genetics (ACMG) [7] and the Invitae semiquantitative, hierarchical evidence-based rules for locus interpretation (Sherloc) [8] clinical guidelines, and a VEP is considered to be an in vitro diagnostic device (IVD) according to the European regulation 2017/746 [9].

The traditional paradigm employed by VEPs to interpret variant pathogenicity is based on evolutionary sequence conservation [10, 11]. Specifically, if a protein residue is conserved, i.e. not or rarely mutated in homologous proteins of other species, it is considered important for maintaining protein function. For instance, features such as the number of distinct residues observed at the mutated position in the multiple sequence alignment (MSA) of homologous sequences are used for the interpretation of variant effects. While this traditional paradigm is still used today, it has been largely superseded by more advanced models that incorporate structure-based features.

Supervised machine learning VEPs have extended the traditional paradigm of conservation by incorporating structure-based features such as the normalized surface area of residues, protein-protein interaction sites, and secondary structure annotations [12,13,14,15]. Originally, these tools were limited by their reliance on experimental structures in structural databases, such as the Protein Data Bank (PDB), which are mainly available for conserved protein domains [12, 13]. More recent tools took advantage of AlphaFold2 structural models and successfully improved variant effect assessment [16], especially with respect to specificity, thus minimizing the false positive rate [17, 18].

In contrast to machine learning models, which require a priori knowledge and manual curation of input features (i.e. the regressors of the model) such as residue conservation and secondary structure assignment, deep learning models perform the extraction of features as part of the training process. More recently, deep learning VEPs were developed using evolutionary information from MSAs as input, including the protein language model-based ESM1b [19], neural network-based MVP [20], and variational autoencoder-based EVE [21]. AlphaMissense [18], the most recent deep learning VEP, combines unsupervised and supervised components. The unsupervised component leverages evolutionary information, population frequency data, and structural context derived from AlphaFold2 models, and the supervised component calibrates the output of the unsupervised component on clinical data to define a probability of pathogenicity. The recent advancements in machine and deep learning models, especially AlphaMissense, and the integration of AlphaFold2 [22] structural models have significantly improved the accuracy and specificity of variant effect prediction tools, enabling more reliable assessments of the functional impact of genetic variants.

Since AlphaFold2 models are predicted with low confidence for disordered regions, the question arises whether AF2-based VEPs perform reliably on variants in disordered regions. While benchmarks of VEPs on clinical variants are abundant [10, 23,24,25,26], we found few systematic benchmarks on clinical variants in disordered regions [6, 27,28,29]. Such low-complexity or disordered regions pose a challenge for conservation- and structure-based VEPs, and several studies have observed decreased sensitivity of VEPs such as PolyPhen2 and SIFT when predicting mutations in intrinsically disordered regions (IDRs) [28, 29].

The aim of this study is to evaluate whether the recent AI advances, particularly that of AlphaMissense, have led to an improvement in identifying pathogenic mutations in disordered regions. To this end, we include five different computational tools for defining disorder and both machine and deep learning VEPs, and evaluate their performance on benign and pathogenic mutations from ClinVar [30]. We observe that all state-of-the-art VEPs predict pathogenic mutations in disordered regions with lower sensitivity than in ordered ones. Their overall high performance is due to over-prediction of benign variants in disordered regions, aligning with the conservation and structural paradigm that does not account for pathogenicity in less conserved and less structured regions. This analysis offers an opportunity for future development of VEPs to design models that capture protein properties not in isolation but as dynamic units of a more complex system.

Methods

Clinical variants

We downloaded the ClinVar VCF file (version 20231217.vcf, https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/archive_2.0/2023/clinvar_20231217.vcf.gz) and the variant summary file (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/archive/2023/variant_summary_2023-12.txt.gz) and we kept only variants with either a ‘pathogenic’ or ‘benign’ clinical significance (including likely benign and likely pathogenic labels). Variants with conflicting interpretations, Variants of Uncertain Significance (VUSs) and without a clinical significance label were excluded. Further, we excluded variants which had only somatic labels.

We used the MapSNPs annotation tool from PolyPhen-2 v2.2.3 [12] (http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads) to map the genome assembly hg19/GRCh37 variant coordinates to missense coding SNPs. Only variants mapping to known canonical transcripts according to the UCSC Genome Browser were retained. We used the PolyPhen-2 v2.2.3 pipeline to annotate features. A complete list and description of the features are available at the PolyPhen-2 v2.2.3 Wiki page (http://genetics.bwh.harvard.edu/wiki/pph2/appendix_a). Only variants in the dataset with predictions from all five computational disorder tools were annotated, resulting in a final set of 61,878 variants (23,234 pathogenic and 38,644 benign) mapped to 7,459 proteins. On average, there are 8 variants per protein. The dataset can be downloaded from the Git repository (https://gitlab.mpi-cbg.de/tothpetroczylab/idrs_veps).

Pathogenicity scores

Pathogenicity scores were obtained using the dbNSFP v4.7a [31, 32] command-line application, downloaded from http://database.liulab.science/dbNSFP#version. We downloaded scores for VEST v4.0, PolyPhen-2 v2.2.3 (HVAR), REVEL, ESM1b, MVP, VARITY_R, EVE and MutPred. To calculate the accuracy, we thresholded the scores as recommended by the authors. For REVEL [33], VEST4 [34] and VARITY_R [13] the cutoff is 0.5. For ESM1b [19] the threshold is -7.5, as described in the original paper. For MVP the recommended cutoff of 0.7 was used. AlphaMissense directly provides discretized predictions with three categories: benign, pathogenic and ambiguous. We excluded variants in the ambiguous category. AlphaMissense predictions were downloaded from https://console.cloud.google.com/storage/browser/_details/dm_alphamissense/AlphaMissense_aa_substitutions.tsv.gz. We considered both deep and machine learning VEPs. We defined deep learning models as models whose feature extraction is automated and part of the training algorithm. Those models are AlphaMissense, MVP and ESM1b. Machine learning models, such as VEST, VARITY and PolyPhen, rely on manual annotation of features. We excluded VEPs with more than 50% missing predictions for ClinVar variants according to dbNSFP, namely EVE [21] and MutPred (http://mutpred.mutdb.org/) (Supplementary Fig. S4).
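The thresholding step described above can be sketched as follows. This is a minimal illustration and not the authors' script: the cutoffs are the ones quoted in the text, while the direction of each comparison, the handling of ties and the function name `call_variant` are assumptions made for the example.

```python
# Cutoffs quoted in the text. The comparison direction is an assumption:
# higher scores are taken as more pathogenic, except for ESM1b, where more
# negative scores are assumed to indicate pathogenicity.
THRESHOLDS = {
    "REVEL": (0.5, "higher"),
    "VEST4": (0.5, "higher"),
    "VARITY_R": (0.5, "higher"),
    "MVP": (0.7, "higher"),
    "ESM1b": (-7.5, "lower"),
}


def call_variant(tool, score):
    """Return 'pathogenic' or 'benign' for one score, or None if it is missing."""
    if score is None:
        return None
    cutoff, direction = THRESHOLDS[tool]
    if direction == "higher":
        is_pathogenic = score >= cutoff  # tie handling is an assumption
    else:
        is_pathogenic = score <= cutoff
    return "pathogenic" if is_pathogenic else "benign"


if __name__ == "__main__":
    for tool, score in [("REVEL", 0.20), ("VARITY_R", 0.38), ("ESM1b", -9.1)]:
        print(tool, score, call_variant(tool, score))
```

Variants with a missing score return None so that they can be dropped before restricting the benchmark to variants predicted by every tool.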

Disorder scores

We considered five different computational predictors of disorder: AIUPred [35], AlphaFold2 [22] pLDDT scores, metapredict [36, 37], AlphaFold2-RSA [38] and flDPnn [39]. For metapredict and AIUPred, residues with disorder scores greater than 0.5 were considered disordered, while for AlphaFold2 pLDDT a score lower than or equal to 70 was used to identify disordered residues. For AlphaFold2-RSA the threshold recommended by the authors is 0.581 and for flDPnn it is 0.3. Based on metapredict, flDPnn and AIUPred predictions, only 4%, 5% and 1% of disordered residues, respectively, are not contained within an intrinsically disordered region (IDR) of at least 10 residues (Supplementary Fig. S1). Consequently, we concluded that the residue-level and region-level definitions of disorder are interchangeable, and we treat "variants associated with disordered residues" and "variants in disordered regions" as synonyms.
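A minimal sketch of the per-residue disorder call implied by these thresholds is given below. The tool keys and the function name `is_disordered` are hypothetical labels for this example; the strict "greater than" comparison for AlphaFold2-RSA and flDPnn is an assumption, since the text only states the cutoff values for those two tools.

```python
# Cutoffs from the text. pLDDT is the only score where LOW values mean disorder.
SCORE_CUTOFFS = {
    "metapredict": 0.5,
    "AIUPred": 0.5,
    "AF2_RSA": 0.581,  # comparison direction assumed (score > cutoff)
    "flDPnn": 0.3,     # comparison direction assumed (score > cutoff)
}
PLDDT_CUTOFF = 70


def is_disordered(tool, score):
    """Binary disorder call for a single residue."""
    if tool == "AF2_pLDDT":
        return score <= PLDDT_CUTOFF
    return score > SCORE_CUTOFFS[tool]


if __name__ == "__main__":
    print(is_disordered("AF2_pLDDT", 45.2))   # low-confidence residue
    print(is_disordered("metapredict", 0.31))
```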

For both flDPnn and AIUPred, the identification of an IDR was achieved using the Savitzky-Golay filter (moving average window size = 9, polynomial degree = 3). The disorder threshold was set at 0.3 and 0.5, respectively, and consecutive disordered residues were concatenated to form IDR segments. Disorder segments shorter than 10 residues were filtered out. IDRs were divided into four distinct groups: N-terminal, C-terminal and between domains IDRs, and intrinsically disordered proteins (IDPs). The N-terminal IDR was annotated if the protein sequence started with an IDR, the C-terminal IDR corresponded to the IDR that ended with the last residue of the protein, and IDRs located between non-disordered segments were annotated as IDRs between domains. An IDP is defined as a protein whose entire sequence is annotated as an IDR.
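The segmentation and grouping described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes the per-residue scores have already been smoothed (the Savitzky-Golay step is not reproduced here), and the names `find_idrs` and `classify_idr` are hypothetical.

```python
def find_idrs(scores, threshold, min_length=10):
    """Return 1-based inclusive (start, end) runs of residues with score > threshold.

    Runs shorter than min_length residues are discarded.
    """
    idrs = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        elif start is not None:
            if position - start >= min_length:
                idrs.append((start, position - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        idrs.append((start, len(scores)))
    return idrs


def classify_idr(start, end, protein_length):
    """Assign an IDR to one of the four groups used in the text."""
    if start == 1 and end == protein_length:
        return "IDP"
    if start == 1:
        return "N-terminal"
    if end == protein_length:
        return "C-terminal"
    return "between domains"


if __name__ == "__main__":
    # Toy profile: disordered N-terminus, a short disordered stretch, disordered C-terminus.
    toy = [0.9] * 12 + [0.1] * 20 + [0.9] * 5 + [0.1] * 10 + [0.9] * 15
    for start, end in find_idrs(toy, threshold=0.5):
        print(start, end, classify_idr(start, end, len(toy)))
```

In the toy profile the five-residue stretch is dropped by the minimum-length filter, so only the two terminal segments are reported.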

Metapredict

Metapredict v2.6 was downloaded from the GitHub repository (https://github.com/idptools/metapredict/tree/master), and the functions metapredict-predict-disorder and metapredict-predict-idrs were used. The output file was processed with a custom script to obtain a dataframe with three columns: UniProt accession id, residue position and metapredict score. The metapredict score threshold to define disordered residues is 0.5. The output of the IDR prediction function was parsed with custom scripts to obtain a format with three columns: UniProt accession id, start residue of the IDR and end residue of the IDR. All the scripts used to run and parse metapredict files are available at the Git repository.

AIUPred

AIUPred was run with the command-line tool downloaded from https://aiupred.elte.hu/. The scripts used to run and parse AIUPred files are available at the Git repository. We used a threshold of 0.5 to consider a residue disordered.

AlphaFold2 pLDDT

pLDDT scores were extracted from AlphaFold2 v4 models with a custom R script. Proteins longer than 2700 residues have multiple AlphaFold2 3D models, which we combined to obtain the full-length model of the protein. For only one protein (Q8IVF2) could the full-length model not be obtained. All the scripts used to run and parse AlphaFold2 3D model files are available at the Git repository.

AlphaFold-Relative solvent accessibility (RSA)

AlphaFold-RSA was run as a Python script from the Git repository https://github.com/BioComputingUP/AlphaFold-disorder. Although the required versions are not specified in the repository, we could only run the script successfully with the following dependency versions: Pandas 1.5.3, NumPy 1.21.6, BioPython 1.85 and Python 3.10.16. We used the default parameters to run the tool, namely a window of 25 residues to smooth the RSA values. The disorder threshold recommended by the authors is 0.581.

flDPnn

The flDPnn tool was run as a Docker container following the documentation at the Git repository https://gitlab.com/sina.ghadermarzi/fldpnn_docker, and with the webserver (https://biomine.cs.vcu.edu/servers/flDPnn/) for the proteins that did not run successfully with Docker. The binary prediction of disorder is provided as an extra column in the output file and corresponds to a cutoff of 0.3 on the disorder propensities.

VEPs benchmarking on ClinVar variants

The ClinVar benchmark was performed on all mutations except for those occurring at the N-Methionine site, which were benchmarked separately. The number of mutations occurring at N-Methionine sites is 523; however, all VEPs had a prediction for only 359 of these variants (334 pathogenic and 25 benign). The threshold to classify a variant as either benign or pathogenic is reported in the ‘Pathogenicity scores’ section. We reported sensitivity and specificity as VEP performance metrics (Fig. 6) as well as F1-score and ROC-AUC (Supplementary Fig. S7).

The ClinVar set consisted of 61,878 variants. However, to make the benchmark comparable across tools, we included only the variants for which a prediction existed for all VEPs. This led to a benchmarking set of 45,316 variants (17,977 pathogenic and 27,339 benign). Because the distribution of ClinVar variants by disorder group and phenotypic effect is unbalanced, we calculated performance metrics, namely sensitivity and specificity, on 200 bootstrap samples. For each bootstrap sample, 12,540 variants were drawn with replacement from each of the four combinations formed by two binary variables: pathogenic with disorder, benign with disorder, pathogenic with order, and benign with order. The resulting bootstrap sample size of ~50,000 variants (4 × 12,540 = 50,160) recapitulates the ClinVar set size.
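The balanced resampling scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: variants are represented as dictionaries with hypothetical boolean keys `pathogenic` and `disordered`, and the function name `stratified_bootstrap` is invented for the example.

```python
import random


def stratified_bootstrap(variants, n_per_stratum, rng):
    """Draw n_per_stratum variants with replacement from each
    (pathogenic, disordered) combination and return the pooled sample."""
    strata = {}
    for variant in variants:
        key = (variant["pathogenic"], variant["disordered"])
        strata.setdefault(key, []).append(variant)
    sample = []
    for members in strata.values():
        # random.Random.choices samples with replacement.
        sample.extend(rng.choices(members, k=n_per_stratum))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the toy run repeatable
    toy = [
        {"id": i, "pathogenic": i % 2 == 0, "disordered": i % 3 == 0}
        for i in range(60)
    ]
    # The text uses 200 samples of 12,540 variants per stratum; toy sizes here.
    samples = [stratified_bootstrap(toy, 5, rng) for _ in range(3)]
    print([len(s) for s in samples])
```

Sampling the same number of variants from every stratum means that sensitivity and specificity are estimated on equally sized groups, regardless of how unbalanced the original set is.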

VEPs performance metrics from the confusion matrix:

                          Ground truth
                          Pathogenic    Benign
Predicted   Pathogenic    TP            FP
            Benign        FN            TN

Sensitivity/Recall = TP/(TP + FN).

Specificity = TN/(TN + FP).

Precision = TP/(TP + FP).

F1-score = 2*precision*recall/(precision + recall).
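As a worked illustration of the four formulas above, the following sketch computes them from paired ground-truth and predicted labels. The function names and the label strings are assumptions made for the example, and the code assumes that both classes are present so that no denominator is zero.

```python
def confusion_counts(truth, predicted, positive="pathogenic"):
    """Count TP, FP, FN and TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, predicted):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def performance_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision and F1-score from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    truth = ["pathogenic", "pathogenic", "benign", "benign", "pathogenic"]
    predicted = ["pathogenic", "benign", "benign", "pathogenic", "pathogenic"]
    print(performance_metrics(*confusion_counts(truth, predicted)))
```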

For the AUC-ROC we used the R package pROC (version 1.18.5).

The statistical tests used in the analysis are the Wilcoxon and Chi-Square tests from the built-in stats R package (version 4.4.2).

Motif prediction of the heat-shock beta-1 protein of HSPB1 (P04792) using SHARK-capture

The multiple sequence alignment (MSA) of the heat-shock protein beta-1, encoded by the HSPB1 gene, was obtained with the EVcouplings [40] server (https://v2.evcouplings.org/) with default parameters and a bit score of 0.7.

Redundant sequences in the P04792 MSA were removed using cd-hit (v4.8.1) with default settings (identity threshold of 90%). On the remaining sequences, SHARK-capture (v2.0.1) was run with default settings, and k-mers were post-processed by extending overlapping k-mers. K-mers mapped to the structured region were removed [41].

Results

Computational tools of disorder

In order to systematically characterize variants in disordered regions, we first identified those regions using five tools that rely on different criteria: AlphaFold2 (AF2) pLDDT [22], AIUPred [35], metapredict [37], AlphaFold2-RSA [38] and flDPnn [39] (see Methods). Those tools ranked among the best disorder predictors in the recent benchmark on computational disorder predictors, the Critical Assessment of protein Intrinsic Disorder prediction (CAID) [42]. AIUPred is based on biophysical principles of a sequence, pLDDT scores are derived from AF2 models, and metapredict is a metapredictor that combines several disorder prediction scores including AIUPred and AF2 pLDDT. AlphaFold2-RSA calculates the relative solvent accessibility (RSA) with DSSP [43] on AlphaFold2 3D models, while flDPnn relies on secondary structure prediction from PSIPRED [44], disorder scores predicted by IUPred [45], and evolutionary information calculated with PSI-BLAST [46].

From a sequence perspective, disordered regions are characterized by lower sequence conservation, whilst from a structural point of view they exhibit mainly flexible linkers, tails (i.e. termini) and coil elements. As an example, the multiple sequence alignment of the heat-shock protein beta-1 (P04792) shows that the alpha-crystallin domain of the protein is conserved while the N- and C-termini have low-quality alignments with many gaps (Fig. 1a). The termini are associated with higher disorder scores than the domain region, according to AF2 pLDDT, metapredict and AlphaFold2-RSA but not to AIUPred and flDPnn. While this is just one example, it reflects the generally higher propensity for disorder in the N- and C-termini of proteins, which has already been assessed [47, 48]. Furthermore, this example highlights how computational disorder predictors differ in their predictions (Figs. 1b and 2 and Supplementary Fig. S2). The N-terminus of the protein is predicted as disordered by AF2 pLDDT, metapredict and AlphaFold2-RSA but not by AIUPred and flDPnn, while the C-terminus is consistently identified as disordered by all tools. This is why we use five different computational predictors to measure disorder, i.e., AIUPred [35], AF2 pLDDT [22, 49], metapredict [36, 37], AlphaFold2-RSA [38] and flDPnn [39]. The correlation between disorder tools is high overall (correlations involving AF2 pLDDT are negative because low pLDDT values indicate disorder). The lowest correlation, -37%, is observed between AF2 pLDDT and flDPnn (Supplementary Fig. S2). In general, the lowest correlations are between flDPnn and any other disorder predictor. The high correlation of AF2 pLDDT and AIUPred with metapredict, -76% and 77% respectively, is expected given that metapredict includes both scores during training (Supplementary Fig. S2).

The concordance among tools is principally attributable to order predictions, with 54% of residues being jointly predicted as ordered (Fig. 2). In accordance with the low correlation with flDPnn, 15% of residues are predicted as disordered by all tools except flDPnn. Additionally, all five tools label a residue as disordered in only 7% of cases (Fig. 2). The disagreement among the tools accounts for 38% of residues, which is why the inclusion of AIUPred, AF2 pLDDT, metapredict, AlphaFold2-RSA (AF2-RSA) and flDPnn in the variants benchmark is necessary to allow robust assessment and interpretation of the results.

Variants in disordered regions are predominantly benign

In order to assess the effects of missense variants in disordered regions, we collected data from a curated clinical variant database, ClinVar (see Methods). The stratification of variants showed that residues in disordered regions are abundant, with 31%, 35%, 31%, 31% and 9% of variants falling in disordered regions according to AIUPred, AF2 pLDDT, metapredict, AF2-RSA and flDPnn respectively (Fig. 3a). The variants associated with disordered residues are predominantly benign, 81% according to AIUPred and 86% for AF2 pLDDT, AF2-RSA and flDPnn, 85% for metapredict (Fig. 3b). In addition, except for flDPnn, we confirmed that 11–15% of pathogenic mutations occur in IDRs (Fig. 3c) [5, 6]. This is in agreement with finding reduced negative selection in IDRs when analyzing polymorphisms from population data of human and yeast [51].

Those observations align with the higher tolerance of IDRs to amino acid substitutions when looking at orthologous sequences of distant species. Disordered regions tend to evolve faster [52, 53] and also tolerate more insertions and deletions than ordered domains [54]. This can be attributed to the reduced evolutionary constraints on their sequences, as they are not limited by the structural requirements that constrain ordered regions. Overall, disordered regions tolerate more polymorphisms and accumulate mainly benign mutations.

Mutations at the N-Methionine site are excessive and misclassified by VEPs

Since IDRs often flank structured domains as N- and C-terminal regions or connect them as linkers, we wanted to test if there is a bias in the occurrence and phenotypic effect of variants along the protein sequence, particularly at the termini. For this purpose, we classified IDRs into four groups, namely N-terminal IDRs, IDRs between domains, C-terminal IDRs and IDPs (Intrinsically Disordered Proteins), according to the definition given in the Methods section. We conducted a chi-square test to examine the association between protein regions (C-terminus, between domains, IDP and N-terminus) and the phenotypic effect (pathogenic and benign). The test was significant (p value < 1e-15) and the standardized residuals revealed large deviations from the counts expected under independence between protein region and variant effect. According to flDPnn and metapredict, pathogenic variants are enriched in the N-terminal IDR, whereas according to AIUPred they are depleted there. For metapredict and AIUPred, pathogenic variants are also more frequently found in the IDRs between domains, while benign variants are more commonly located in C-terminal IDRs (Fig. 4). For flDPnn, in contrast, benign variants are more frequently observed in IDRs between domains and less frequently in the C-terminus.
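To make the chi-square analysis above more concrete, the sketch below computes the Pearson statistic and the standardized (adjusted) residuals for a region-by-effect contingency table. It is a plain-Python illustration, not the authors' R analysis; the counts are invented for the example and do not come from the study, and the function names are hypothetical.

```python
import math


def expected_counts(table):
    """Expected counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def standardized_residuals(table):
    """(observed - expected) / sqrt(expected * (1 - row share) * (1 - column share))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = expected_counts(table)
    return [
        [
            (table[i][j] - expected[i][j])
            / math.sqrt(
                expected[i][j] * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            for j in range(len(col_totals))
        ]
        for i in range(len(row_totals))
    ]


if __name__ == "__main__":
    # Hypothetical counts: rows are regions, columns are (pathogenic, benign).
    regions = ["N-terminal", "between domains", "C-terminal", "IDP"]
    counts = [[120, 600], [300, 1100], [90, 800], [40, 250]]
    for region, row in zip(regions, standardized_residuals(counts)):
        print(region, [round(x, 2) for x in row])
```

Cells with large positive residuals are over-represented relative to independence (enrichment), and cells with large negative residuals are under-represented (depletion).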

We found that the number of variants and their effect along protein sequences are not uniformly distributed (Fig. 5a). Strikingly, we observed that 0.85% of variants occur at the N-Methionine site (Fig. 5a). At first glance, this percentage may seem negligible. However, assuming a discrete uniform distribution of mutations over a protein of length N = 572, the median length of ClinVar proteins in this analysis, each site has an approximate probability of 0.17% (1/572) of being mutated. In comparison, the observed frequency of 0.85% is roughly five times higher, suggesting a markedly higher propensity for mutations to occur at the start codon.
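The back-of-the-envelope comparison above can be reproduced with the counts given in the Methods (523 variants at the N-Methionine site out of 61,878 variants, and a median protein length of 572). The variable names are chosen for this sketch only.

```python
n_start_variants = 523      # variants at the N-Methionine site (Methods)
n_total_variants = 61878    # final ClinVar set (Methods)
median_length = 572         # median length of ClinVar proteins in the analysis

# Observed share of variants that fall on the first residue.
observed = n_start_variants / n_total_variants

# Expected share if variants were spread uniformly over a median-length protein.
expected = 1 / median_length

fold_enrichment = observed / expected

print(f"observed: {observed:.2%}")
print(f"expected: {expected:.2%}")
print(f"fold enrichment: {fold_enrichment:.1f}")
```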

Variants at the starting methionine position are mainly pathogenic (93%) (Fig. 5a). Moreover, pathogenic mutations at the N-Methionine site are biased towards proteins that have a second Methionine located further away compared to proteins with benign mutations at the same site (Fig. 5b, Wilcoxon rank sum test, one-sided test with “greater” as alternative hypothesis, p-value < 2.398e-09). We performed the same analysis on proteins without mutations at the N-Methionine site, and, in this case, the distances to the second Methionine do not differ among sites with pathogenic and benign mutations (Fig. 5b). These observations suggest that a second and proximal N-terminal Methionine could compensate for the loss of the N-Methionine and perhaps serve as an alternative initiation codon [55].
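The feature discussed above, the distance from the start codon to the next Methionine, is simple to derive from a protein sequence. The helper below is a hypothetical sketch (the function name and the toy sequences are not from the study) that returns the 1-based position of the first Methionine after the initiator residue.

```python
def second_methionine_position(sequence):
    """Return the 1-based position of the first 'M' after residue 1,
    or None if the sequence contains no further Methionine."""
    index = sequence.find("M", 1)  # start searching at the second residue
    return None if index == -1 else index + 1


if __name__ == "__main__":
    # Toy sequences for illustration only.
    for seq in ["MAMKT", "MASTQLLPGKVEDM", "MKTAY"]:
        print(seq, second_methionine_position(seq))
```

Under the hypothesis in the text, a small value would indicate a nearby candidate for alternative initiation, whereas a large value or None would indicate that no proximal replacement is available.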

Due to the unique features of the start codon, we investigated the performance of VEPs on the set of N-Methionine sites harboring pathogenic variants associated with disordered residues. Most VEPs do not perform well on mutations occurring at the N-Methionine site. In particular, we observed a strong disproportion between their sensitivity and specificity (Fig. 5c). For instance, AlphaMissense overpredicts benign variants and reaches only 29% sensitivity. VARITY overpredicts pathogenic variants reaching only 20% specificity (Fig. 5c). REVEL is the most balanced with 65% sensitivity and 84% specificity. These observations are consistent with previous work that considered variants at the first codon to be a special case of mutations, which should be analyzed with a different set of features with respect to mutations occurring anywhere else in the protein [56, 57]. For example, the position of the second Methionine in the protein sequence as well as the number of AUG codons in the 5’ UTR region of the mRNA sequence are relevant predictive features to distinguish between pathogenic and benign variants.

Accordingly, we propose that N-Methionine mutations should be treated separately as their own class of mutations because of different features that would discriminate between the pathogenic and benign variants at this position. In addition, according to metapredict and AIUPred 65% and 32% of N-Methionine sites belong to an N-terminal IDR (Supplementary Fig. S3). Since the disordered nature of the first residue might bias our investigation on VEPs, we excluded all N-Methionine mutations from all subsequent analyses.

The gap between sensitivity and specificity is highest in disordered regions

Next, we investigated whether the performance of deep and machine learning VEPs such as AlphaMissense is consistent among variants associated with disordered residues. The stratification of VEP performance by disorder propensity highlighted that the top VEPs exhibit higher specificity than sensitivity when predicting variants in disordered regions (Fig. 6a). This observation is consistent across different metrics of disorder. Here, we considered as top or best performing tools those with the smallest difference between sensitivity and specificity in both disordered and ordered regions, namely AlphaMissense, REVEL and VARITY (Fig. 6b). High sensitivity and high specificity are especially desirable in clinical diagnostics, where both false negative and false positive rates should be minimized. Judging the performance of VEPs by ROC-AUC and F1-score alone may lead us to conclude that variants in disordered regions are more accurately predicted, as both ROC-AUC and F1-score values are as high as or higher than those for variants in ordered regions (Supplementary Fig. S7). This result would mask the actual unbalanced performance by class. In fact, the gap between sensitivity and specificity is highest in disordered regions measured with flDPnn (Fig. 6b), reaching up to 20% difference. Likewise, disorder measured by AF2 pLDDT and AlphaFold2-RSA results in a more than 10% increase of specificity over sensitivity.

Strikingly, the discrepancy of AlphaMissense in the AF2 pLDDT (0,50] confidence group reaches almost 20% (Supplementary Fig. S5). A clear pattern emerges: variants in disordered regions are associated with higher specificity, and mutations in ordered regions with higher sensitivity. This pattern holds true for the latest deep and machine learning models such as AlphaMissense and REVEL, while VEST4 and PolyPhen2 remain biased towards pathogenic mutations also in the disordered set (Fig. 6b and Supplementary Fig. S6).

The already known bias of VEPs to over-predict pathogenic mutations [17, 25] seems to be complemented with the reversed bias of over-predicting benign mutations in disordered regions.

State-of-the-art VEPs misclassify pathogenic mutations in IDRs

The higher specificity of VEPs for variants in disordered regions is accompanied by lower sensitivity in the same regions (Fig. 6). AlphaMissense demonstrates the lowest sensitivity in predicting variants within C-terminal IDRs, a finding consistent across various definitions of IDRs (Supplementary Fig. S8). REVEL and VARITY achieve the lowest sensitivity for variants in the C-terminal IDR according to metapredict, and in the C-terminal and N-terminal IDRs when IDRs are identified with AIUPred. Beyond this, the present observations do not reveal any consistent pattern in the performance of VEPs across IDR groups.

As an example, we investigated a variant at the N-terminus of the heat-shock protein beta-1 encoded by the HSPB1 gene. The N-terminus harbors known germline missense mutations that are associated with Charcot-Marie-Tooth disease (CMT) and Hereditary Motor Neuropathy (HMN) type II [58]. An example of a pathogenic variant is the mutation P39L (variant id: NM_001540.5(HSPB1):c.116C>T (p.Pro39Leu)), which increases the propensity of helix formation and/or local contacts between the N-terminal regions [59]. Further, it prevents the dissociation of large oligomers of HSPB1 by phosphorylation [60]. While both ClinVar data and significant experimental evidence support the pathogenicity of this variant, the most recent and widely used VEPs, namely AlphaMissense, REVEL and VARITY, interpret this variant as benign, with a probability of pathogenicity of 0.29, 0.20 and 0.38 respectively. Although this is only one example, we detect across the whole dataset a decreased sensitivity for mutations in disordered regions compared to ordered regions (Fig. 6), thus indicating a higher false negative rate.

We further investigated the 3D model and secondary structure of the heat-shock beta-1 protein both as monomer and multimer, whose quaternary structure has been already extensively investigated [61,62,63]. Specifically, the work of Jehle et al. [64] led to a model of the N-terminal domain structure that consists of two alpha-helices and two beta strands that might not exist simultaneously and especially not in all oligomers.

The secondary structure assignment of STRIDE [65] describes the N-terminus as containing mostly disordered/coiled regions and turns, namely structural elements of 3 to 4 residues that connect helices and beta strands (Fig. 7a). However, the dimeric (Fig. 7c) and tetrameric (Fig. 7d) AlphaFold3 [66] 3D models of the heat-shock protein showed an increase in secondary structure elements compared to the monomeric structure, confirmed by STRIDE and in line with the experimental results (Fig. 7). In particular, in the dimeric form we observed a longer alpha-helix than in the monomer, and in the tetramer we noticed another helix not present in the dimer and monomer (Fig. 7c, d). These models suggest that oligomerization promotes disorder-to-order transitions, specifically the formation of helices. Note that the 3D structural models of the C-terminal part of the protein remained unaltered across the different oligomeric conformations (Fig. 7a). Thus, mutations at residue position 39, at the tip of the helix that elongates upon oligomerization, might be correctly classified if the oligomeric structural state were considered. The above structural investigation highlights that oligomeric forms of the protein contain supplemental and perhaps relevant information to aid the discrimination of variant effects in disordered regions of the protein.

To further characterize the disordered region of HSPB1, we ran SHARK-capture, an alignment-free motif-detection tool specifically designed to identify conserved motifs in disordered regions. Focusing on the disordered parts of the sequence, namely the N- and C-terminal regions, we annotated the ten highest-scoring motifs and found that two pathogenic variants, including P39L, fall into these motifs. In comparison, none of the benign mutations are located in a conserved region (Fig. 7e).
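The overlap check described above can be sketched as a simple interval test. The example below is a hedged illustration, not the analysis code of the study: the motif coordinates and all variants other than P39L (residue 39) are hypothetical placeholders, and the function names `in_any_motif` and `motif_hits_by_label` are invented for this sketch.

```python
from typing import Dict, List, Tuple


def in_any_motif(position: int, motifs: List[Tuple[int, int]]) -> bool:
    """True if a 1-based residue position lies inside any inclusive (start, end) motif."""
    return any(start <= position <= end for start, end in motifs)


def motif_hits_by_label(variants: Dict[str, Tuple[int, str]],
                        motifs: List[Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Return {clinical_label: (variants inside a motif, total variants)}."""
    counts: Dict[str, Tuple[int, int]] = {}
    for position, label in variants.values():
        hits, total = counts.get(label, (0, 0))
        counts[label] = (hits + int(in_any_motif(position, motifs)), total + 1)
    return counts


# HYPOTHETICAL motif coordinates; real motifs would come from the SHARK-capture output.
toy_motifs = [(35, 44), (190, 199)]

# P39L sits at residue 39; every other entry is a made-up placeholder.
toy_variants = {
    "P39L": (39, "pathogenic"),
    "toy_pathogenic": (195, "pathogenic"),
    "toy_benign_1": (60, "benign"),
    "toy_benign_2": (120, "benign"),
}

summary = motif_hits_by_label(toy_variants, toy_motifs)
print(summary)
```

With real motif coordinates, the same summary would show how many pathogenic and benign variants fall inside conserved motifs of the disordered termini.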

While this represents only one example, it serves as a proof of concept and highlights an alternative approach to study and characterize mutations in disordered regions.

Discussion

Machine and deep learning VEPs are extensively and routinely benchmarked on well-characterized clinical variants to demonstrate their utility for clinical genetic diagnosis. The newest deep-learning VEP, AlphaMissense, predicts the pathogenicity of genetic variants with a remarkable 90% sensitivity and specificity, yet it was reported that the pathogenicity of genetic variants in disordered regions is poorly predicted [28, 29]. Here, we investigated the performance of AlphaMissense as well as other state-of-the-art VEPs on ClinVar mutations in disordered regions according to five different metrics of disorder, namely AIUPred [35], AlphaFold2 pLDDT [16], metapredict [36, 37], AlphaFold2-RSA [38] and flDPnn [39].

Our work confirmed that pathogenic mutations in disordered regions are prevalent [5, 6]: we found that more than 10% of disease mutations are associated with disordered residues, except for flDPnn (Fig. 3). Interestingly, we observed that the first N-Methionine site, which is often predicted as disordered, is the residue with the most annotated variants, and that more than 90% of mutations at this site are pathogenic (Fig. 5a and Supplementary Fig. S3). We found that the interpretation of variant effects at this site is poor across all VEPs (Fig. 5c) but can be improved by considering the distance to the second Methionine in the protein sequence (Fig. 5b), as reported [56]. While it remains unclear why mutations at the N-Methionine site are so prevalent, we concur with previous studies that the assessment of pathogenicity at this site should be handled independently from mutations at any other site [57].
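The distance to the second Methionine is straightforward to derive from a protein sequence. The snippet below is a minimal sketch; the function name `distance_to_next_met` and the toy sequences are assumptions, and how the distance is used as a feature is not specified here.

```python
from typing import Optional


def distance_to_next_met(sequence: str) -> Optional[int]:
    """Residues separating the initiator Met (position 1) from the next Met.

    Returns None when the sequence contains no further methionine.
    """
    if not sequence or sequence[0] != "M":
        raise ValueError("sequence must start with the initiator methionine")
    next_met_index = sequence.find("M", 1)  # 0-based index of the second Met
    return None if next_met_index == -1 else next_met_index


# Toy sequences for illustration only (not real proteins).
print(distance_to_next_met("MAAMK"))
print(distance_to_next_met("MKKKK"))
```

For the toy sequence `MAAMK`, the second Methionine is at position 4, three residues downstream of the start.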

After excluding mutations at the N-Methionine site, we assessed the performance of VEPs for variants located in disordered regions. Our analysis revealed that the performance in disordered regions is more unbalanced than in ordered regions, with higher specificity than sensitivity (Fig. 6). The gap between specificity and sensitivity remains large across disorder metrics; when disorder is measured with AIUPred, the sensitivity in disordered and ordered regions does not differ (Fig. 6a). On the other hand, when disorder is measured with AlphaFold2 pLDDT, AlphaFold2-RSA, metapredict and flDPnn, the difference between sensitivity and specificity reaches at least 10% (Fig. 6b). While this observation underscores the lack of a clear and standardized definition of disorder, all five computational disorder tools reveal a more pronounced imbalance of VEPs' performance in disordered regions.
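The imbalance discussed above can be quantified as the difference between specificity and sensitivity within each region class. The following sketch assumes each variant is annotated with a region label, a clinical label and a binary prediction; the toy counts are invented to illustrate the calculation and are not results from this study.

```python
from collections import defaultdict


def sens_spec(records):
    """records: iterable of (true_label, predicted_label) with 'pathogenic'/'benign'."""
    tp = fn = tn = fp = 0
    for truth, pred in records:
        if truth == "pathogenic":
            if pred == "pathogenic":
                tp += 1
            else:
                fn += 1  # pathogenic variant called benign
        else:
            if pred == "benign":
                tn += 1
            else:
                fp += 1  # benign variant called pathogenic
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def gap_by_region(variants):
    """variants: iterable of (region, truth, pred) -> {region: (sens, spec, spec - sens)}."""
    grouped = defaultdict(list)
    for region, truth, pred in variants:
        grouped[region].append((truth, pred))
    result = {}
    for region, records in grouped.items():
        sens, spec = sens_spec(records)
        result[region] = (sens, spec, spec - sens)
    return result


# TOY DATA (invented counts, not the study's results).
toy = (
    [("ordered", "pathogenic", "pathogenic")] * 9
    + [("ordered", "pathogenic", "benign")] * 1
    + [("ordered", "benign", "benign")] * 9
    + [("ordered", "benign", "pathogenic")] * 1
    + [("disordered", "pathogenic", "pathogenic")] * 6
    + [("disordered", "pathogenic", "benign")] * 4
    + [("disordered", "benign", "benign")] * 19
    + [("disordered", "benign", "pathogenic")] * 1
)

for region, (sens, spec, gap) in gap_by_region(toy).items():
    print(f"{region}: sensitivity={sens:.2f} specificity={spec:.2f} gap={gap:.2f}")
```

A positive gap means a predictor misses pathogenic variants more often than it flags benign ones, which is the pattern reported for disordered regions.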

The higher sensitivity in ordered regions and the higher specificity in disordered ones suggest that AlphaMissense, like other state-of-the-art VEPs, relies on the traditional conservation paradigm to interpret variant effects, where mutations in conserved domains tend to be predicted as pathogenic, whereas those in less conserved, less alignable and less structured regions tend to be predicted as benign. This holds true both for supervised models such as REVEL, whose features consist of pathogenicity predictions as well as conservation scores, and for unsupervised deep learning models that learn the probability of protein sequences from the protein’s specific evolutionary history represented by its multiple sequence alignment (MSA). Alignment-free models such as ESM1b, which train on all available protein sequences without an explicit requirement of homology, nonetheless learn the evolutionary constraints on proteins [19, 67] and as such are biased towards high sensitivity in ordered regions and high specificity in disordered ones (Figs. 6 and 7). However, since pathogenic mutations in disordered regions fall outside this framework, such predictors tend to over-predict variants as benign, thereby increasing the false negative rate.

The current observations convey a significant message regarding the implications of VEPs in clinical decision-making. AlphaMissense achieves a high and balanced performance for variants in ordered and conserved regions, representing a substantial enhancement over VEPs that attain higher sensitivity at the expense of a high false positive rate (Figs. 6 and 7). However, given the performance bias observed in disordered regions, it is recommended that secondary structure information and disorder scores be reported. This would allow clinicians and researchers to further investigate the functional impact and disease relevance of variants in disordered regions using other computational tools and experiments.

The performance of VEPs is biased towards high specificity in disordered regions and may improve if features specifically designed for such regions are implemented. The P39L mutation in the disordered N-terminal domain of the heat-shock beta-1 protein, described in the Results, is an example (Fig. 7). The functional and structural importance of the N-terminus for protein oligomerization is extensively documented, and our qualitative exploration suggests that it might be beneficial to consider the structural conformation of the N-terminus in the dimeric and multimeric forms (Fig. 7), which is now possible, but not yet scalable, with AlphaFold3 [66]. In the future, the design of features that capture residue functional importance in the tertiary and quaternary structure of a protein may further improve the accuracy of variant effect interpretation.

Conclusion

Predicting pathogenic variants in disordered regions remains challenging because the mechanisms of pathogenicity in such regions have not yet yielded a paradigm similar to the one for mutations occurring in ordered regions, especially given their reduced sequence conservation due to the lack of an evolutionary constraint to maintain structure. Accordingly, our study highlights the lack of VEP features designed specifically for disordered regions, which manifests in lowered sensitivity and a discrepancy between sensitivity and specificity for most VEPs. Ultimately, given the critical roles of IDRs in binding events such as signaling, regulation and the formation of protein complexes, the development and inclusion of IDR-specific features that describe the 3D structure of the protein as well as its interactions in protein complexes are needed to improve variant effect assessment in IDRs.

Data availability

All code and data are available on GitLab at https://git.mpi-cbg.de/tothpetroczylab/idrs_veps.

References

1. Chow CFW, Ghosh S, Hadarovich A, Toth-Petroczy A. SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences. Proc Natl Acad Sci U S A. 2024;121:e2401622121.
2. Pentony MM, Jones DT. Modularity of intrinsic disorder in the human proteome: disorder in the human proteome. Proteins. 2010;78:212–21.
3. Ruff KM, Pappu RV. AlphaFold and implications for intrinsically disordered proteins. J Mol Biol. 2021;433:167208.
4. Vacic V, Iakoucheva LM. Disease mutations in disordered regions - Exception to the rule? Mol Biosyst. 2012;8:27–32.
5. Tesei G, Trolle AI, Jonsson N, Betz J, Knudsen FE, Pesce F, et al. Conformational ensembles of the human intrinsically disordered proteome. Nature. 2024. https://doi.org/10.1038/s41586-023-07004-5.
6. Vacic V, Markwick PRL, Oldfield CJ, Zhao X, Haynes C, Uversky VN, et al. Disease-associated mutations disrupt functionally important regions of intrinsic protein disorder. PLoS Comput Biol. 2012;8:e1002709.
7. Green RC, Berg JS, Grody WW, Kalia SS, Korf BR, Martin CL, et al. ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med. 2013;15:565–74.
8. Nykamp K, Anderson M, Powers M, Garcia J, Herrera B, Ho Y-Y, et al. Sherloc: a comprehensive refinement of the ACMG-AMP variant classification criteria. Genet Med. 2017;19:1105–17.
9. Regulation - 2017/746 - EN - Medical Device Regulation - EUR-Lex. https://eur-lex.europa.eu/eli/reg/2017/746/oj. Accessed 5 Nov 2024.
10. Livesey BJ, Marsh JA. Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations. Mol Syst Biol. 2020;16:e9380.
11. Garcia FA, de O ES, Palmero EI. Insights on variant analysis in silico tools for pathogenicity prediction. Front Genet. 2022;13:1010327.
12. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–9.
13. Wu Y, Li R, Sun S, Weile J, Roth FP. Improved pathogenicity prediction for rare human missense variants. Am J Hum Genet. 2021;108:1891–906.
14. Capriotti E, Altman RB. Improving the prediction of disease-related variants using protein three-dimensional structure. BMC Bioinformatics. 2011;12 SUPPL. 4:S3.
15. Sunyaev S, Ramensky V, Koch I, Lathe W 3rd, Kondrashov AS, Bork P. Prediction of deleterious human alleles. Hum Mol Genet. 2001;10:591–7.
16. Schmidt A, Röner S, Mai K, Klinkhammer H, Kircher M, Ludwig KU. Predicting the pathogenicity of missense variants using features derived from AlphaFold2. Bioinformatics. 2023;39.
17. Luppino F, Adzhubei IA, Cassa CA, Toth-Petroczy A. DeMAG predicts the effects of variants in clinically actionable genes by integrating structural and evolutionary epistatic features. Nat Commun. 2023;14:2230.
18. Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381:eadg7492.
19. Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet. 2023;55:1512–22.
20. Qi H, Zhang H, Zhao Y, Chen C, Long JJ, Chung WK, et al. MVP predicts the pathogenicity of missense variants by deep learning. Nat Commun. 2021;12:510.
21. Frazer J, Notin P, Dias M, Gomez A, Min JK, Brock K, et al. Disease variant prediction with deep generative models of evolutionary data. Nature. 2021;599:91–5.
22. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021. https://doi.org/10.1038/s41586-021-03819-2
23. Pejaver V, Byrne AB, Feng BJ, Pagel KA, Mooney SD, Karchin R, et al. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria. Am J Hum Genet. 2022;109(12):2163–77.
24. Livesey BJ, Marsh JA. Advancing variant effect prediction using protein language models. Nat Genet. 2023;55:1426–7.
25. Cubuk C, Garrett A, Choi S, King L, Loveday C, Torr B, et al. Clinical likelihood ratios and balanced accuracy for 44 in silico tools against multiple large-scale functional assays of cancer susceptibility genes. Genet Med. 2021;23:2096–104.
26. Niroula A, Vihinen M. How good are pathogenicity predictors in detecting benign variants? PLoS Comput Biol. 2019;15:e1006481.
27. Mort M, Evani US, Krishnan VG, Kamati KK, Baenziger PH, Bagchi A, et al. In silico functional profiling of human disease-associated and polymorphic amino acid substitutions. Hum Mutat. 2010;31:335–46.
28. Zhou J-B, Xiong Y, An K, Ye Z-Q, Wu Y-D. IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions. Bioinformatics. 2020;36:4977–83.
29. Tordai H, Torres O, Csepi M, Padányi R, Lukács GL, Hegedűs T. Analysis of AlphaMissense data in different protein groups and structural context. Sci Data. 2024;11:495.
30. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980–5. https://doi.org/10.1093/nar/gkt1113
31. Liu X, Jian X, Boerwinkle E. DbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat. 2011;32:894–9.
32. Liu X, Li C, Mou C, Dong Y, Tu Y. DbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Med. 2020;12:103.
33. Ioannidis NM, Rothstein JH, Pejaver V, Middha S, Mcdonnell SK, Baheti S, et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet. 2016;99:877–85.
34. Carter H, Douville C, Stenson PD, Cooper DN, Karchin R. Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genomics. 2013;14.
35. Erdős G, Dosztányi Z. AIUPred: combining energy estimation with deep learning for the enhanced prediction of protein disorder. Nucleic Acids Res. 2024;52:W176–81.
36. Emenecker RJ, Griffith D, Holehouse AS. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys J. 2021;120:4312–9.
37. Emenecker RJ, Griffith D, Holehouse AS. Metapredict V2: an update to metapredict, a fast, accurate, and easy-to-use predictor of consensus disorder and structure. BioRxiv. 2022. https://doi.org/10.1101/2022.06.06.494887
38. Piovesan D, Monzon AM, Tosatto SCE. Intrinsic protein disorder and conditional folding in AlphaFoldDB. Protein Sci. 2022;31:e4466.
39. Hu G, Katuwawala A, Wang K, Wu Z, Ghadermarzi S, Gao J, et al. FlDPnn: accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat Commun. 2021;12:4438.
40. Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, et al. The evcouplings python framework for coevolutionary sequence analysis. Bioinformatics. 2019;35:1582–4.
41. Chow CFW, Lenz S, Scheremetjew M, Ghosh S, Richter D, Jegers C, et al. SHARK-capture identifies functional motifs in intrinsically disordered protein regions. Protein Sci. 2025;34(4):e70091. https://doi.org/10.1002/pro.70091. PMID: 40100159; PMCID: PMC11917139
42. Conte AD, Mehdiabadi M, Bouhraoua A, Miguel Monzon A, Tosatto SCE, Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - Results of round 2. Proteins. 2023;91:1925–34.
43. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–637.
44. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292:195–202.
45. Dosztányi Z. Prediction of protein disorder based on IUPred. Protein Sci. 2018;27:331–40.
46. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
47. Lobanov MY, Furletova EI, Bogatyreva NS, Roytberg MA, Galzitskaya OV. Library of disordered patterns in 3D protein structures. PLoS Comput Biol. 2010;6:e1000958.
48. Panda A, Tuller T. Exploring potential signals of selection for disordered residues in prokaryotic and eukaryotic proteins. Genomics Proteom Bioinf. 2020;18:549–64.
49. Alderson TR, Pritišanac I, Kolarić Đ, Moses AM, Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc Natl Acad Sci U S A. 2023;120:e2304302120.
50. Blum M, Andreeva A, Florentino LC, Chuguransky SR, Grego T, Hobbs E, et al. InterPro: the protein sequence classification resource in 2025. Nucleic Acids Res. 2025;53(D1):D444–56.
51. Khan T, Douglas GM, Patel P, Nguyen Ba AN, Moses AM. Polymorphism analysis reveals reduced negative selection and elevated rate of insertions and deletions in intrinsically disordered protein regions. Genome Biol Evol. 2015;7:1815–26.
52. Brown CJ, Johnson AK, Daughdrill GW. Comparing models of evolution for ordered and disordered proteins. Mol Biol Evol. 2010;27:609–21.
53. Tóth-Petróczy A, Tawfik DS. Slow protein evolutionary rates are dictated by surface-core association. Proc Natl Acad Sci U S A. 2011;108:11151–6.
54. Tóth-Petróczy Á, Tawfik DS. Protein insertions and deletions enabled by neutral roaming in sequence space. Mol Biol Evol. 2013;30:761–71.
55. Benitez-Cantos MS, Yordanova MM, O’Connor PBF, Zhdanov AV, Kovalchuk SI, Papkovsky DB, et al. Translation initiation downstream from annotated start codons in human mRNAs coevolves with the Kozak context. Genome Res. 2020;30:974–84.
56. Abad-Navarro F, de la Morena-Barrio ME, Fernández-Breis JT, Corral J. Lost in translation: bioinformatic analysis of variations affecting the translation initiation codon in the human genome. Bioinformatics. 2018;34:3788–94.
57. Castell-Diaz J, Abad-Navarro F, de la Morena-Barrio ME, Corral J, Fernandez-Breis JT. Using machine learning for predicting the effect of mutations in the initiation codon. IEEE J Biomed Health Inf. 2022;26:5750–6.
58. Houlden H, Laura M, Wavrant-De Vrièze F, Blake J, Wood N, Reilly MM. Mutations in the HSP27 (HSPB1) gene cause dominant, recessive, and sporadic distal HMN/CMT type 2. Neurology. 2008;71:1660–8.
59. Clouser AF, Baughman HE, Basanta B, Guttman M, Nath A, Klevit RE. Interplay of disordered and ordered regions of a human small heat shock protein yields an ensemble of quasi-ordered states. Elife. 2019;8.
60. Muranova LK, Weeks SD, Strelkov SV, Gusev NB. Characterization of mutants of human small heat shock protein HspB1 carrying replacements in the N-Terminal domain and associated with hereditary motor neuron diseases. PLoS ONE. 2015;10:e0126248.
61. Bagnéris C, Bateman OA, Naylor CE, Cronin N, Boelens WC, Keep NH, et al. Crystal structures of alpha-crystallin domain dimers of alphaB-crystallin and Hsp20. J Mol Biol. 2009;392:1242–52.
62. Jehle S, van Rossum B, Stout JR, Noguchi SM, Falber K, Rehbein K, et al. alphaB-crystallin: a hybrid solid-state/solution-state NMR investigation reveals structural aspects of the heterogeneous oligomer. J Mol Biol. 2009;385:1481–97.
63. Peschek J, Braun N, Franzmann TM, Georgalis Y, Haslbeck M, Weinkauf S, et al. The eye lens chaperone alpha-crystallin forms defined globular assemblies. Proc Natl Acad Sci U S A. 2009;106:13272–7.
64. Jehle S, Vollmar BS, Bardiaux B, Dove KK, Rajagopal P, Gonen T, et al. N-terminal domain of alphaB-crystallin provides a conformational switch for multimerization and structural heterogeneity. Proc Natl Acad Sci U S A. 2011;108:6409–14.
65. Heinig M, Frishman D. STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 2004;32(Web Server issue):W500–2.
66. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500.
67. Lamb KD, Hughes J, Lytras S, Koci O, Young F, Grove J, et al. From a single sequence to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2 protein sequences. BioRxiv. 2024. 2024.07.05.602129.

Acknowledgements

We would like to thank the Computer Services and Scientific Computing Facilities of the MPI-CBG for their support, especially Oscar Gonzales for supporting our HPC. The authors thank Zi Zhu for discussions on disordered regions and for proofreading the manuscript, and Maxim Scheremetjew for support with the installation of disorder tools and the Docker setup.

Funding

Open Access funding enabled and organized by Projekt DEAL.

This project was funded by the Max Planck Gesellschaft and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC-2068–390729961 – Cluster of Excellence Physics of Life of TU Dresden.

Author information

Authors and Affiliations

Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
Federica Luppino, Swantje Lenz, Chi Fung Willis Chow & Agnes Toth-Petroczy
Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
Federica Luppino, Swantje Lenz, Chi Fung Willis Chow & Agnes Toth-Petroczy
Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
Chi Fung Willis Chow & Agnes Toth-Petroczy

Authors

Federica Luppino
Swantje Lenz
Chi Fung Willis Chow
Agnes Toth-Petroczy

Contributions

A.T.-P. and F.L.: conception and design of the work; F.L.: data analysis; S.L.: data analysis and preparation of Figure 7e; C.F.W.C.: creation of the new algorithm used in the work; all authors: interpretation of data; F.L.: draft of the manuscript; all authors: manuscript writing and editing.

Corresponding author

Correspondence to Agnes Toth-Petroczy.

Ethics declarations

Ethics approval and consent to participate

N/A.

Consent for publication

N/A.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Luppino, F., Lenz, S., Chow, C.F.W. et al. Deep learning tools predict variants in disordered regions with lower sensitivity. BMC Genomics 26, 367 (2025). https://doi.org/10.1186/s12864-025-11534-9

Received: 14 January 2025
Accepted: 27 March 2025
Published: 12 April 2025
DOI: https://doi.org/10.1186/s12864-025-11534-9
