Table 2 Quantitative assessment of model performance for the heat stress-related task

From: TransGeneSelector: using a transformer approach to mine key genes from small transcriptomic datasets in plant responses to various environments

| Methods | Accuracy | Precision | Recall | F1 | AUC |
| --- | --- | --- | --- | --- | --- |
| TransGeneSelector | 0.9623 | 0.9643 | 0.9643 | 0.9643 | **0.9871** |
| TransGeneSelector (mix-up) | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9500 |
| TransGeneSelector (MLP) | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9629 |
| Random Forest with default parameters | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9586 |
| Random Forest with 8 genes | 0.9245 | 0.9000 | 0.9643 | 0.9310 | 0.9457 |
| Random Forest with 11 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9471 |
| Random Forest with 41 genes | 0.9245 | 0.9000 | 0.9643 | 0.9310 | 0.9464 |
| Random Forest with 51 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9500 |
| Random Forest with 148 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9557 |
| Random Forest with 449 genes | 0.8679 | 0.8889 | 0.8571 | 0.8727 | 0.9507 |
| NR-LR-MCP | 0.9245 | 0.9286 | 0.9286 | 0.9286 | 0.9443 |
| SVM with default parameters | 0.8302 | 0.8800 | 0.7857 | 0.8302 | 0.9429 |
| SVM with 8 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9471 |
| SVM with 11 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9186 |
| SVM with 41 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9743 |
| SVM with 51 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9271 |
| SVM with 148 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9271 |
| SVM with 449 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.9507 |
| KNN with 8 genes | 0.8679 | 0.8889 | 0.8571 | 0.8727 | 0.8143 |
| KNN with 11 genes | 0.9245 | 0.9000 | 0.9643 | 0.9310 | 0.8800 |
| KNN with 41 genes | 0.8868 | 0.8929 | 0.8929 | 0.8929 | 0.8643 |
| KNN with 51 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.8800 |
| KNN with 148 genes | 0.9434 | 0.9032 | 1.0000 | 0.9492 | 0.8800 |
| KNN with 449 genes | 0.9245 | 0.9000 | 0.9643 | 0.9310 | 0.8643 |

Note: This table reports the performance of the various models on a test set. Variants of TransGeneSelector that substituted the WGAN component with a mix-up component or replaced the Transformer component with an MLP are included, with the best-performing configuration of each variant reported. Random Forest models were trained with feature engineering on gene sets of 8, 11, 41, 51, 148, and 449 genes, chosen because these set sizes achieved the highest cross-validation accuracy. The SVM and KNN models used the genes selected by the Random Forest model. 'NR-LR-MCP' denotes the best-performing Network-Regularized Logistic Regression model with Minimax Concave Penalty. 'AUC' stands for Area Under the Curve, which measures the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance. The highest AUC value is highlighted in bold.
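The note's probabilistic reading of AUC, together with the threshold-based metrics in the table header, translates directly into code. The sketch below is illustrative only and is not the authors' evaluation pipeline: the function names (`evaluate`, `pairwise_auc`), the variable names (`y_true`, `y_score`), the 0.5 decision threshold, and the use of scikit-learn are all assumptions introduced here for clarity.

```python
# Illustrative sketch (not the paper's code): computing the five metrics
# reported in Table 2 for a binary heat-stress classifier, assuming y_true
# holds 0/1 labels and y_score holds predicted probabilities for class 1.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_score, threshold=0.5):
    """Return Accuracy, Precision, Recall, F1, and AUC for one test set."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
        # AUC uses the ranking scores directly, not thresholded predictions.
        "AUC": roc_auc_score(y_true, y_score),
    }

def pairwise_auc(y_true, y_score):
    """AUC computed from its definition in the table note: the probability
    that a random positive instance is ranked above a random negative one
    (ties counted as one half)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

The pairwise formulation and `roc_auc_score` give the same value; the former makes explicit why a model can reach a Recall of 1.0000 at the 0.5 threshold while its AUC, which depends on the full ranking of test samples, still differs between methods.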