Table 1 Performance evaluation of various combinations of feature selection and association methods for all five datasets

From: Improving genetic variant identification for quantitative traits using ensemble learning-based approaches

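Columns are grouped by association method: Linear Regression (LR), Random Forest (RF), Support Vector Regression (SVR), and XGBoost. Rows are grouped by dataset and list the feature selection method. Each method reports \(R^{2}\) and MAE.

| Dataset | Feature selection | LR \(R^{2}\) | LR MAE | RF \(R^{2}\) | RF MAE | SVR \(R^{2}\) | SVR MAE | XGBoost \(R^{2}\) | XGBoost MAE |
|---|---|---|---|---|---|---|---|---|---|
| Simulated dataset | LASSO | 0.78 | 12.46 | 0.39 | 20.64 | 0.79 | 12.16 | 0.34 | 21.47 |
| Simulated dataset | Ridge | 0.84 | 10.57 | 0.10 | 24.13 | 0.77 | 12.86 | 0.04 | 25.53 |
| Simulated dataset | Elastic net | 0.90 | 7.36 | 0.43 | 20.18 | 0.91 | 10.15 | 0.35 | 21.23 |
| Simulated dataset | Mutual Information | −0.53 | 32.10 | 0.02 | 25.11 | −0.51 | 31.80 | −0.05 | 26.31 |
| PennCATH dataset | LASSO | 0.63 | 16.056 | 0.02 | 26.329 | 0.65 | 15.416 | −0.04 | 26.94 |
| PennCATH dataset | Ridge | 0.79 | 11.576 | 0.04 | 25.972 | 0.79 | 11.597 | −0.37 | 28.73 |
| PennCATH dataset | Elastic net | 0.86 | 9.961 | 0.04 | 26.018 | 0.89 | 8.666 | −0.14 | 27.82 |
| PennCATH dataset | Mutual Information | −0.36 | 31.257 | 0.01 | 26.451 | −0.45 | 32.073 | −0.20 | 28.32 |
| Imputed PennCATH-dataset with IMPUTE5 | LASSO | 0.68 | 14.69 | 0.04 | 25.91 | 0.70 | 14.31 | −0.03 | 26.65 |
| Imputed PennCATH-dataset with IMPUTE5 | Ridge | 0.85 | 10.27 | 0.06 | 25.31 | 0.84 | 10.64 | −0.04 | 27.09 |
| Imputed PennCATH-dataset with IMPUTE5 | Elastic net | 0.92 | 7.29 | 0.10 | 26.02 | 0.94 | 6.00 | 0.04 | 26.06 |
| Imputed PennCATH-dataset with IMPUTE5 | Mutual Information | −0.35 | 30.88 | −0.01 | 26.74 | −0.45 | 31.41 | −0.08 | 27.32 |
| Imputed PennCATH-dataset with Beagle5.4 | LASSO | 0.71 | 14.07 | 0.05 | 25.98 | 0.73 | 13.68 | −0.06 | 26.44 |
| Imputed PennCATH-dataset with Beagle5.4 | Ridge | 0.77 | 9.82 | 0.07 | 25.81 | 0.84 | 10.33 | −5.58 | 27.26 |
| Imputed PennCATH-dataset with Beagle5.4 | Elastic net | 0.93 | 6.86 | 0.07 | 25.80 | 0.94 | 5.88 | 0.09 | 24.98 |
| Imputed PennCATH-dataset with Beagle5.4 | Mutual Information | −0.43 | 32.26 | −0.01 | 27.03 | −0.50 | 32.92 | −0.10 | 28.17 |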

  1. Bold values denote the best performance value
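
Several \(R^{2}\) entries in the table are negative, which can look surprising next to an MAE of ordinary size. The sketch below shows how the two metrics are commonly computed and why \(R^{2}\) drops below zero when predictions are worse than simply predicting the mean of the observed values. It assumes the standard definitions (\(R^{2} = 1 - SS_{res}/SS_{tot}\), MAE as the mean absolute residual); this excerpt does not show the authors' own evaluation code, and the trait values used here are hypothetical, not taken from any dataset in the table.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination, assuming the usual 1 - SS_res / SS_tot form."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical trait values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
close_fit = [12.0, 18.0, 33.0, 39.0]   # predictions near the observations
poor_fit = [40.0, 10.0, 45.0, 5.0]     # predictions worse than the mean
mean_only = [25.0, 25.0, 25.0, 25.0]   # baseline: always predict the mean

for name, pred in [("close", close_fit), ("poor", poor_fit), ("mean", mean_only)]:
    print(name, round(r2_score(observed, pred), 3), mean_absolute_error(observed, pred))
```

Under these definitions the mean-only baseline scores an \(R^{2}\) of exactly 0, so any negative entry indicates a model that fits the held-out values worse than that baseline, while MAE stays on the scale of the trait itself.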