Improving genetic variant identification for quantitative traits using ensemble learning-based approaches

BMC Genomics

Table 1 Performance evaluation (coefficient of determination \(R^{2}\) and mean absolute error, MAE) of various combinations of feature selection and association methods for all five datasets

Association methods \(\rightarrow\)	Linear Regression (LR)		Random Forest (RF)		Support Vector Regression (SVR)		XGBoost
Feature selection \(\downarrow\)	\(R^{2}\)	MAE	\(R^{2}\)	MAE	\(R^{2}\)	MAE	\(R^{2}\)	MAE
Simulated dataset
LASSO	0.78	12.46	0.39	20.64	0.79	12.16	0.34	21.47
Ridge	0.84	10.57	0.10	24.13	0.77	12.86	0.04	25.53
Elastic net	0.90	7.36	0.43	20.18	0.91	10.15	0.35	21.23
Mutual Information	−0.53	32.10	0.02	25.11	−0.51	31.80	−0.05	26.31
PennCATH dataset
LASSO	0.63	16.056	0.02	26.329	0.65	15.416	−0.04	26.94
Ridge	0.79	11.576	0.04	25.972	0.79	11.597	−0.37	28.73
Elastic net	0.86	9.961	0.04	26.018	0.89	8.666	−0.14	27.82
Mutual Information	−0.36	31.257	0.01	26.451	−0.45	32.073	−0.20	28.32
PennCATH dataset imputed with IMPUTE5
LASSO	0.68	14.69	0.04	25.91	0.70	14.31	−0.03	26.65
Ridge	0.85	10.27	0.06	25.31	0.84	10.64	−0.04	27.09
Elastic net	0.92	7.29	0.10	26.02	0.94	6.00	0.04	26.06
Mutual Information	−0.35	30.88	−0.01	26.74	−0.45	31.41	−0.08	27.32
PennCATH dataset imputed with Beagle5.4
LASSO	0.71	14.07	0.05	25.98	0.73	13.68	−0.06	26.44
Ridge	0.77	9.82	0.07	25.81	0.84	10.33	−5.58	27.26
Elastic net	0.93	6.86	0.07	25.80	0.94	5.88	0.09	24.98
Mutual Information	−0.43	32.26	−0.01	27.03	−0.50	32.92	−0.10	28.17
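
Several entries in Table 1 report a negative \(R^{2}\). Under the standard definition \(R^{2} = 1 - SS_{res}/SS_{tot}\), this happens whenever a model's squared error exceeds that of always predicting the mean of the observed values. The sketch below illustrates both metrics on toy numbers. It assumes the standard textbook definitions, since the table does not state which implementation was used. The function names and all data values are hypothetical and are not taken from the study.

```python
# Minimal sketch of the two metrics reported in Table 1, using the
# standard definitions (an assumption; the source does not specify
# the implementation). All numbers below are hypothetical toy data.

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observed trait values and two sets of predictions.
y_true = [10.0, 20.0, 30.0, 40.0]
good = [12.0, 18.0, 33.0, 39.0]   # close to the observations
bad = [40.0, 30.0, 20.0, 10.0]    # worse than predicting the mean

print(r2_score(y_true, good), mean_absolute_error(y_true, good))
print(r2_score(y_true, bad), mean_absolute_error(y_true, bad))
```

In this toy case the second set of predictions gives a negative \(R^{2}\) together with a large MAE, the same qualitative pattern as the Mutual Information rows for LR and SVR.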

ISSN: 1471-2164