A simulation based method for assessing the statistical significance of logistic regression models after common variable selection procedures |
| |
Authors: | Tristan R Grogan, David A Elashoff |
| |
Institution: | Department of Medicine Statistics Core, University of California, Los Angeles, California, USA |
| |
Abstract: | Classification models can demonstrate apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false positive variable selections and overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best subsets, and LASSO variable selection methods, varying the total sample size (20, 50, 100, 200) and the number of random noise predictor variables (3, 5, 10, 15, 20, 50). Using our critical values can help reduce needless follow-up on variables that have no true association with the outcome. |
| |
Keywords: | AUC; Logistic regression; Simulation study; Validation methods; Variable selection |
|
|
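The null-simulation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: as a deliberately simplified stand-in for forward stepwise, best subsets, or LASSO selection, it keeps only the single noise predictor with the highest apparent AUC, and it does not fit a logistic regression at all. The function names, the balanced outcome, the standard normal noise predictors, the number of replicates, and the 0.95 level are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_null_auc(n, p, rng):
    """Apparent AUC of the best of p pure-noise predictors (hypothetical helper)."""
    # Assumption: balanced binary outcome, unrelated to every predictor.
    labels = [1] * (n // 2) + [0] * (n - n // 2)
    best = 0.5
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a = auc(x, labels)
        # Take the direction-free AUC, since a fitted model could use either sign.
        best = max(best, a, 1.0 - a)
    return best


def null_critical_value(n, p, n_sims=500, level=0.95, seed=0):
    """Empirical upper quantile of the selected AUC when no predictor is real."""
    rng = random.Random(seed)
    sims = sorted(best_null_auc(n, p, rng) for _ in range(n_sims))
    return sims[min(n_sims - 1, int(level * n_sims))]


if __name__ == "__main__":
    # Sample sizes and predictor counts taken from the abstract's design.
    for n in (20, 50, 100, 200):
        for p in (3, 5, 10, 15, 20, 50):
            print(n, p, round(null_critical_value(n, p, n_sims=200), 3))
```

Under these assumptions, an observed AUC below the simulated cutoff for the matching sample size and predictor count would be no better than what selection on pure noise tends to produce.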