Multiple Hypothesis Testing for Variable Selection |
| |
Authors: | Florian Rohart |
| |
Institution: | 1. UMR 5219, Institut de Mathématiques de Toulouse, INSA de Toulouse, Toulouse cedex 4, France; 2. UMR 444, Laboratoire de Génétique Cellulaire, INRA Toulouse, Castanet Tolosan cedex, France; 3. The University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, Australia |
| |
Abstract: | We propose two new procedures based on multiple hypothesis testing for correct support estimation in high‐dimensional sparse linear models. We conclusively prove that both procedures are powerful and do not require the sample size to be large. The first procedure tackles the atypical setting of ordered variable selection through an extension of a testing procedure previously developed in the context of a linear hypothesis. The second procedure is the main contribution of this paper. It enables data analysts to perform support estimation in the general high‐dimensional framework of non‐ordered variable selection. A thorough simulation study and applications to real datasets using the R package mht show that our non‐ordered variable selection procedure produces excellent results in terms of correct support estimation, mean squared error and false discovery rate when compared to common methods such as the Lasso, the SCAD penalty, forward regression and the false discovery rate (FDR) procedure. |
| |
Keywords: | high‐dimension; linear model; support estimation |
|
|