Variance estimation based on blocked 3×2 cross-validation in high-dimensional linear regression |
| |
Authors: | Xingli Yang Yu Wang Wennan Yan Jihong Li |
| |
Affiliation: | (a) School of Mathematical Sciences, Shanxi University, Taiyuan, People's Republic of China; (b) School of Modern Educational Technology, Shanxi University, Taiyuan, People's Republic of China |
| |
Abstract: | In high-dimensional linear regression, the number of variables exceeds the sample size. In this setting, the traditional variance estimator based on ordinary least squares often exhibits a large bias, even under a sparsity assumption. A major reason is the high spurious correlation between the unobserved realized noise and some of the predictors. To alleviate this problem, a refitted cross-validation (RCV) method has been proposed in the literature. However, for a complicated model, the RCV method has a lower probability that the selected model includes the true model in finite samples, which can easily result in a large bias in the variance estimate. This study therefore proposes a model selection method based on the ranks of the frequency of occurrence in six votes from a blocked 3×2 cross-validation. In practice, the proposed method has a considerably larger probability of including the true model than the RCV method. The variance estimator obtained from the model selected by the proposed method also shows a lower bias and a smaller variance. Furthermore, theoretical analysis establishes the asymptotic normality of the proposed variance estimator. |
| |
Keywords: | High-dimensional linear regression, blocked 3×2 cross-validation, variance estimation, asymptotic normality property |
|
|