Using a Truncated C_p Statistic for Variable Selection in Multiple Linear Regression |
| |
Authors: | D. W. Uys; S. J. Steel |
| |
Institution: | 1. Department of Statistics and Actuarial Science, Stellenbosch University, Matieland, South Africa, dwu@sun.ac.za; 3. Department of Statistics and Actuarial Science, Stellenbosch University, Matieland, South Africa |
| |
Abstract: | In multiple linear regression analysis, each lower-dimensional subspace L of a known linear subspace M of ℝ^n corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace L, the C_p statistic is an unbiased estimator of the mean squared error incurred when the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the C_p statistic that can also be used to estimate this mean squared error. The C_p statistic and its truncated versions are compared on two example data sets, illustrating that the truncated versions may select models different from those selected by the standard C_p. |
| |
Keywords: | Coordinate-free formulation; C_p statistic; Multiple linear regression; Variable selection |
|
|
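
The standard C_p criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual unscaled Mallows form, C_p(L) = RSS_L - (n - 2 dim L) * s^2, where s^2 is the residual variance estimate from the full model M, and it assumes the regressor matrix has full column rank with n greater than the number of columns. The function names `rss` and `cp_all_subsets` and the simulated data are hypothetical and are not taken from the article. The two truncated versions are not defined in the abstract, so they are not reproduced here.

```python
from itertools import combinations

import numpy as np


def rss(X, y):
    """Residual sum of squares after projecting y onto the column space of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def cp_all_subsets(X, y):
    """Map each nonempty subset of column indices to its unscaled C_p value.

    Assumes X has full column rank, so dim L equals the subset size.
    """
    n, m = X.shape
    sigma2_hat = rss(X, y) / (n - m)  # variance estimate from the full model M
    values = {}
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            values[cols] = rss(X[:, list(cols)], y) - (n - 2 * k) * sigma2_hat
    return values


# Simulated data (illustrative only): intercept plus three candidate regressors,
# of which only the first has a nonzero coefficient.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

cp = cp_all_subsets(X, y)
best = min(cp, key=cp.get)  # subset with the smallest estimated error
print("selected columns:", best, "C_p:", cp[best])
```

For the full model the expression reduces to dim(M) * s^2, which gives a quick sanity check on an implementation. Enumerating all subsets is practical only for a small number of columns.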