Multiple outliers detection in sparse high-dimensional regression
Authors:Tao Wang  Qun Li  Bin Chen
Institution:1. Institute of Statistics and LPMC, Nankai University, Tianjin, People's Republic of China;2. School of Mathematical Sciences, Huaiyin Normal University, Huaian, People's Republic of China;3. School of Mathematical Sciences, Nankai University, Tianjin, People's Republic of China;4. School of Mathematics and Statistics, Jiangsu Normal University, Xuzhou, People's Republic of China
Abstract:The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially when multiple outliers occur in high-dimensional regression, where the high dimensionality of the data might amplify the chance of one or more observations being outlying. Because outlier detection is both necessary and important in high-dimensional regression analysis, this paper proposes a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach based on multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset from the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well in detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the application of the proposed method through an empirical analysis of a real-life protein and gene expression data set.
Keywords:High-dimensional linear regression  least trimmed square  multiple hypothesis testing  multiple outliers detection
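
The three stages named in the abstract (screening, clean-subset search, multiple testing) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the detection measure or the refinement rule, so everything below is an assumption made for illustration. Screening ranks predictors by absolute marginal correlation, the least trimmed squares fit is approximated by random starts with concentration steps, and the detection measure is replaced by MAD-scaled residuals with normal p-values and a Benjamini-Hochberg step. All function names, the simulated data, and the tuning values (d, h, alpha) are hypothetical, and the refinement stage is omitted.

```python
import math
import numpy as np


def sis_screen(X, y, d):
    """Screening step: keep the d predictors with the largest absolute
    marginal correlation with the response."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    corr = np.abs(Xs.T @ ys) / len(y)
    return np.sort(np.argsort(corr)[-d:])


def lts_fit(X, y, h, n_starts=20, n_steps=30, seed=0):
    """Approximate least trimmed squares via random starts and concentration
    steps. Returns (coefficients with intercept first, indices of the h
    best-fitting observations)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    best_obj, best_beta, best_idx = np.inf, None, None
    for start in range(n_starts):
        # small random starting subset, just large enough to fit the model
        idx = rng.choice(n, size=A.shape[1] + 1, replace=False)
        for step in range(n_steps):
            beta = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
            # concentration step: keep the h smallest squared residuals
            new_idx = np.sort(np.argsort((y - A @ beta) ** 2)[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        beta = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
        obj = np.sum((y[idx] - A[idx] @ beta) ** 2)
        if obj < best_obj:
            best_obj, best_beta, best_idx = obj, beta, idx
    return best_beta, best_idx


def flag_outliers(X, y, beta, alpha=0.05):
    """Stand-in detection rule (an assumption, not the paper's measure):
    MAD-scaled residuals, two-sided normal p-values, Benjamini-Hochberg."""
    n = len(y)
    resid = y - np.column_stack([np.ones(n), X]) @ beta
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    z = np.abs(resid) / scale
    pvals = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= alpha * np.arange(1, n + 1) / n)[0]
    k = passed[-1] + 1 if passed.size else 0
    return np.sort(order[:k])


# Simulated example (hypothetical data): 3 active predictors out of 300,
# with the first 10 responses shifted to act as planted outliers.
rng = np.random.default_rng(0)
n, p, d = 200, 300, 10
X = rng.standard_normal((n, p))
y = 3.0 * X[:, :3].sum(axis=1) + rng.standard_normal(n)
planted = np.arange(10)
y[planted] += 20.0

keep = sis_screen(X, y, d)                          # screened predictors
beta, clean = lts_fit(X[:, keep], y, h=int(0.75 * n))  # clean subset
flagged = flag_outliers(X[:, keep], y, beta)        # detected outliers
print("flagged observations:", flagged)
```

Fitting on the screened predictors only keeps the trimmed regression well-posed when the number of predictors exceeds the sample size, which is the role the abstract assigns to the screening step.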