Distance-based outlier detection for high dimension, low sample size data
Authors: Jeongyoun Ahn, Myung Hee Lee, Jung Ae Lee
Affiliation: 1. Department of Statistics, University of Georgia, Athens, GA, USA; 2. Center for Global Health, Department of Medicine, Weill Cornell Medical College, New York City, NY, USA; 3. Agricultural Statistics Laboratory, University of Arkansas, Fayetteville, AR, USA
Abstract: Despite the popularity of high dimension, low sample size data analysis, little attention has been paid to the sample integrity issue, in particular the possibility of outliers in the data. A new outlier detection procedure for data with much larger dimensionality than the sample size is presented. The proposed method is motivated by asymptotic properties of high-dimensional distance measures. Empirical studies suggest that high-dimensional outlier detection is more likely to suffer from a swamping effect than from a masking effect, and thus yields more false positives than false negatives. We compare the proposed approaches with existing methods using simulated data from various population settings. A real data example is presented, with consideration of the implications of the outliers found.
Keywords: Centroid distance; HDLSS; high-dimensional asymptotics; maximal data piling distance; multiple outliers
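
The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one ingredient named in the keywords, the centroid distance. It assumes that term means the Euclidean distance from one observation to the mean of the remaining observations (a leave-one-out centroid). The function name `loo_centroid_distances` is hypothetical, and the sketch omits any threshold or reference distribution the authors may use to decide which distances are unusually large.

```python
import numpy as np


def loo_centroid_distances(X):
    """Distance from each row of X to the mean of all the other rows.

    X is an (n, d) array with n observations and d variables; in the
    HDLSS setting d is much larger than n. This is an illustrative
    assumption about what "centroid distance" means, not the authors'
    published procedure.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two observations")
    total = X.sum(axis=0)
    # Mean of the other n - 1 rows, computed for every row at once.
    loo_means = (total - X) / (n - 1)
    return np.linalg.norm(X - loo_means, axis=1)


# Small simulated example: 20 observations in 500 dimensions,
# with observation 0 shifted away from the rest.
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 500))
data[0] += 3.0
distances = loo_centroid_distances(data)
suspect = int(np.argmax(distances))
```

Here `suspect` is the index of the observation farthest from the centroid of the others. Ranking by distance alone does not say whether any point is an outlier; that decision requires a calibrated cutoff, which this sketch leaves out.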