A permutation test approach to the choice of size k for the nearest neighbors classifier
Authors: Yinglei Lai, Baolin Wu, Hongyu Zhao
Affiliation: 1. Department of Statistics and Biostatistics Center, The George Washington University, 2140 Pennsylvania Avenue, N.W., Washington, DC, 20052, USA; 2. Division of Biostatistics, School of Public Health, University of Minnesota, A442 Mayo Building, MMC 303, 420 Delaware St SE, Minneapolis, MN, 55455, USA; 3. Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, 06520, USA
Abstract: The k nearest neighbors (k-NN) classifier is one of the most popular methods for statistical pattern recognition and machine learning. In practice, the size k, the number of neighbors used for classification, is usually set arbitrarily to one or some other small number, or chosen by cross-validation. In this study, we propose a novel alternative approach for choosing k. Using a k-NN-based multivariate multi-sample test, we assign each k a permutation-test-based Z-score and set the number of nearest neighbors to the k with the highest Z-score. This approach is computationally efficient because we have derived formulas for the mean and variance of the test statistic under the permutation distribution for multiple sample groups. Several simulated and real-world data sets are analyzed to investigate the performance of our approach. Its usefulness is demonstrated by evaluating the prediction accuracy obtained when the Z-score is used as the criterion for selecting k. We also compare our approach with the widely used cross-validation approaches. The results show that the k selected by our approach yields high prediction accuracy when informative features are used for classification, whereas the cross-validation approach may fail in some cases.
Keywords: nearest neighbors classifier; number of neighbors; permutation test; prediction accuracy; cross-validation
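
The selection rule described in the abstract can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the exact test statistic or the closed-form mean and variance, so this sketch assumes the statistic is the number of (point, neighbor) pairs among the k nearest neighbors that share a class label, and it estimates the permutation mean and variance by Monte Carlo label shuffling rather than by the authors' formulas. The function names (`neighbor_indices`, `same_label_counts`, `permutation_z_scores`, `choose_k`) and the parameters `k_max`, `n_perm`, and `seed` are hypothetical.

```python
import numpy as np


def neighbor_indices(X, k_max):
    """Return, for each row of X, the indices of its k_max nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    return np.argsort(d2, axis=1, kind="stable")[:, :k_max]


def same_label_counts(nn, labels):
    """Assumed statistic T_k for k = 1..k_max: same-label neighbor pairs."""
    # match[i, j] is True when the j-th nearest neighbor of i shares i's label.
    match = labels[nn] == labels[:, None]
    # Summing over points, then cumulating over neighbor rank, gives T_1..T_kmax.
    return match.sum(axis=0).cumsum()


def permutation_z_scores(X, labels, k_max, n_perm=500, seed=0):
    """Z-score of T_k against a Monte Carlo permutation null, for each k."""
    rng = np.random.default_rng(seed)
    nn = neighbor_indices(X, k_max)  # neighbors do not depend on the labels
    observed = same_label_counts(nn, labels)
    null = np.empty((n_perm, k_max))
    for b in range(n_perm):
        null[b] = same_label_counts(nn, rng.permutation(labels))
    return (observed - null.mean(axis=0)) / null.std(axis=0, ddof=1)


def choose_k(X, labels, k_max, n_perm=500, seed=0):
    """Pick the k in 1..k_max with the highest permutation Z-score."""
    z = permutation_z_scores(X, labels, k_max, n_perm=n_perm, seed=seed)
    return int(np.argmax(z)) + 1, z
```

Because the neighbor lists depend only on the features, they are computed once and reused for every permutation and every k. Replacing the Monte Carlo loop with closed-form moments, as the abstract says the authors do, would remove the dependence on `n_perm`.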