Selection of Binary Variables and Classification by Boosting |
| |
Authors: | Junyong Park, Jayson D. Wilbur, Jayanta K. Ghosh, Cindy H. Nakatsu, Corinne Ackerman |
| |
Affiliation: | 1. Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, Maryland, USA, junpark@math.umbc.edu; 3. Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, USA; 4. Department of Statistics, Purdue University, West Lafayette, Indiana, USA; 5. Department of Agronomy, Purdue University, West Lafayette, Indiana, USA |
| |
Abstract: | We adopt boosting for classification and selection of high-dimensional binary variables, for which classical methods based on normality and a nonsingular sample dispersion matrix are inapplicable. Boosting seems particularly well suited for binary variables. We present three methods, two of which combine boosting with the relatively classical variable selection methods developed in Wilbur et al. (2002) [Wilbur, J. D., Ghosh, J. K., Nakatsu, C. H., Brouder, S. M., Doerge, R. W. (2002). Variable selection in high-dimensional multivariate binary data with application to the analysis of microbial community DNA fingerprints. Biometrics 58: 378–386]. Our primary interest is variable selection in classification, with a small misclassification error used as validation of the proposed variable selection method. Two of the new methods perform uniformly better than the methods of Wilbur et al. (2002) in one set of simulated examples and three real-life examples. |
| |
Keywords: | Boosting; Cross validation; DNA fingerprints; High-dimensional data; Multivariate binary data; Thresholding; Variable selection |
|
|