Selection bias in working with the top genes in supervised classification of tissue samples |
| |
Authors: | X Zhu, C Ambroise, GJ McLachlan |
| |
Abstract: | Currently there is much interest in using microarray gene-expression data to form prediction rules for the diagnosis of patient outcomes. A gene-selection step is usually carried out first to find the genes that are most useful, according to some criterion, for distinguishing between the given classes of tissue samples. However, a bias (selection bias) is introduced into the estimated error rate of the final version of a prediction rule that has been formed from a smaller subset of genes selected according to some optimality criterion. In this paper, we focus on the bias that arises when the full data set is not available in the first instance, and the prediction rule is formed subsequently by working with only the top-ranked genes from the full set. We demonstrate how large the subset of top genes must be before this selection bias is of no practical consequence. |
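The selection bias described in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' procedure. It generates null data in which no gene is related to the class labels, so the true error rate of any rule is 50%. Genes are first ranked on all samples and only the top m are kept, mimicking a data set that arrives already reduced to its top-ranked genes. Leave-one-out cross-validation then repeats the selection of k genes inside every fold, but only from that pool. The ranking score (absolute difference in class means over the pooled standard deviation), the nearest-centroid classifier (swapped in for the support vector machine named in the keywords), the sample sizes, and all function names are hypothetical choices made for illustration.

```python
import random

# Hypothetical illustration only: the t-like score and the nearest-centroid
# classifier are assumptions, not the SVM-based setup used in the paper.


def mean_var(values):
    """Mean and (population) variance of a list of floats."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def score(X, y, j, rows):
    """|difference in class means| / pooled SD for gene j, using only samples in rows."""
    m0, v0 = mean_var([X[i][j] for i in rows if y[i] == 0])
    m1, v1 = mean_var([X[i][j] for i in rows if y[i] == 1])
    return abs(m0 - m1) / (((v0 + v1) / 2) ** 0.5 + 1e-12)


def top_genes(X, y, rows, pool, k):
    """The k genes in pool ranked highest by score on the given rows."""
    return sorted(pool, key=lambda j: score(X, y, j, rows), reverse=True)[:k]


def nearest_centroid(X, y, rows, genes, x):
    """Predict the class whose training-set mean over `genes` is closest to x."""
    dists = {}
    for c in (0, 1):
        members = [i for i in rows if y[i] == c]
        dists[c] = sum((x[j] - sum(X[i][j] for i in members) / len(members)) ** 2
                       for j in genes)
    return min(dists, key=dists.get)


def loo_error(X, y, k, m):
    """Leave-one-out error with the k genes re-selected inside every fold, but only
    from the m genes that were ranked highest on ALL n samples beforehand."""
    n, p = len(X), len(X[0])
    everyone = list(range(n))
    # Pre-filter on the full data, including each sample that will later be held out.
    pool = list(range(p)) if m >= p else top_genes(X, y, everyone, range(p), m)
    errors = 0
    for out in everyone:
        train = [i for i in everyone if i != out]
        genes = top_genes(X, y, train, pool, k)  # selection repeated within the fold
        errors += nearest_centroid(X, y, train, genes, X[out]) != y[out]
    return errors / n


def simulate(n=30, p=400, k=10, pools=(10, 50, 400), reps=4, seed=0):
    """Average LOO error per pool size m on null data (labels unrelated to genes)."""
    rng = random.Random(seed)
    y = [0] * (n // 2) + [1] * (n - n // 2)
    totals = dict.fromkeys(pools, 0.0)
    for _ in range(reps):
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        for m in pools:
            totals[m] += loo_error(X, y, k, m)
    return {m: t / reps for m, t in totals.items()}


if __name__ == "__main__":
    for m, err in simulate().items():
        print(f"top {m:>3} genes handed over: LOO error {err:.2f}")
```

With m equal to k the pre-filter fixes the genes in advance, so the cross-validated estimate typically falls far below the true 50%. With m equal to p every fold selects from the full set, and the estimate should be near (or somewhat above) 50%. Intermediate pool sizes show how the bias shrinks as m grows, which is the question the abstract poses.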
| |
Keywords: | Gene selection; Support vector machine; Error rates; Cross-validation; Selection bias |
This article is indexed in ScienceDirect and other databases. |