Stability of feature selection in classification issues for high-dimensional correlated data
Authors:Émeline Perthame  Chloé Friguet  David Causeur
Institution: 1. Institut de Recherche Mathématique de Rennes (IRMAR), UMR 6625 du Centre National de la Recherche Scientifique (CNRS), Agrocampus Ouest, Rennes, France; 2. Laboratoire de Mathématiques de Bretagne Atlantique (LMBA), UMR 6205 du Centre National de la Recherche Scientifique (CNRS), University of South Brittany, Vannes, France
Abstract: Whether or not to account for dependence in feature selection remains an open question in supervised classification problems where the number of covariates exceeds the number of observations. Some recent papers surprisingly show the superiority of naive Bayes approaches based on an obviously erroneous assumption of independence, whereas others recommend inferring the dependence structure in order to decorrelate the selection statistics. Within the classical linear discriminant analysis (LDA) framework, the present paper first highlights the impact of dependence on the instability of feature selection. A second objective is to revisit this issue using flexible factor modeling of the covariance. This framework introduces latent components of dependence, conditionally on which a new Bayes consistency is defined. A procedure is then proposed for the joint estimation of the expectation and variance parameters of the model. The present method is compared to recent regularized diagonal discriminant analysis approaches, which assume independence among features, and to regularized LDA procedures, both in terms of classification performance and stability of feature selection. The proposed method is implemented in the R package FADA, freely available from the R repository CRAN.
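To illustrate the general idea of decorrelating selection statistics through latent factors, the following minimal R sketch (not the FADA package interface; all variable names are hypothetical) simulates two-class data whose features are correlated through a few latent factors, then compares per-feature t-statistics on the raw data with t-statistics computed after removing an estimated low-rank factor component from the class-centered residuals.

set.seed(1)
n <- 60; p <- 200; q <- 2                       # q latent factors induce dependence
B <- matrix(rnorm(p * q, sd = 0.8), p, q)       # true factor loadings
Z <- matrix(rnorm(n * q), n, q)                 # latent factor scores
mu <- c(rep(1, 10), rep(0, p - 10))             # only the first 10 features are informative
y  <- rep(0:1, each = n / 2)                    # two balanced classes
X  <- outer(y, mu) + Z %*% t(B) + matrix(rnorm(n * p, sd = 0.5), n, p)

# Naive per-feature two-sample t-statistics, ignoring the dependence
t_raw <- apply(X, 2, function(x) t.test(x[y == 1], x[y == 0])$statistic)

# Estimate the latent factor component from class-centered residuals (rank-q SVD),
# subtract it, and recompute the selection statistics on the decorrelated data
Xc <- X
Xc[y == 0, ] <- scale(X[y == 0, ], scale = FALSE)
Xc[y == 1, ] <- scale(X[y == 1, ], scale = FALSE)
s    <- svd(Xc, nu = q, nv = q)
Xadj <- X - s$u %*% diag(s$d[1:q], q, q) %*% t(s$v)
t_adj <- apply(Xadj, 2, function(x) t.test(x[y == 1], x[y == 0])$statistic)

# Features ranked by the adjusted statistics (ideally the first 10 come out on top)
head(order(abs(t_adj), decreasing = TRUE), 10)

In practice the number of latent factors would be chosen data-adaptively and the factor model estimated jointly with the class means, as described in the abstract and implemented in the FADA package referenced above.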
This article is indexed in SpringerLink and other databases.