Similar literature
1.
A new variable selection approach based on penalized estimating equations is developed for high-dimensional longitudinal data with dropouts under a missing at random (MAR) mechanism. The proposed method is based on the best linear approximation of efficient scores from the full dataset and does not require a separate model for the missingness or imputation process. The coordinate descent algorithm is adopted to implement the proposed method and is computationally feasible and stable. The oracle property is established, and extensive simulation studies show that the proposed variable selection method performs much better than penalized estimating equations applied to complete data without accounting for the MAR mechanism. Finally, the proposed method is applied to a Lifestyle Education for Activity and Nutrition study, and the interaction effect between intervention and time is identified, which is consistent with previous findings.
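
The abstract above names coordinate descent as the fitting algorithm but gives no details. As a rough illustration of the general technique only, the sketch below runs cyclic coordinate descent on an ordinary lasso least-squares problem, which is a simpler stand-in and not the penalized estimating equations of the paper. The function names, the toy data, and the penalty value are all assumptions made for this example.

```python
def soft_threshold(a, lam):
    """Soft-thresholding operator: sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative stand-in: a plain lasso, not the paper's estimating equations.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                # Fit from all coordinates except j.
                fit_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # One-dimensional lasso solution for coordinate j.
            b[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return b


# Hypothetical toy data: the first column carries a strong signal, the second a weak one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
coef = lasso_coordinate_descent(X, y, lam=0.1)
print(coef)
```

In this toy problem the weak second coefficient is set exactly to zero while the first is shrunk but kept, which is the selection behavior that penalized methods of this kind rely on.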

2.
Classification of high-dimensional data sets is a major challenge for statistical learning and data mining algorithms. To apply classification methods effectively to high-dimensional data sets, feature selection is an indispensable pre-processing step of the learning process. In this study, we consider the problem of constructing an effective feature selection and classification scheme for data sets that have a small sample size and a large number of features. A novel feature selection approach, named four-Staged Feature Selection, is proposed to address the high-dimensional data classification problem by selecting informative features. The proposed method first selects candidate features with a number of filtering methods based on different metrics, and then applies semi-wrapper, union, and voting stages, respectively, to obtain the final feature subsets. Several statistical learning and data mining methods are applied to verify the efficiency of the selected features. To test the adequacy of the proposed method, 10 different microarray data sets are employed because of their high number of features and small sample size.
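
The abstract does not say how the individual stages are defined, so the following is only a minimal sketch of what combining several filters by union and by voting could look like. The semi-wrapper stage is omitted, and the helper names (`top_k`, `union_stage`, `voting_stage`), the filter scores, and the thresholds are hypothetical.

```python
from collections import Counter


def top_k(scores, k):
    """Return the indices of the k highest-scoring features for one filter."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def union_stage(candidate_sets):
    """Keep every feature selected by at least one filter."""
    return set().union(*candidate_sets)


def voting_stage(candidate_sets, min_votes):
    """Keep features selected by at least `min_votes` filters."""
    votes = Counter(j for s in candidate_sets for j in s)
    return {j for j, v in votes.items() if v >= min_votes}


# Hypothetical relevance scores from three filters for six features.
filter_scores = [
    [0.9, 0.1, 0.8, 0.2, 0.3, 0.05],
    [0.7, 0.2, 0.1, 0.9, 0.3, 0.0],
    [0.6, 0.5, 0.9, 0.1, 0.2, 0.3],
]
candidates = [top_k(s, 2) for s in filter_scores]
print(union_stage(candidates))
print(voting_stage(candidates, min_votes=2))
```

The union keeps any feature that some filter ranks highly, while voting keeps only features on which several filters agree, so the two stages trade recall against stability.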

3.
We study the estimation of the strength of signals corresponding to the high-valued observations in multivariate binary data. These problems can arise in a variety of areas, such as mass spectrometry or functional magnetic resonance imaging (fMRI), where the underlying signals could be interpreted as a proxy for biochemical or physiological response to a condition or treatment. More specifically, the problem we consider involves estimating the sum of a collection of binomial probabilities corresponding to large values of the associated binomial random variables. We emphasize the case where the dimension is much greater than the sample size, and most of the probabilities of the events of interest are close to zero. Two estimation approaches are proposed: conditional maximum likelihood and nonparametric empirical Bayes. We use these estimators to construct a test of homogeneity for two groups of high-dimensional multivariate binary data. Simulation studies on the size and power of the proposed tests are given, and the tests are demonstrated using mass spectrometry data from a breast cancer study.

4.
This paper is concerned with the problem of selecting variables in two-group discriminant analysis for high-dimensional data with fewer observations than the dimension. We consider a selection criterion based on an approximately unbiased estimator of the AIC-type risk. When the dimension is large compared to the sample size, the AIC-type risk cannot be defined. We propose an AIC obtained by replacing the maximum likelihood estimator with a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008) and was further extended by Yamamura et al. (2010). Simulations show that the proposed AIC performs well.
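
A short note on why a ridge-type estimator helps here. The notation below is an assumption made for illustration (the abstract gives no formulas), and the exact estimator used in the cited works may differ. Let $S$ be the pooled sample covariance matrix of the two groups, computed from $n$ observations in total on $p$ variables.

```latex
% S is p x p but has rank at most n - 2, so it is singular whenever p > n - 2:
\operatorname{rank}(S) \le n - 2 < p
\quad\Longrightarrow\quad
|S| = 0 \ \text{and}\ S^{-1} \ \text{does not exist}.

% A generic ridge-type estimator adds a positive multiple of the identity:
\hat{\Sigma}_{\lambda} = S + \lambda I_p, \qquad \lambda > 0.

% Every eigenvalue of \hat{\Sigma}_{\lambda} is at least \lambda, hence
|\hat{\Sigma}_{\lambda}| \ge \lambda^{p} > 0
\quad\text{and}\quad
\hat{\Sigma}_{\lambda}^{-1} \ \text{exists}.
```

Quantities such as the log-determinant and the inverse, which a Gaussian likelihood-based criterion needs, are therefore well defined for $\hat{\Sigma}_{\lambda}$ even when $p$ exceeds $n$.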
