
1.

Variable selection for longitudinal data with high-dimensional covariates and dropouts

Xueying Zheng Bo Fu Jiajia Zhang 《Journal of Statistical Computation and Simulation》2018,88(4):712-725

A new variable selection approach utilizing penalized estimating equations is developed for high-dimensional longitudinal data with dropouts under a missing at random (MAR) mechanism. The proposed method is based on the best linear approximation of efficient scores from the full dataset and does not require specifying a separate model for the missingness or imputation process. The coordinate descent algorithm is adopted to implement the proposed method, which is computationally feasible and stable. The oracle property is established, and extensive simulation studies show that the proposed variable selection method performs much better than penalized estimating equations that use only the complete data and do not account for the MAR mechanism. Finally, the proposed method is applied to the Lifestyle Education for Activity and Nutrition study, where it identifies an interaction effect between intervention and time, consistent with previous findings.
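The abstract names coordinate descent as the fitting algorithm. As a rough illustration of that algorithm only, the sketch below applies cyclic coordinate descent with soft-thresholding to an ordinary lasso least-squares objective. This is a stand-in, not the paper's method: it assumes independent observations, a plain L1 penalty, standardized columns and a centred response, and it does not model dropouts, the MAR mechanism, or the efficient-score estimating equations. The function names and the value of `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are centred and scaled so that mean(x_j**2) == 1,
    and that y is centred (no intercept is fitted). Lasso stand-in only.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = np.asarray(y, dtype=float).copy()  # y - X @ beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Univariate least-squares fit of the partial residual on x_j
            # (valid because mean(x_j**2) == 1), then shrink it towards zero.
            rho = X[:, j] @ residual / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            change = new_bj - beta[j]
            if change != 0.0:
                residual -= X[:, j] * change  # keep residual = y - X @ beta
                beta[j] = new_bj
                max_change = max(max_change, abs(change))
        if max_change < tol:  # stop once a full sweep barely moves beta
            break
    return beta


# Usage on simulated data: only the first two coefficients are non-zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ beta_true + 0.1 * rng.standard_normal(100)
y = y - y.mean()
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
```

Each coordinate update has a closed form, which is what makes coordinate descent attractive for high-dimensional penalized problems; the penalized estimating-equation setting described above would replace the least-squares partial residual with the relevant estimating function.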

2.

A novel feature selection scheme for high-dimensional data sets: four-Staged Feature Selection

Ayça Çakmak Pehlivanlı 《Journal of Applied Statistics》2016,43(6):1140-1154

Classification of high-dimensional data sets is a major challenge for statistical learning and data mining algorithms. To apply classification methods effectively to high-dimensional data sets, feature selection is an indispensable pre-processing step in the learning process. In this study, we consider the problem of constructing an effective feature selection and classification scheme for data sets with a small sample size and a large number of features. A novel feature selection approach, named four-Staged Feature Selection, is proposed to address the high-dimensional classification problem by selecting informative features. The proposed method first selects candidate features using a number of filtering methods based on different metrics, and then applies semi-wrapper, union and voting stages, in that order, to obtain the final feature subsets. Several statistical learning and data mining methods are used to verify the efficiency of the selected features. To test the adequacy of the proposed method, 10 different microarray data sets are employed because of their large number of features and small sample sizes.
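To make the union and voting stages concrete, here is a minimal sketch that combines the top-ranked features from several filters. It is an illustration under stated assumptions, not the paper's procedure: the semi-wrapper stage is omitted, the filter names and scores are invented, each filter is assumed to output one score per feature (higher meaning more informative), and the cut-offs `k` and `min_votes` are hypothetical parameters.

```python
from collections import Counter


def top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def union_and_vote(filter_scores, k, min_votes):
    """Combine per-filter selections by union and by majority-style voting.

    filter_scores: dict mapping a (hypothetical) filter name to a list of
                   per-feature scores, higher = more informative.
    Returns (union_set, voted_set).
    """
    selections = [top_k(scores, k) for scores in filter_scores.values()]
    union_set = set().union(*selections)            # kept by at least one filter
    votes = Counter(j for sel in selections for j in sel)
    voted_set = {j for j, v in votes.items() if v >= min_votes}
    return union_set, voted_set


# Usage with made-up scores for four features from three filters.
scores = {
    "t_test": [0.9, 0.1, 0.8, 0.3],
    "chi2":   [0.7, 0.6, 0.2, 0.1],
    "relief": [0.5, 0.4, 0.9, 0.0],
}
union_set, voted_set = union_and_vote(scores, k=2, min_votes=2)
```

Here the union keeps features 0, 1 and 2, while voting with `min_votes=2` keeps only features 0 and 2, which were ranked highly by at least two filters.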

3.

Estimating and testing conditional sums of means in high dimensional multivariate binary data

Junyong Park J. Wade Davis 《Journal of Statistical Planning and Inference》2011,141(2):1021-1030

We study the estimation of the strength of signals corresponding to high-valued observations in multivariate binary data. These problems can arise in a variety of areas, such as mass spectrometry or functional magnetic resonance imaging (fMRI), where the underlying signals could be interpreted as a proxy for a biochemical or physiological response to a condition or treatment. More specifically, the problem we consider involves estimating the sum of a collection of binomial probabilities corresponding to large values of the associated binomial random variables. We emphasize the case where the dimension is much greater than the sample size and most of the probabilities of the events of interest are close to zero. Two estimation approaches are proposed: conditional maximum likelihood and nonparametric empirical Bayes. We use these estimators to construct a test of homogeneity for two groups of high-dimensional multivariate binary data. Simulation studies on the size and power of the proposed tests are given, and the tests are demonstrated using mass spectrometry data from a breast cancer study.
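The estimation target can be illustrated with a small sketch: given counts X_j ~ Binomial(n, p_j), sum the p_j over the coordinates whose observed count is large. The naive plug-in estimator below simply sums X_j / n over the selected coordinates; it is shown only to illustrate why the problem is nontrivial when most p_j are near zero, and it is not the conditional maximum likelihood or empirical Bayes estimator proposed in the paper. The function name, threshold, and simulation settings are assumptions made for illustration.

```python
import numpy as np


def naive_conditional_sum(counts, n, threshold):
    """Plug-in estimate of sum(p_j) over coordinates with X_j >= threshold.

    counts: observed counts X_j ~ Binomial(n, p_j), one per dimension.
    Returns (estimate, selected_indices). Illustrative baseline only.
    """
    counts = np.asarray(counts)
    selected = np.flatnonzero(counts >= threshold)
    return counts[selected].sum() / n, selected


# Simulation: many rare events (p_j = 0.01), few trials (n = 20).
rng = np.random.default_rng(0)
p = np.full(1000, 0.01)
n = 20
counts = rng.binomial(n, p)
estimate, selected = naive_conditional_sum(counts, n, threshold=2)
target = p[selected].sum()  # true conditional sum, known only in simulation
# Every selected count is >= 2, so the estimate is at least 0.1 per selected
# coordinate, while each true p_j is 0.01: the naive estimate overshoots.
```

Because coordinates are selected for having large counts, their observed proportions are biased upward, which is the selection effect that a conditional approach has to correct for.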

4.

A model selection criterion for discriminant analysis of high-dimensional data with fewer observations

Masashi Hyodo Takayuki Yamada Muni S. Srivastava 《Journal of Statistical Planning and Inference》2012

This paper is concerned with the problem of selecting variables in two-group discriminant analysis for high-dimensional data with fewer observations than the dimension. We consider a selection criterion based on an approximately unbiased estimator of an AIC-type risk. When the dimension is large compared to the sample size, the AIC-type risk cannot be defined. We propose an AIC in which the maximum likelihood estimator is replaced by a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008) and has been further extended by Yamamura et al. (2010). Simulations show that the proposed AIC performs well.
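The reason a ridge-type estimator is needed can be seen in a short sketch: when the dimension exceeds the sample size, the pooled sample covariance matrix is singular, but adding a multiple of the identity makes it invertible, so a linear discriminant can still be formed. The code assumes the ridge-type estimator is applied to the pooled covariance matrix, uses an arbitrary `lam`, and assumes equal priors; it does not compute the paper's AIC criterion or perform variable selection. All function names are hypothetical.

```python
import numpy as np


def ridge_lda_fit(X1, X2, lam):
    """Two-group linear discriminant using a ridge-type pooled covariance S + lam * I."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group sample covariance; singular whenever p > n1 + n2 - 2.
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2 - 2)
    S_ridge = S + lam * np.eye(S.shape[0])  # positive definite for any lam > 0
    w = np.linalg.solve(S_ridge, m1 - m2)   # discriminant direction
    c = w @ (m1 + m2) / 2                   # midpoint cut-off (equal priors)
    return w, c, S


def ridge_lda_predict(X, w, c):
    """Assign group 1 when w'x > c, otherwise group 2."""
    return np.where(X @ w > c, 1, 2)


# Usage: 50 variables but only 10 observations per group.
rng = np.random.default_rng(1)
p, n1, n2 = 50, 10, 10
X1 = rng.standard_normal((n1, p)) + 2.0  # group 1 has a shifted mean
X2 = rng.standard_normal((n2, p))
w, c, S = ridge_lda_fit(X1, X2, lam=1.0)
```

Here `S` has rank at most 18, so the classical discriminant based on its inverse does not exist, while `S + lam * I` can be solved against directly.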

5.

Variable selection and inference procedures for marginal analysis of longitudinal data with missing observations and covariate measurement error


Grace Y. Yi Xianming Tan Runze Li 《Revue canadienne de statistique》2015,43(4):498-518
