13 similar documents found
1.
We introduce a two-step procedure, in the context of ultra-high dimensional additive models, which aims to reduce the size of the covariate vector and to distinguish linear from nonlinear effects among the nonzero components. In the first step, our proposed screening procedure is constructed within the framework of marginal correlation, based on the cumulative distribution function and the conditional expectation of the response. B-splines and empirical distribution functions are used to estimate these two quantities. The sure screening property of this procedure is also established. In the second step, a double-penalization procedure is applied to identify the nonzero and the linear components simultaneously. The performance of the method is examined on several test functions, comparing it against competing methods under varying error distributions. Simulation studies indicate that the proposed screening procedure can be applied to ultra-high dimensional data, detects the influential covariates well, and outperforms existing methods. The method is also applied to identify the most influential genes for overexpression of a G protein-coupled receptor in mice.
2.
Abhik Ghosh, Erica Ponzi, Torkjel Sandanger, Magne Thoresen 《Scandinavian Journal of Statistics》2023,50(3):1232-1262
We consider the problem of variable screening in ultra-high-dimensional generalized linear models (GLMs) whose dimension grows at a nonpolynomial rate. Since the popular SIS approach is extremely unstable in the presence of contamination and noise, we discuss a new robust screening procedure based on the minimum density power divergence estimator (MDPDE) of the marginal regression coefficients. Our proposed screening procedure performs well under both pure and contaminated data scenarios. We provide a theoretical motivation for using marginal MDPDEs for variable screening from both the population and the sample perspectives; in particular, we prove that the marginal MDPDEs are uniformly consistent, which leads to the sure screening property of our proposed algorithm. Finally, we propose an MDPDE-based extension for robust conditional screening in GLMs and derive its sure screening property. Our proposed methods are illustrated through extensive numerical studies and an interesting real data application.
3.
Jing Pan, Shucong Zhang, Yong Zhou 《Journal of Statistical Computation and Simulation》2019,89(3):395-413
Quantile regression is a flexible approach to assessing covariate effects on failure time and has attracted considerable interest in survival analysis. When the dimension of the covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method works for a general class of survival models, allows for heterogeneity in the data, and enjoys desirable properties, including the sure screening property and the ranking consistency property. Moreover, an iterative version of the screening algorithm is proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis.
4.
In the era of Big Data, extracting the most important explanatory variables from ultrahigh-dimensional data plays a key role in scientific research. Existing research has mainly focused on using the extracted explanatory variables to describe the central tendency of the related response variables. For a response variable, however, its variability is as important as its central tendency in statistical inference. This paper focuses on variability and proposes a new model-free feature screening approach: sure explained variability and independence screening (SEVIS). The core of SEVIS is to take advantage of the recently proposed asymmetric and nonlinear generalised measures of correlation in the screening. Under some mild conditions, the paper shows that SEVIS not only possesses the desired sure screening and ranking consistency properties, but is also a computationally convenient variable selection method for ultrahigh-dimensional data sets with more features than observations. The superior performance of SEVIS, compared with existing model-free methods, is illustrated in extensive simulations. A real example in ultrahigh-dimensional variable selection demonstrates that the variables selected by SEVIS better explain not only the response variables but also the variables selected by other methods.
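As a rough illustration of the explained-variability idea, the sketch below estimates the generalized measure of correlation in its usual form, GMC(Y|X) = 1 - E[(Y - E(Y|X))^2] / Var(Y), that is, the share of Var(Y) explained by the conditional mean, and ranks covariates by it. It is not the SEVIS implementation: the Gaussian-kernel Nadaraya-Watson estimate of E(Y|X), the rule-of-thumb bandwidth, and the helper names `gmc` and `screen_by_gmc` are assumptions for illustration. The abstract also does not say how SEVIS combines the two asymmetric directions GMC(Y|X) and GMC(X|Y), so only the first is used here.

```python
import numpy as np


def gmc(x, y, bandwidth=None):
    """Estimate GMC(Y|X) = 1 - E[(Y - E[Y|X])^2] / Var(Y).

    E[Y|X] is estimated with a Gaussian-kernel Nadaraya-Watson smoother,
    an illustrative choice that is not taken from the SEVIS paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth (assumption; any consistent smoother would do).
        bandwidth = 1.06 * x.std() * n ** (-0.2)
    # Kernel weights K((x_i - x_j) / h) for every pair of observations.
    u = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    fitted = (w @ y) / w.sum(axis=1)  # estimate of E[Y | X = x_i]
    explained = 1.0 - np.mean((y - fitted) ** 2) / y.var()
    return float(np.clip(explained, 0.0, 1.0))


def screen_by_gmc(X, y, d):
    """Hypothetical helper: keep the d columns with the largest GMC(Y|X_k)."""
    scores = np.array([gmc(X[:, k], y) for k in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d], scores


# Usage: y depends on X_3 only through X_3^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X[:, 3] ** 2 + 0.5 * rng.normal(size=300)
selected, scores = screen_by_gmc(X, y, d=5)
```

Because y depends on X_3 only through X_3^2, its Pearson correlation with X_3 is near zero (for a symmetric X, Cov(X, X^2) = E[X^3] = 0), whereas the explained-variance score for X_3 is large.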
5.
Weiyan Mu 《Journal of Applied Statistics》2014,41(10):2092-2102
Sure independence screening (SIS) proposed by Fan and Lv [4] uses marginal correlations to select important variables, and has proven to be an efficient method for ultrahigh-dimensional linear models. This paper provides two robust versions of SIS against outliers. The two methods, respectively, replace the sample correlation in SIS with two robust measures, and screen variables by ranking them. Like SIS, the proposed methods are simple and fast. In addition, they are highly robust against a substantial fraction of outliers in the data. These features make them applicable to large datasets which may contain outliers. Simulation results are presented to show their effectiveness.
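The abstract does not name the two robust measures, so the sketch below substitutes Spearman rank correlation as one plausible robust replacement for the sample correlation in SIS. It shows the rank-then-keep-the-top-d mechanics, not the paper's specific estimators; the helper names `rank_transform` and `robust_sis` are hypothetical.

```python
import numpy as np


def rank_transform(v):
    """Ranks 0..n-1 of v; ties get arbitrary distinct ranks (fine for continuous data)."""
    order = np.argsort(v, kind="stable")
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(len(v))
    return r


def robust_sis(X, y, d):
    """SIS-style screening with |Spearman correlation| in place of Pearson.

    Spearman correlation is Pearson correlation computed on ranks, so a few
    gross outliers in y can shift it only a little.
    """
    ry = rank_transform(y)
    ry = (ry - ry.mean()) / ry.std()
    scores = np.empty(X.shape[1])
    for k in range(X.shape[1]):
        rx = rank_transform(X[:, k])
        rx = (rx - rx.mean()) / rx.std()
        scores[k] = abs(np.mean(rx * ry))
    return np.argsort(scores)[::-1][:d], scores


# Usage: p >> n, two active predictors, 5% of responses grossly contaminated.
rng = np.random.default_rng(1)
n, p = 200, 1000
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)
y[:10] += 50.0
d = int(n / np.log(n))  # screening size n / log(n), as commonly used with SIS
selected, scores = robust_sis(X, y, d)
```

Ranking on ranks rather than raw values means the ten shifted responses only move to the top of the rank scale instead of dominating the correlation.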
6.
Variable screening for censored survival data is most challenging when both the survival and censoring times are correlated with an ultrahigh-dimensional vector of covariates. Existing approaches to handling censoring often use inverse probability weighting, assuming that censoring is independent of both the survival time and the covariates. This is a convenient but rather restrictive assumption that may not hold in real applications, especially when the censoring mechanism is complex and the number of covariates is large. To accommodate the heterogeneous (covariate-dependent) censoring that is often present in high-dimensional survival data, we propose a Gehan-type rank screening method to select features that are relevant to the survival time. The method is invariant to monotone transformations of the response and of the predictors, and works robustly for a general class of survival models. We establish the sure screening property of the proposed methodology. Simulation studies and a lymphoma data analysis demonstrate its favorable performance and practical utility.
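A simplified, hypothetical sketch of marginal rank screening for right-censored data follows. For each covariate it computes a Gehan/concordance-type statistic over the pairs in which one subject's failure is observed before the other subject's observed time, and ranks covariates by its magnitude. It is an assumed stand-in, not the statistic from the paper; in particular it does not reproduce whatever adjustment the authors use for covariate-dependent censoring. The function name `gehan_type_utility` is hypothetical.

```python
import numpy as np


def gehan_type_utility(x, time, event):
    """Marginal Gehan/concordance-type rank utility for one covariate (sketch).

    Uses pairs (i, j) where subject i has an observed event and
    time[i] < time[j], so i is known to fail first. Returns
    |concordant - discordant| / comparable for the ordering of x.
    """
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # comparable[i, j]: i's failure is observed before j's observed time.
    comparable = event[:, None] & (time[:, None] < time[None, :])
    sign = np.sign(x[:, None] - x[None, :])  # +1 if x_i > x_j, -1 if x_i < x_j
    n_comp = comparable.sum()
    if n_comp == 0:
        return 0.0
    return float(abs((sign * comparable).sum()) / n_comp)


# Usage: failure time driven by X_0; censoring depends on X_1.
rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.normal(size=(n, p))
T = np.exp(-X[:, 0] + 0.5 * rng.normal(size=n))
C = np.exp(rng.normal(size=n) + 0.5 * X[:, 1])
time = np.minimum(T, C)
event = (T <= C).astype(int)
scores = np.array([gehan_type_utility(X[:, k], time, event) for k in range(p)])
top = int(np.argmax(scores))
```

Because the score depends only on the orderings of the covariate and of the observed times, it is unchanged by any strictly increasing transformation of the time scale and by any strictly monotone transformation of the covariate.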
7.
In practice, the presence of influential observations may lead to misleading results in variable screening problems. We therefore propose a robust variable screening procedure for high-dimensional data analysis. Our method consists of two steps. The first step defines a new high-dimensional influence measure and uses it in a novel influence diagnostic procedure to remove unusual observations. The second step applies the sure independence screening procedure based on distance correlation to select important variables in high-dimensional regression analysis. The new influence measure and the diagnostic procedure are model free. To confirm the effectiveness of the proposed method, we conduct simulation studies and a real-life data analysis that illustrate its merits over some competing methods. Both the simulation results and the real-life data analysis demonstrate that, by detecting and removing unusual observations, the proposed method greatly reduces their adverse effect and performs better than the competing methods.
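The second step, screening by distance correlation (DC-SIS), can be sketched as follows. This minimal illustration assumes the influence-diagnostic first step has already removed unusual observations and is not shown; it uses the V-statistic form of the sample distance correlation, and the helper names are hypothetical.

```python
import numpy as np


def _centered_distance_matrix(v):
    """Double-centred pairwise distance matrix |v_i - v_j| used by distance covariance."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    a = _centered_distance_matrix(np.asarray(x, dtype=float))
    b = _centered_distance_matrix(np.asarray(y, dtype=float))
    dcov2_xy = np.mean(a * b)  # squared distance covariance
    dvar2_x = np.mean(a * a)   # squared distance variance of x
    dvar2_y = np.mean(b * b)   # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


def dc_sis(X, y, d):
    """Keep the d columns of X with the largest distance correlation with y."""
    scores = np.array([distance_correlation(X[:, k], y) for k in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d], scores


# Usage: a purely nonlinear signal that ordinary correlation would miss.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 100))
y = X[:, 5] ** 2 + 0.5 * rng.normal(size=150)
selected, scores = dc_sis(X, y, d=10)
```

Distance correlation is zero in the population only under independence, which is why it can detect the quadratic effect of X_5 without assuming a model.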
8.
The varying-coefficient model is an important nonparametric statistical model, since it allows considerable flexibility in the structure of the fitted model. For ultra-high dimensional heterogeneous data, it is necessary to examine how the effects of covariates vary with exposure variables at the quantile levels of interest. In this paper, we extend marginal screening methods to examine and select variables by ranking a measure of the nonparametric marginal contribution of each covariate given the exposure variable. Spline approximations are employed to model the marginal effects and to select the set of active variables in a quantile-adaptive framework. This ensures the sure screening property in the quantile-adaptive varying-coefficient model. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data.
9.
The case-cohort design has been demonstrated to be an economical and efficient approach in large cohort studies when measuring some covariates on all individuals is expensive. Various methods have been proposed for case-cohort data when the dimension of the covariates is smaller than the sample size. However, limited work has been done on high-dimensional case-cohort data, which are frequently collected in large epidemiological studies. In this paper, we propose a variable screening method for ultrahigh-dimensional case-cohort data under the framework of the proportional model, which allows the covariate dimension to increase with the sample size at an exponential rate. Our procedure enjoys the sure screening property and ranking consistency under some mild regularity conditions. We further extend this method to an iterative version to handle scenarios where some covariates are jointly important but marginally unrelated or only weakly correlated with the response. The finite sample performance of the proposed procedure is evaluated via both simulation studies and an application to real data from a breast cancer study.
10.
11.
Graphical models capture the conditional independence structure among random variables via the presence of edges between vertices. One way of inferring a graph is to identify zero partial correlation coefficients, which is an effective way of finding conditional independence in a multivariate Gaussian setting. For more general settings, we propose kernel partial correlation, which extends partial correlation by combining two kernel methods. First, nonparametric function estimation is employed to remove the effects of the other variables, and then the dependence between the remaining random components is assessed through a nonparametric association measure. The proposed approach is not only flexible but also robust under high levels of noise, owing to the robustness of the nonparametric approaches.
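For the Gaussian case that kernel partial correlation generalizes, the partial correlation of variables i and j given all the others can be read off the precision matrix Ω = Σ⁻¹ as ρ_ij = -Ω_ij / sqrt(Ω_ii Ω_jj). The sketch below applies that standard identity to a simulated chain X_0 → X_1 → X_2, in which X_0 and X_2 are marginally correlated but conditionally independent given X_1, so their partial correlation is near zero and no edge would be drawn between them. It illustrates ordinary partial correlation only, not the kernel method proposed here; the function name is hypothetical.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of every pair of columns given all remaining columns.

    Uses rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), where Omega is the
    inverse of the sample covariance matrix (requires n > p so it is invertible).
    """
    omega = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Usage: Gaussian chain X_0 -> X_1 -> X_2.
rng = np.random.default_rng(4)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])
pcor = partial_correlations(data)
```

For this chain the population precision matrix is [[2, -1, 0], [-1, 2, -1], [0, -1, 1]], so the true partial correlations are 0.5 for (X_0, X_1), about 0.707 for (X_1, X_2), and 0 for (X_0, X_2), while the marginal correlation of X_0 and X_2 is 1/sqrt(3), about 0.577.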
12.
A procedure for selecting a subset of predictor variables in regression analysis is suggested. The procedure is so designed that it leads to the selection of a subset of variables having an adequate degree of informativeness with a directly specified confidence coefficient. Some examples are considered to illustrate the application of the procedure.
13.
Connie Stewart 《Journal of Applied Statistics》2017,44(7):1137-1152
For many applications involving compositional data, it is necessary to establish a valid measure of distance, yet when essential zeros are present, traditional distance measures are problematic. In quantitative fatty acid signature analysis (QFASA), compositional diet estimates are produced that often contain many zeros. In order to test for a difference in diet between two populations of predators using the QFASA diet estimates, a legitimate measure of distance for use in the test statistic is necessary. Since ecologists using QFASA must first select the potential species of prey in the predator's diet, the chosen measure of distance should be such that the distance between samples does not decrease as the number of species considered increases, a property known in general as subcompositional coherence. In this paper we compare three measures of distance for compositional data that can handle zeros but do not satisfy some of the well-accepted principles of compositional data analysis. For compositional diet estimates, the most relevant of these principles is subcompositional coherence, and we show that this property may be approximately satisfied. Based on the results of a simulation study and an application to real-life QFASA diet estimates of grey seals, we recommend the chi-square measure of distance.
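The abstract does not give the exact formula, so the sketch below assumes one common chi-square-type form, d(x, y) = sqrt(sum_k (x_k - y_k)^2 / (x_k + y_k)), with the sum taken only over parts where x_k + y_k > 0. That restriction is what lets it cope with essential zeros, which break log-ratio distances because log(0) is undefined. Scaling constants in the paper's version may differ, and the function name is hypothetical.

```python
import numpy as np


def chi_square_distance(x, y):
    """Chi-square-type distance between two compositions that may contain zeros.

    Both inputs are closed to sum to 1; parts that are zero in both
    compositions are skipped, so shared zeros contribute nothing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x / x.sum()
    y = y / y.sum()
    s = x + y
    keep = s > 0  # avoid 0/0 for parts absent from both compositions
    return float(np.sqrt(np.sum((x[keep] - y[keep]) ** 2 / s[keep])))


# Usage: diet estimates with essential zeros.
d3 = chi_square_distance([0.5, 0.5, 0.0], [0.2, 0.3, 0.5])
d4 = chi_square_distance([0.5, 0.5, 0.0, 0.0], [0.2, 0.3, 0.5, 0.0])
```

Adding a prey species that is absent from both estimates leaves the distance unchanged (d4 equals d3), which is the kind of behaviour relevant when the set of candidate prey species is varied.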