共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we propose a new partial correlation, the so-called composite quantile partial correlation, to measure the relationship of two variables given other variables. We further use this correlation to screen variables in ultrahigh-dimensional varying coefficient models. Our proposed method is fast and robust against outliers and can be efficiently employed in both single index variable and multiple index variable varying coefficient models. Numerical results indicate the preference of our proposed method. 相似文献
2.
Lyu Ni 《Journal of nonparametric statistics》2016,28(3):515-530
Most feature screening methods for ultrahigh-dimensional classification explicitly or implicitly assume the covariates are continuous. However, in the practice, it is quite common that both categorical and continuous covariates appear in the data, and applicable feature screening method is very limited. To handle this non-trivial situation, we propose an entropy-based feature screening method, which is model free and provides a unified screening procedure for both categorical and continuous covariates. We establish the sure screening and ranking consistency properties of the proposed procedure. We investigate the finite sample performance of the proposed procedure by simulation studies and illustrate the method by a real data analysis. 相似文献
3.
For ultrahigh-dimensional data, independent feature screening has been demonstrated both theoretically and empirically to be an effective dimension reduction method with low computational demanding. Motivated by the Buckley–James method to accommodate censoring, we propose a fused Kolmogorov–Smirnov filter to screen out the irrelevant dependent variables for ultrahigh-dimensional survival data. The proposed model-free screening method can work with many types of covariates (e.g. continuous, discrete and categorical variables) and is shown to enjoy the sure independent screening property under mild regularity conditions without requiring any moment conditions on covariates. In particular, the proposed procedure can still be powerful when covariates are strongly dependent on each other. We further develop an iterative algorithm to enhance the performance of our method while dealing with the practical situations where some covariates may be marginally unrelated but jointly related to the response. We conduct extensive simulations to evaluate the finite-sample performance of the proposed method, showing that it has favourable exhibition over the existing typical methods. As an illustration, we apply the proposed method to the diffuse large-B-cell lymphoma study. 相似文献
4.
This paper focuses on the variable selections for a varying coefficient models with missing response at random. A procedure is presented by basis function approximations with smooth-threshold estimating equations. Furthermore, the proposed method selects significant variables and estimates coefficients simultaneously avoiding the problem of solving a convex optimization, which reduced the burden of computation. Compared to existing equation based approaches, our procedure is more efficient and quick. With proper choices the regularization parameter, the resulting estimates perform an oracle property. A cross-validation for tuning parameter selection is also proposed, a numerical study confirms the performance of the proposed method. 相似文献
5.
In many conventional scientific investigations with high or ultra-high dimensional feature spaces, the relevant features, though sparse, are large in number compared with classical statistical problems, and the magnitude of their effects tapers off. It is reasonable to model the number of relevant features as a diverging sequence when sample size increases. In this paper, we investigate the properties of the extended Bayes information criterion (EBIC) (Chen and Chen, 2008) for feature selection in linear regression models with diverging number of relevant features in high or ultra-high dimensional feature spaces. The selection consistency of the EBIC in this situation is established. The application of EBIC to feature selection is considered in a SCAD cum EBIC procedure. Simulation studies are conducted to demonstrate the performance of the SCAD cum EBIC procedure in finite sample cases. 相似文献
6.
We derive explicit formulas for Sobol's sensitivity indices (SSIs) under the generalized linear models (GLMs) with independent or multivariate normal inputs. We argue that the main-effect SSIs provide a powerful tool for variable selection under GLMs with identity links under polynomial regressions. We also show via examples that the SSI-based variable selection results are similar to the ones obtained by the random forest algorithm but without the computational burden of data permutation. Finally, applying our results to the problem of gene network discovery, we identify through the SSI analysis of a public microarray dataset several novel higher-order gene–gene interactions missed out by the more standard inference methods. The relevant functions for SSI analysis derived here under GLMs with identity, log, and logit links are implemented and made available in the R package Sobol sensitivity. 相似文献
7.
An intrinsic association matrix is introduced to measure category-to-variable association based on proportional reduction of prediction error by an explanatory variable. The normalization of the diagonal gives rise to the expected rates of error-reduction and the off-diagonal yields expected distributions of the rates of error for all response categories. A general framework of association measures based on the proposed matrix is established using an application-specific weight vector. A hierarchy of equivalence relations defined by the association matrix and vector is shown. Applications to financial and survey data together with simulation results are presented. 相似文献
8.
Xiaolin Chen 《Journal of Statistical Computation and Simulation》2018,88(12):2425-2446
This paper is concerned with the conditional feature screening for ultra-high dimensional right censored data with some previously identified important predictors. A new model-free conditional feature screening approach, conditional correlation rank sure independence screening, has been proposed and investigated theoretically. The suggested conditional screening procedure has several desirable merits. First, it is model free, and thus robust to model misspecification. Second, it has the advantage of robustness of heavy-tailed distributions of the response and the presence of potential outliers in response. Third, it is naturally applicable to complete data when there is no censoring. Through simulation studies, we demonstrate that the proposed approach outperforms the CoxCS of Hong et al. under some circumstances. A real dataset is used to illustrate the usefulness of the proposed conditional screening method. 相似文献
9.
This paper is concerned with the stable feature screening for the ultrahigh dimensional data. To deal with the ultrahigh dimensional data problem and screen the important features, a set-averaging measurement is proposed. The model averaging technique and the conditional quantile method are used to construct the weighted set-averaging feature screening procedure to identify the relationships between the possible predictors and the response variable. The proposed screening method is model free, stable and possesses the sure screening property under some regular conditions. Some Monte Carlo simulations and a real data application are conducted to evaluate the performance of the proposed procedure. 相似文献
10.
AbstractIn this article, we consider a panel data partially linear regression model with fixed effect and non parametric time trend function. The data can be dependent cross individuals through linear regressor and error components. Unlike the methods using non parametric smoothing technique, a difference-based method is proposed to estimate linear regression coefficients of the model to avoid bandwidth selection. Here the difference technique is employed to eliminate the non parametric function effect, not the fixed effects, on linear regressor coefficient estimation totally. Therefore, a more efficient estimator for parametric part is anticipated, which is shown to be true by the simulation results. For the non parametric component, the polynomial spline technique is implemented. The asymptotic properties of estimators for parametric and non parametric parts are presented. We also show how to select informative ones from a number of covariates in the linear part by using smoothly clipped absolute deviation-penalized estimators on a difference-based least-squares objective function, and the resulting estimators perform asymptotically as well as the oracle procedure in terms of selecting the correct model. 相似文献
11.
Jiting Huang 《统计学通讯:模拟与计算》2019,48(6):1891-1900
We study the variable selection problem for a class of generalized linear models with endogenous covariates. Based on the instrumental variable adjustment technology and the smooth-threshold estimating equation (SEE) method, we propose an instrumental variable based variable selection procedure. The proposed variable selection method can attenuate the effect of endogeneity in covariates, and is easy for application in practice. Some theoretical results are also derived such as the consistency of the proposed variable selection procedure and the convergence rate of the resulting estimator. Further, some simulation studies and a real data analysis are conducted to evaluate the performance of the proposed method, and simulation results show that the proposed method is workable. 相似文献
12.
The problem of selecting the normal population with the largest population mean when the populations have a common known variance is considered. A two-stage procedure is proposed which guarantees the same probability requirement using the indifference-zone approach as does the single-stage procedure of Bechhofer (1954). The two-stage procedure has the highly desirable property that the expected total number of observations required by the procedure is always less than the total number of observations required by the corresponding single-stage procedure, regardless of the configuration of the population means. The saving in expected total number of observations can be substantial, particularly when the configuration of the population means is favorable to the experimenter. The saving is accomplished by screening out “non-contending” populations in the first stage, and concentrating sampling only on “contending” populations in the second stage. The two-stage procedure can be regarded as a composite one which uses a screening subset-type approach (Gupta (1956), (1965)) in the first stage, and an indifference-zone approach (Bechhofer (1954)) applied to all populations retained in the selected sub-set in the second stage. Constants to implement the procedure for various k and P? are provided, as are calculations giving the saving in expected total sample size if the two-stage procedure is used in place of the corresponding single-stage procedure. 相似文献
13.
The linear regression models with the autoregressive moving average (ARMA) errors (REGARMA models) are often considered, in order to reflect a serial correlation among observations. In this article, we focus on an adaptive least absolute shrinkage and selection operator (LASSO) (ALASSO) method for the variable selection of the REGARMA models and extend it to the linear regression models with the ARMA-generalized autoregressive conditional heteroskedasticity (ARMA-GARCH) errors (REGARMA-GARCH models). This attempt is an extension of the existing ALASSO method for the linear regression models with the AR errors (REGAR models) proposed by Wang et al. in 2007. New ALASSO algorithms are proposed to determine important predictors for the REGARMA and REGARMA-GARCH models. Finally, we provide the simulation results and real data analysis to illustrate our findings. 相似文献
14.
We consider a non-centered parameterization of the standard random-effects model, which is based on the Cholesky decomposition of the variance-covariance matrix. The regression type structure of the non-centered parameterization allows us to use Bayesian variable selection methods for covariance selection. We search for a parsimonious variance-covariance matrix by identifying the non-zero elements of the Cholesky factors. With this method we are able to learn from the data for each effect whether it is random or not, and whether covariances among random effects are zero. An application in marketing shows a substantial reduction of the number of free elements in the variance-covariance matrix. 相似文献
15.
ABSTRACTModel selection can be defined as the task of estimating the performance of different models in order to choose the most parsimonious one, among a potentially very large set of candidate statistical models. We propose a graphical representation to be considered as an extension to the class of mixed models of the deviance plot proposed in the literature within the framework of classical and generalized linear models. This graphical representation allows, once a reduced number of models have been selected, to identify important covariates focusing only on the fixed effects component, assuming the random part properly specified. Nevertheless, we suggest also a standalone figure representing the residual random variance ratio: a cross-evaluation of the two graphical representations will allow to derive some conclusions on the random part specification of the model and a more accurate selection of the final model. 相似文献
16.
Chun-Xia Zhang Jiang-She Zhang Guan-Wei Wang Nan-Nan Ji 《Journal of applied statistics》2018,45(10):1734-1755
At present, ensemble learning has exhibited its great power in stabilizing and enhancing the performance of some traditional variable selection methods such as lasso and genetic algorithm. In this paper, a novel bagging ensemble method called BSSW is developed to implement variable ranking and selection in linear regression models. Its main idea is to execute stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order that it is selected into final model as well as the improvement of model fitting resulted from its inclusion. Based on the importance measure averaged across some bootstrapping trials, all candidate variables are ranked and then decided to be important or not. To extend the scope of application, BSSW is extended to the situation of generalized linear models. Experiments carried out with some simulated and real data indicate that BSSW achieves better performance in most studied cases when compared with several other existing methods. 相似文献
17.
In this article, we consider the variable selection and estimation for high-dimensional generalized linear models when the number of parameters diverges with the sample size. We propose a penalized quasi-likelihood function with the bridge penalty. The consistency and the Oracle property of the quasi-likelihood bridge estimators are obtained. Some simulations and a real data analysis are given to illustrate the performance of the proposed method. 相似文献
18.
In many studies a large number of variables is measured and the identification of relevant variables influencing an outcome is an important task. For variable selection several procedures are available. However, focusing on one model only neglects that there usually exist other equally appropriate models. Bayesian or frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say more than ten variables) the resulting class of models can be very large. For Bayesian model averaging Occam’s window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure. Based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. As a simple alternative screening procedure backward elimination can be used. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with fifteen and 25 variables, respectively, of which seven have an influence on the outcome. With the screening step most of the uninfluential variables will be eliminated, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models, which are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step. 相似文献
19.
20.
The generalized additive model is a well established and strong tool that allows modelling smooth effects of predictors on the response. However, if the link function, which is typically chosen as the canonical link, is misspecified, estimates can be biased. A procedure is proposed that simultaneously estimates the form of the link function and the unknown form of the predictor functions including selection of predictors. The procedure is based on boosting methodology, which obtains estimates by using a sequence of weak learners. It strongly dominates fitting procedures that are unable to modify a given link function if the true link function deviates from the fixed function. The performance of the procedure is shown in simulation studies and illustrated by real world examples. 相似文献