We introduce a two-step procedure, in the context of ultra-high dimensional additive models, which aims to reduce the dimension of the covariate vector and to distinguish linear from nonlinear effects among the nonzero components. The screening procedure proposed for the first step is built on the cumulative distribution function and the conditional expectation of the response, within the framework of marginal correlation. B-splines and empirical distribution functions are used to estimate these two measures. The sure screening property of this procedure is also established. In the second step, a double-penalization procedure is applied to identify nonzero and linear components simultaneously. The performance of the method is examined on several test functions and compared with competing methods under varying error distributions. Simulation studies suggest that the proposed screening procedure can be applied to ultra-high dimensional data and detects the influential covariates well. It also demonstrates superiority over the existing methods. The method is also applied to identify the most influential genes for overexpression of a G protein-coupled receptor in mice.

Feature screening and variable selection are fundamental in the analysis of ultrahigh-dimensional data, which are being collected in diverse scientific fields at relatively low cost. Distance correlation-based sure independence screening (DC-SIS) has been proposed to perform feature screening for ultrahigh-dimensional data. The DC-SIS possesses the sure screening property and filters out unimportant predictors in a model-free manner. Like all independence screening methods, however, it fails to detect truly important predictors that are marginally independent of the response variable because of correlations among predictors. When there are many irrelevant predictors that are highly correlated with some strongly active predictors, independence screening may miss other active predictors with relatively weak marginal signals. To improve the performance of DC-SIS, we introduce an effective iterative procedure based on distance correlation to detect all truly important predictors, and potentially interactions, in both linear and nonlinear models. The proposed iterative method thus possesses the favourable model-free and robust properties. We further illustrate its excellent finite-sample performance through comprehensive simulation studies and an empirical analysis of the rat eye expression data set.
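The marginal screening step that DC-SIS builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes the sample distance correlation between each predictor and the response and keeps the top-ranked predictors. The function names (distance_correlation, dc_screen) and the parameter keep are hypothetical, predictors are assumed to be univariate, and the iterative refinement described in the abstract is not shown.

```python
import math


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two univariate samples."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2_xy = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(v * v for row in a for v in row) / n**2
    dvar2_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        # A constant sample carries no information about the other variable.
        return 0.0
    return math.sqrt(max(dcov2_xy, 0.0) / denom)


def dc_screen(columns, y, keep):
    """Rank predictors by distance correlation with y; return the top `keep`.

    `columns` is a list of predictor columns (hypothetical layout).
    """
    scores = [distance_correlation(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores
```

Because distance correlation is zero in the population only under independence, a predictor that acts purely nonlinearly (for example through its square) can still receive a clearly positive score, which a Pearson-correlation ranking would miss. The pure-Python loops cost O(n^2) per predictor and are meant for reading, not for ultrahigh-dimensional use.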

In recent years, numerous feature screening schemes have been developed for ultra-high dimensional standard survival data with only one failure event. Nevertheless, existing literature pays little attention to related investigations for competing risks data, in which subjects suffer from multiple mutually exclusive failures. In this article, we develop a new marginal feature screening for ultra-high dimensional time-to-event data to allow for competing risks. The proposed procedure is model-free, and robust against heavy-tailed distributions and potential outliers for time to the type of failure of interest. Apart from this, it is invariant to any monotone transformation of event time of interest. Under rather mild assumptions, it is shown that the newly suggested approach possesses the ranking consistency and sure independence screening properties. Some numerical studies are conducted to evaluate the finite-sample performance of our method and make a comparison with its competitor, while an application to a real data set is provided to serve as an illustration.

Variable screening for censored survival data is most challenging when both survival and censoring times are correlated with an ultrahigh-dimensional vector of covariates. Existing approaches to handling censoring often make use of inverse probability weighting by assuming independent censoring with both survival time and covariates. This is a convenient but rather restrictive assumption which may be unmet in real applications, especially when the censoring mechanism is complex and the number of covariates is large. To accommodate heterogeneous (covariate-dependent) censoring that is often present in high-dimensional survival data, we propose a Gehan-type rank screening method to select features that are relevant to the survival time. The method is invariant to monotone transformations of the response and of the predictors, and works robustly for a general class of survival models. We establish the sure screening property of the proposed methodology. Simulation studies and a lymphoma data analysis demonstrate its favorable performance and practical utility.

This article is concerned with feature screening for ultrahigh dimensional discriminant analysis. A variance ratio screening method is proposed and the sure screening property of this procedure is proved. The proposed method has some additional desirable features. First, it is model-free: it does not require a specific discriminant model and can be applied directly to the multi-category setting. Second, it can effectively screen main effects and interaction effects simultaneously. Third, it is relatively inexpensive computationally because of its simple structure. The finite sample properties are examined through Monte Carlo simulation studies and two real-data analyses.

Quantile regression is a flexible approach to assessing covariate effects on failure time, which has attracted considerable interest in survival analysis. When the dimension of covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method can work for a general class of survival models, allow for heterogeneity of data and enjoy desirable properties including the sure screening property and the ranking consistency property. Moreover, an iterative version of screening algorithm has also been proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis.

The first aim of this paper is to introduce a modular test for the three-way contingency table (TT). The second aim is to describe the procedure for generating TT using the bar method. The third aim is, on the one hand, to suggest a measure of untruthfulness of H0 and, on the other hand, to compare the quality of independence tests by means of their power. Critical values for the analyzed statistics were determined by Monte Carlo simulation.

In the context of regression models with random effects, repeated responses are traditionally assumed to be mutually independent conditional on the random effects. In order to assess the validity of such an assumption and its impact on parameter inference, we propose an estimating equation methodology where both random effects and within-subject correlation are modeled. This allows a subsequent analysis of the statistical significance of the conditional correlation. We illustrate this method with the epilepsy data of Thall and Vail (1990), and find our method useful in a proper representation for the random effect modeling.

High-dimensional data with a group structure of variables arise frequently in contemporary statistical modelling problems. Heavy-tailed errors or outliers in the response often exist in these data. We consider robust group selection for partially linear models when the number of covariates can be larger than the sample size. A non-convex penalty function is applied to achieve variable selection and estimation in the linear part simultaneously, and we use polynomial splines to estimate the nonparametric component. Under regularity conditions, we show that the robust estimator enjoys the oracle property. Simulation studies demonstrate the performance of the proposed method with samples of moderate size. The analysis of a real example illustrates that our method works well.

This paper is concerned with conditional feature screening for ultra-high dimensional right censored data with some previously identified important predictors. A new model-free conditional feature screening approach, conditional correlation rank sure independence screening, is proposed and investigated theoretically. The suggested conditional screening procedure has several desirable merits. First, it is model free, and thus robust to model misspecification. Second, it is robust to heavy-tailed distributions of the response and to the presence of potential outliers in the response. Third, it is naturally applicable to complete data when there is no censoring. Through simulation studies, we demonstrate that the proposed approach outperforms the CoxCS of Hong et al. under some circumstances. A real dataset is used to illustrate the usefulness of the proposed conditional screening method.

When θ is a multidimensional parameter, the issue of prior dependence or independence of coordinates is a serious concern. This is especially true in robust Bayesian analysis; Lavine et al. (J. Amer. Statist. Assoc. 86, 964–971 (1991)) show that allowing a wide range of prior dependencies among coordinates can result in near vacuous conclusions. It is sometimes possible, however, to make confidently the judgement that the coordinates of θ are independent a priori and, when this can be done, robust Bayesian conclusions improve dramatically. In this paper, it is shown how to incorporate the independence assumption into robust Bayesian analysis involving ε-contamination and density band classes of priors. Attention is restricted to the case θ = (θ1, θ2) for clarity, although the ideas generalize.

We consider the problem of the sequential choice of design points in an approximately linear model. It is assumed that the fitted linear model is only approximately correct, in that the true response function contains a nonrandom, unknown term orthogonal to the fitted response. We also assume that the parameters are estimated by M-estimation. The goal is to choose the next design point in such a way as to minimize the resulting integrated squared bias of the estimated response, to order n^{-1}. Explicit applications to analysis of variance and regression are given. In a simulation study the sequential designs compare favourably with some fixed-sample-size designs which are optimal for the true response to which the sequential designs must adapt.

Ultra-high dimensional data arise in many fields of modern science, such as medical science, economics, genomics and image processing, and pose unprecedented challenges for statistical analysis. With the rapid growth of scientific data in various disciplines, feature screening becomes a primary step to reduce the high dimensionality to a moderate scale that can be handled by existing penalized methods. In this paper, we introduce a simple and robust feature screening method, free of any model assumption, to tackle high dimensional censored data. The proposed method is model-free and hence applicable to a general class of survival models. The sure screening and ranking consistency properties are established without any finite moment condition on the predictors or the response. The computation of the proposed method is rather straightforward. Finite sample performance of the newly proposed method is examined via extensive simulation studies. An application is illustrated with the gene association study of mantle cell lymphoma.

In this paper, we consider the application of the empirical likelihood for linear models under median constraints in view of robustness. For two simple median constraints, it is shown that conditions to ensure the consistency of the empirical likelihood confidence regions can be surprisingly relaxed compared with the normal approach under L norm. However, the coverage accuracy of the empirical likelihood confidence regions based on simple median constraints cannot be improved because of the discontinuity of the constraints. Therefore, a smoothed version of median constraint is proposed and a general theory is established to ensure its validity.

The author develops a robust quasi‐likelihood method, which appears to be useful for down‐weighting any influential data points when estimating the model parameters. He illustrates the computational issues of the method in an example. He uses simulations to study the behaviour of the robust estimates when data are contaminated with outliers, and he compares these estimates to those obtained by the ordinary quasi‐likelihood method.

In this paper, we study the robust estimation for the order of hidden Markov model (HMM) based on a penalized minimum density power divergence estimator, which is obtained by utilizing the finite mixture marginal distribution of HMM. For this task, we adopt the locally conic parametrization method used in [D. Dacunha-Castelle and E. Gassiate, Testing in locally conic models and application to mixture models. ESAIM Probab. Stat. (1997), pp. 285–317; D. Dacunha-Castelle and E. Gassiate, Testing the order of a model using locally conic parametrization: population mixtures and stationary arma processes, Ann. Statist. 27 (1999), pp. 1178–1209; T. Lee and S. Lee, Robust and consistent estimation of the order of finite mixture models based on the minimizing a density power divergence estimator, Metrika 68 (2008), pp. 365–390] to avoid the difficulties that arise in handling mixture marginal models, such as the non-identifiability of the parameter space and the singularity problem with the asymptotic variance. We verify that the estimated order is consistent and simulation results are provided for illustration.

A number of score statistics are derived for a heterogeneous spatial Poisson process which has a composite intensity. The intensity consists of a 'background' process which is estimated from a control point process by kernel density estimation. The parametric form of the composite intensity yields score tests for particular spatial effects. A numerical example concerning respiratory cancer mortality is given.

In this study, we consider a robust estimation for zero-inflated Poisson autoregressive models using the minimum density power divergence estimator designed by Basu et al. [Robust and efficient estimation by minimising a density power divergence. Biometrika. 1998;85:549–559]. We show that under some regularity conditions, the proposed estimator is strongly consistent and asymptotically normal. The performance of the estimator is evaluated through Monte Carlo simulations. A real data analysis using New South Wales crime data is also provided for illustration.
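For orientation, the density power divergence of Basu et al. between a true density g and a model density f, and the resulting estimator for an i.i.d. sample, can be written as below. The notation (g, f_θ, α, H_{α,n}) is chosen here for illustration; the abstract itself works with conditional count distributions, for which the integral becomes a sum over the support and f_θ is replaced by the conditional probability mass function, details the abstract does not spell out.

```latex
% Density power divergence with tuning parameter \alpha > 0
d_\alpha(g, f) = \int \Big\{ f^{1+\alpha}(z)
    - \Big(1 + \tfrac{1}{\alpha}\Big) g(z)\, f^{\alpha}(z)
    + \tfrac{1}{\alpha}\, g^{1+\alpha}(z) \Big\}\, dz ,
\qquad
d_0(g, f) = \int g(z) \log\frac{g(z)}{f(z)}\, dz .

% Minimum density power divergence estimator from X_1, \dots, X_n
\hat{\theta}_{\alpha,n} = \arg\min_{\theta} H_{\alpha,n}(\theta),
\qquad
H_{\alpha,n}(\theta) = \frac{1}{n} \sum_{i=1}^{n}
  \Big\{ \int f_\theta^{1+\alpha}(z)\, dz
    - \Big(1 + \tfrac{1}{\alpha}\Big) f_\theta^{\alpha}(X_i) \Big\} .
```

The limit α → 0 recovers the Kullback-Leibler divergence and hence maximum likelihood, while larger α trades efficiency for robustness to outliers.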

In this paper, we study the asymptotic properties of the adaptive Lasso estimators in high-dimensional generalized linear models. The consistency of the adaptive Lasso estimator is obtained. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property. The results are examined by some simulations and a real example.
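The role of the initial estimator can be made concrete with the usual form of the adaptive Lasso criterion. The display below is a sketch in assumed notation (log-likelihood ℓ_n, initial estimator β̃, tuning parameters λ_n and γ); the abstract does not state which weights the paper uses, so the inverse-power weights shown are the common choice rather than a quotation.

```latex
\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^{p}}
  \Big\{ -\ell_n(\beta) + \lambda_n \sum_{j=1}^{p} w_j\, |\beta_j| \Big\},
\qquad
w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}}, \quad \gamma > 0 .
```

Coefficients whose initial estimates are near zero receive large weights and are penalized heavily, while coefficients with large initial estimates are penalized lightly, which is what makes consistent selection and the oracle-type limiting distribution plausible.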

By approximating the nonparametric component using a regression spline in generalized partial linear models (GPLM), robust generalized estimating equations (GEE), involving a bounded score function and a leverage-based weighting function, can be used to estimate the regression parameters in GPLM robustly for longitudinal or clustered data. In this paper, score test statistics are proposed for testing the regression parameters with robustness, and their asymptotic distributions under the null hypothesis and a class of local alternative hypotheses are studied. The proposed score tests rely on the estimation of a smaller model without the tested parameters involved, and perform well in the simulation studies and real data analysis conducted in this paper.

