Similar literature
 20 similar documents found (search time: 15 ms)
1.
In recent years, numerous feature screening schemes have been developed for ultra-high dimensional standard survival data with only one failure event. Nevertheless, the existing literature pays little attention to competing risks data, in which subjects may experience one of several mutually exclusive failure types. In this article, we develop a new marginal feature screening procedure for ultra-high dimensional time-to-event data that allows for competing risks. The proposed procedure is model-free and robust against heavy-tailed distributions and potential outliers in the time to the failure type of interest. In addition, it is invariant to any monotone transformation of the event time of interest. Under rather mild assumptions, the new approach is shown to possess the ranking consistency and sure independence screening properties. Numerical studies evaluate the finite-sample performance of the method and compare it with its competitor, and an application to a real data set serves as an illustration.

2.
In this paper, we propose a conditional quantile independence screening approach for ultra-high-dimensional heterogeneous data given some known, significant, low-dimensional variables. The new method does not require a specific model structure for the response and covariates, and it can detect additional features that contribute to the conditional quantiles of the response given the already-identified important predictors. We also prove that the proposed procedure enjoys the ranking consistency and sure screening properties. Simulation studies are carried out to examine the performance of the proposed procedure. Finally, we illustrate it with a real data example.

3.
Quantile regression is a flexible approach to assessing covariate effects on failure time and has attracted considerable interest in survival analysis. When the dimension of the covariates is much larger than the sample size, feature screening and variable selection become indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method works for a general class of survival models, allows for heterogeneity in the data, and enjoys desirable properties including the sure screening property and the ranking consistency property. Moreover, an iterative version of the screening algorithm is proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis.

4.
We introduce a two-step procedure, in the context of ultra-high dimensional additive models, which aims to reduce the dimension of the covariate vector and to distinguish linear from nonlinear effects among the nonzero components. In the first step, the proposed screening procedure is constructed from the cumulative distribution function and the conditional expectation of the response, within the framework of marginal correlation. B-splines and empirical distribution functions are used to estimate these two measures. The sure screening property of this procedure is also established. In the second step, a double-penalization procedure is applied to identify nonzero and linear components simultaneously. The performance of the method is examined on several test functions to show its capabilities against competing methods when the error distribution is varied. Simulation studies indicate that the proposed screening procedure can be applied to ultra-high dimensional data and detects the influential covariates well. It also demonstrates superiority over existing methods. The method is further applied to identify the most influential genes for overexpression of a G protein-coupled receptor in mice.

5.
For ultrahigh-dimensional data, independent feature screening has been demonstrated both theoretically and empirically to be an effective dimension reduction method with low computational demand. Motivated by the Buckley–James method for accommodating censoring, we propose a fused Kolmogorov–Smirnov filter to screen out the irrelevant dependent variables for ultrahigh-dimensional survival data. The proposed model-free screening method works with many types of covariates (e.g. continuous, discrete and categorical variables) and is shown to enjoy the sure independent screening property under mild regularity conditions without requiring any moment conditions on the covariates. In particular, the proposed procedure remains powerful when covariates are strongly dependent on each other. We further develop an iterative algorithm to enhance the performance of our method in practical situations where some covariates are marginally unrelated but jointly related to the response. We conduct extensive simulations to evaluate the finite-sample performance of the proposed method, showing that it performs favourably compared with existing typical methods. As an illustration, we apply the proposed method to the diffuse large-B-cell lymphoma study.

6.
This paper is concerned with stable feature screening for ultrahigh dimensional data. To handle ultrahigh dimensionality and screen for the important features, a set-averaging measure is proposed. The model averaging technique and the conditional quantile method are used to construct a weighted set-averaging feature screening procedure that identifies the relationships between the candidate predictors and the response variable. The proposed screening method is model-free and stable, and it possesses the sure screening property under some regularity conditions. Monte Carlo simulations and a real data application are conducted to evaluate the performance of the proposed procedure.

7.
Most feature screening methods for ultrahigh-dimensional classification explicitly or implicitly assume that the covariates are continuous. In practice, however, it is quite common for both categorical and continuous covariates to appear in the data, and applicable feature screening methods are very limited. To handle this non-trivial situation, we propose an entropy-based feature screening method, which is model-free and provides a unified screening procedure for both categorical and continuous covariates. We establish the sure screening and ranking consistency properties of the proposed procedure. We investigate the finite sample performance of the proposed procedure by simulation studies and illustrate the method with a real data analysis.
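The abstract above does not state the screening statistic, so the following is only a minimal sketch of what an entropy-based ranking can look like. It assumes categorical covariates and uses the information gain H(Y) - H(Y | X) as the marginal utility; a continuous covariate would first have to be discretized. The function names `entropy`, `information_gain` and `screen` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def information_gain(x, y):
    """H(Y) - H(Y | X) for a categorical covariate x and class labels y."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - conditional


def screen(columns, y, keep):
    """Rank covariates by information gain and return the top `keep` names.

    `columns` maps a covariate name to its list of observed categories.
    """
    scores = {name: information_gain(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]
```

A covariate that determines the class label receives the largest score, H(Y), while a covariate independent of the label in the sample receives a score of zero.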

8.
Within a Monte Carlo study, finite sample results are obtained for different generalized rank tests based on randomly censored lifetime data. It is pointed out that conditional tests should be applied in practice whenever drastic differences between the censoring distributions of the underlying groups do not appear. The tests are slight modifications of known permutation tests for censored data.

9.
This article is concerned with feature screening for ultrahigh dimensional discriminant analysis. A variance ratio screening method is proposed, and the sure screening property of the procedure is proved. The proposed method has some additional desirable features. First, it is model-free: it does not require a specific discriminant model and can be applied directly to the multi-category setting. Second, it can effectively screen main effects and interaction effects simultaneously. Third, it is relatively inexpensive to compute because of its simple structure. The finite sample properties are examined through Monte Carlo simulation studies and two real-data analyses.

10.
The varying-coefficient model is an important nonparametric statistical model because it allows appreciable flexibility in the structure of the fitted model. For ultra-high dimensional heterogeneous data, it is necessary to examine how the effects of covariates vary with exposure variables at different quantile levels of interest. In this paper, we extend marginal screening methods to examine and select variables by ranking a measure of the nonparametric marginal contribution of each covariate given the exposure variable. Spline approximations are employed to model marginal effects and select the set of active variables in a quantile-adaptive framework. This ensures the sure screening property in the quantile-adaptive varying-coefficient model. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data.

11.
Ultra-high dimensional data arise in many fields of modern science, such as medical science, economics, genomics and image processing, and pose unprecedented challenges for statistical analysis. With the rapid growth of scientific data in various disciplines, feature screening becomes a primary step for reducing the high dimensionality to a moderate scale that can be handled by existing penalized methods. In this paper, we introduce a simple and robust feature screening method without any model assumption to tackle high dimensional censored data. The proposed method is model-free and hence applicable to a general class of survival models. The sure screening and ranking consistency properties are established without any finite moment condition on the predictors or the response. The computation of the proposed method is rather straightforward. The finite sample performance of the proposed method is examined via extensive simulation studies. An application is illustrated with a gene association study of mantle cell lymphoma.

12.
Existing estimation methods and goodness-of-fit tests for the Cox model mainly deal with right censored data, and they do not extend directly to other, more complicated types of censored data, such as doubly censored data, interval censored data, partly interval-censored data, bivariate right censored data, etc. In this article, we apply the empirical likelihood approach to the Cox model with a complete sample, derive the semiparametric maximum likelihood estimators (SPMLE) for the Cox regression parameter and the baseline distribution function, and establish the asymptotic consistency of the SPMLE. Via the functional plug-in method, these results are extended in a unified approach to doubly censored data, partly interval-censored data, and bivariate data under univariate or bivariate right censoring. For these types of censored data, the estimation procedures developed here naturally lead to Kolmogorov-Smirnov goodness-of-fit tests for the Cox model. Some simulation results are presented.

13.
We study the estimation of the strength of signals corresponding to the high valued observations in multivariate binary data. These problems can arise in a variety of areas, such as mass spectrometry or functional magnetic resonance imaging (fMRI), where the underlying signals could be interpreted as a proxy for biochemical or physiological response to a condition or treatment. More specifically, the problem we consider involves estimating the sum of a collection of binomial probabilities corresponding to large values of the associated binomial random variables. We emphasize the case where the dimension is much greater than the sample size, and most of the probabilities of the events of interest are close to zero. Two estimation approaches are proposed: conditional maximum likelihood and nonparametric empirical Bayes. We use these estimators to construct a test of homogeneity for two groups of high dimensional multivariate binary data. Simulation studies on the size and power of the proposed tests are given, and the tests are demonstrated using mass spectrometry data from a breast cancer study.

14.
Let $X_1, \ldots, X_n$ be i.i.d. random variables with distribution function $F$, and let $Y_1, \ldots, Y_n$ be i.i.d. with distribution function $G$. For $i = 1, 2, \ldots, n$, set $\delta_i = 1$ if $X_i \le Y_i$ and $0$ otherwise, and $\tilde{X}_i = \min\{X_i, Y_i\}$. A kernel-type density estimate of $f$, the density function of $F$ w.r.t. Lebesgue measure on the Borel $\sigma$-field, based on the censored data $(\delta_i, \tilde{X}_i)$, $i = 1, \ldots, n$, is considered. Weak and strong uniform consistency properties over the whole real line are studied. Rates of convergence are established under higher-order differentiability assumptions on $f$. A procedure for relaxing such assumptions is also proposed.
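The abstract above does not give the form of its estimator. As a hedged illustration of the setting only, the sketch below builds the censored sample (observed time min(X, Y) and indicator δ) and evaluates one common kernel-type estimate in which each observation is weighted by the jump of the Kaplan-Meier estimator at that point. The Gaussian kernel, the assumption of no tied times, and the names `censor`, `kaplan_meier_jumps` and `censored_kde` are choices made for this example, not details from the paper.

```python
import math


def censor(xs, ys):
    """Return observed times min(x, y) and indicators 1{x <= y}."""
    times = [min(x, y) for x, y in zip(xs, ys)]
    deltas = [1 if x <= y else 0 for x, y in zip(xs, ys)]
    return times, deltas


def kaplan_meier_jumps(times, deltas):
    """Sorted observed times and the Kaplan-Meier jump at each one.

    Assumes no ties. Censored points (delta = 0) receive a jump of zero.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    surv = 1.0  # Kaplan-Meier survival estimate just before the current time
    sorted_times, jumps = [], []
    for rank, i in enumerate(order):
        at_risk = n - rank
        jump = surv * deltas[i] / at_risk
        sorted_times.append(times[i])
        jumps.append(jump)
        surv -= jump
    return sorted_times, jumps


def censored_kde(x, times, deltas, bandwidth):
    """Gaussian-kernel density estimate at x, weighted by Kaplan-Meier jumps."""
    points, jumps = kaplan_meier_jumps(times, deltas)
    total = 0.0
    for t, w in zip(points, jumps):
        u = (x - t) / bandwidth
        total += w * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / bandwidth
```

With no censoring, every jump equals 1/n and the estimate reduces to the ordinary kernel density estimate. If the largest observation is censored, the jumps sum to less than one.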

15.
Variable screening for censored survival data is most challenging when both survival and censoring times are correlated with an ultrahigh-dimensional vector of covariates. Existing approaches to handling censoring often make use of inverse probability weighting by assuming independent censoring with both survival time and covariates. This is a convenient but rather restrictive assumption which may be unmet in real applications, especially when the censoring mechanism is complex and the number of covariates is large. To accommodate heterogeneous (covariate-dependent) censoring that is often present in high-dimensional survival data, we propose a Gehan-type rank screening method to select features that are relevant to the survival time. The method is invariant to monotone transformations of the response and of the predictors, and works robustly for a general class of survival models. We establish the sure screening property of the proposed methodology. Simulation studies and a lymphoma data analysis demonstrate its favorable performance and practical utility.

16.
In this paper, we study the performance of a soccer player by analysing an incomplete data set. To this end, we fit the bivariate Rayleigh distribution to the soccer data set by the maximum likelihood method. In this way, the missing data and right censoring problems that usually arise in such studies are taken into account. Our aim is to make inferences about the performance of a soccer player by considering the stress and strength components. The first goal of the player of interest in a match is taken as the stress component, and the second goal of the match is taken as the strength component. We propose some methods to overcome the incomplete data problem and use them to make inferences about the performance of a soccer player.

17.
18.
This paper considers the problem of variance estimation for sparse ultra-high dimensional varying coefficient models. We first use B-splines to approximate the coefficient functions and discuss the asymptotic behavior of a naive two-stage estimator of the error variance. We also show that this naive estimator may significantly underestimate the error variance because of spurious correlations, which are even higher for nonparametric models than for linear models. This prompts us to propose an accurate estimator of the error variance by effectively integrating the sure independence screening and the refitted cross-validation techniques. The consistency and the asymptotic normality of the resulting estimator are established under some regularity conditions. Simulation studies are carried out to assess the finite sample performance of the proposed methods.
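As a rough illustration of the screening step mentioned above, the sketch below shows sure independence screening in its simplest linear-model form: rank the covariates by absolute marginal Pearson correlation with the response and keep the top d. This is an assumption made for the example. The abstract's varying-coefficient setting uses B-spline approximations instead, and the refitted cross-validation step is not shown. The names `pearson` and `sis` are hypothetical.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        return 0.0  # a constant column carries no marginal signal
    return cov / (sd_u * sd_v)


def sis(columns, y, d):
    """Indices of the d covariates with the largest |marginal correlation|.

    `columns` is a list of covariate columns, each as long as y.
    """
    ranked = sorted(
        range(len(columns)),
        key=lambda j: abs(pearson(columns[j], y)),
        reverse=True,
    )
    return ranked[:d]
```

The retained index set would then be passed to a lower-dimensional fitting or variance-estimation step.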

19.
The case‐cohort design has been demonstrated to be an economical and efficient approach in large cohort studies when measuring some covariates on all individuals is expensive. Various methods have been proposed for case‐cohort data when the dimension of the covariates is smaller than the sample size. However, limited work has been done on high‐dimensional case‐cohort data, which are frequently collected in large epidemiological studies. In this paper, we propose a variable screening method for ultrahigh‐dimensional case‐cohort data under the framework of the proportional model, which allows the covariate dimension to increase with the sample size at an exponential rate. Our procedure enjoys the sure screening property and ranking consistency under some mild regularity conditions. We further extend this method to an iterative version to handle scenarios where some covariates are jointly important but marginally unrelated or only weakly correlated with the response. The finite sample performance of the proposed procedure is evaluated via both simulation studies and an application to real data from a breast cancer study.

20.
In many conventional scientific investigations with high or ultra-high dimensional feature spaces, the relevant features, though sparse, are large in number compared with classical statistical problems, and the magnitude of their effects tapers off. It is reasonable to model the number of relevant features as a diverging sequence as the sample size increases. In this paper, we investigate the properties of the extended Bayes information criterion (EBIC) (Chen and Chen, 2008) for feature selection in linear regression models with a diverging number of relevant features in high or ultra-high dimensional feature spaces. The selection consistency of the EBIC in this situation is established. The application of EBIC to feature selection is considered in a SCAD cum EBIC procedure. Simulation studies are conducted to demonstrate the performance of the SCAD cum EBIC procedure in finite sample cases.
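For orientation, the EBIC of a candidate model adds a model-space penalty to the ordinary BIC. A minimal sketch for a Gaussian linear model follows, assuming the usual form n log(RSS/n) + k log n + 2γ log C(p, k), which matches -2 log-likelihood up to an additive constant. The names `log_binomial` and `ebic` and the argument layout are hypothetical, and the SCAD fitting step that would generate the candidate models is not shown.

```python
import math


def log_binomial(p, k):
    """log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(rss, n, k, p, gamma):
    """EBIC of a linear model with k selected features out of p.

    rss   : residual sum of squares of the fitted submodel
    n     : sample size
    gamma : EBIC tuning constant in [0, 1]; gamma = 0 gives the ordinary BIC
    """
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binomial(p, k)
```

Among candidate models, for example those along a SCAD regularization path, the one with the smallest EBIC value would be selected. A larger gamma penalizes large models more heavily when p is large.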
