1.

Model-free slice screening for ultrahigh-dimensional survival data

Jing Zhang Yanyan Liu 《Journal of applied statistics》2021,48(10):1755

For ultrahigh-dimensional data, independent feature screening has been demonstrated both theoretically and empirically to be an effective dimension reduction method with low computational demand. Motivated by the Buckley–James method to accommodate censoring, we propose a fused Kolmogorov–Smirnov filter to screen out the irrelevant dependent variables for ultrahigh-dimensional survival data. The proposed model-free screening method can work with many types of covariates (e.g. continuous, discrete and categorical variables) and is shown to enjoy the sure independent screening property under mild regularity conditions without requiring any moment conditions on covariates. In particular, the proposed procedure can still be powerful when covariates are strongly dependent on each other. We further develop an iterative algorithm to enhance the performance of our method in practical situations where some covariates may be marginally unrelated but jointly related to the response. We conduct extensive simulations to evaluate the finite-sample performance of the proposed method, showing that it compares favourably with existing typical methods. As an illustration, we apply the proposed method to the diffuse large-B-cell lymphoma study.
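
To make the screening idea concrete, here is a minimal sketch of a plain Kolmogorov-Smirnov filter on uncensored toy data. It is an illustration only, not the method of the paper: the Buckley–James adjustment for censoring and the fusion step are not reproduced, and the function names (`ks_distance`, `slice_labels`, `ks_filter_scores`) and the data are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, t) / len(a_sorted)
            - bisect_right(b_sorted, t) / len(b_sorted))
        for t in points
    )


def slice_labels(y, n_slices):
    """Assign each response to one of n_slices rank-based groups of near-equal size."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def ks_filter_scores(columns, y, n_slices=2):
    """Score each covariate by the largest KS distance between any two response slices."""
    labels = slice_labels(y, n_slices)
    scores = []
    for col in columns:
        groups = [[x for x, g in zip(col, labels) if g == s] for s in range(n_slices)]
        scores.append(max(ks_distance(g, h) for g, h in combinations(groups, 2)))
    return scores


# Toy data (made up). Assumption: y is fully observed, i.e. no censoring.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x_relevant = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]  # distribution shifts with y
x_noise = [0.5, 1.5, 0.6, 1.6, 0.5, 1.5, 0.6, 1.6]     # same distribution in both slices
scores = ks_filter_scores([x_relevant, x_noise], y)
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
print(scores, ranking)
```

Covariates are then kept in order of decreasing score. Because the score depends only on empirical distribution functions within slices, no moment conditions on the covariates are needed for it to be computed.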

2.

Variable screening for ultrahigh dimensional censored quantile regression

Jing Pan Shucong Zhang Yong Zhou 《Journal of Statistical Computation and Simulation》2019,89(3):395-413

Quantile regression is a flexible approach to assessing covariate effects on failure time, which has attracted considerable interest in survival analysis. When the dimension of covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method can work for a general class of survival models, allows for heterogeneity of data and enjoys desirable properties including the sure screening property and the ranking consistency property. Moreover, an iterative version of the screening algorithm is also proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis.

3.

Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification

Lyu Ni 《Journal of nonparametric statistics》2016,28(3):515-530

Most feature screening methods for ultrahigh-dimensional classification explicitly or implicitly assume the covariates are continuous. However, in practice, it is quite common that both categorical and continuous covariates appear in the data, and applicable feature screening methods are very limited. To handle this non-trivial situation, we propose an entropy-based feature screening method, which is model free and provides a unified screening procedure for both categorical and continuous covariates. We establish the sure screening and ranking consistency properties of the proposed procedure. We investigate the finite sample performance of the proposed procedure by simulation studies and illustrate the method by a real data analysis.

4.

Feature screening for case‐cohort studies with failure time outcome

Jing Zhang Haibo Zhou Yanyan Liu Jianwen Cai 《Scandinavian Journal of Statistics》2021,48(1):349-370

The case‐cohort design has been demonstrated to be an economical and efficient approach in large cohort studies when the measurement of some covariates on all individuals is expensive. Various methods have been proposed for case‐cohort data when the dimension of covariates is smaller than the sample size. However, limited work has been done for high‐dimensional case‐cohort data, which are frequently collected in large epidemiological studies. In this paper, we propose a variable screening method for ultrahigh‐dimensional case‐cohort data under the framework of the proportional model, which allows the covariate dimension to increase with sample size at an exponential rate. Our procedure enjoys the sure screening property and the ranking consistency under some mild regularity conditions. We further extend this method to an iterative version to handle the scenarios where some covariates are jointly important but are marginally unrelated or weakly correlated to the response. The finite sample performance of the proposed procedure is evaluated via both simulation studies and an application to real data from a breast cancer study.

5.

Feature Screening for Ultrahigh Dimensional Categorical Data With Applications

Danyang Huang Runze Li Hansheng Wang 《Journal of Business & Economic Statistics》2014,32(2):237-244

Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses the screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies and illustrate the proposed method by two empirical datasets.
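
As a rough illustration of chi-square based screening, the sketch below scores each categorical covariate by the classical Pearson chi-square statistic of its contingency table against the response and keeps the top-ranked ones. The exact statistic and threshold used in the paper may differ; `pearson_chi2`, `screen`, the `keep` parameter and the toy data are assumptions made for this example.

```python
from collections import Counter


def pearson_chi2(x, y):
    """Pearson chi-square statistic of the contingency table of x against y."""
    n = len(x)
    x_counts, y_counts, joint = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for xv, nx in x_counts.items():
        for yv, ny in y_counts.items():
            expected = nx * ny / n  # cell count expected under independence
            stat += (joint[(xv, yv)] - expected) ** 2 / expected
    return stat


def screen(columns, y, keep):
    """Rank covariates by decreasing chi-square score and return the top `keep` indices."""
    scores = [pearson_chi2(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: -scores[j])
    return ranked[:keep], scores


# Toy data (made up).
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
x0 = [0, 0, 0, 0, 1, 1, 1, 1]  # perfectly associated with y
x1 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of y
selected, scores = screen([x0, x1], y, keep=1)
print(selected, scores)
```

Each covariate is scored marginally, so the cost grows linearly in the number of covariates, which is what makes this kind of screening practical when the dimension is very large.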

6.

A Bayesian semiparametric accelerated failure time model for arbitrarily censored data with covariates subject to measurement error

Xiaoyan Lin 《Communications in Statistics - Simulation and Computation》2017,46(1):747-756

A flexible Bayesian semiparametric accelerated failure time (AFT) model is proposed for analyzing arbitrarily censored survival data with covariates subject to measurement error. Specifically, the baseline error distribution in the AFT model is nonparametrically modeled as a Dirichlet process mixture of normals. Classical measurement error models are imposed for covariates subject to measurement error. An efficient and easy-to-implement Gibbs sampler, based on the stick-breaking formulation of the Dirichlet process combined with the techniques of retrospective and slice sampling, is developed for the posterior calculation. An extensive simulation study is conducted to illustrate the advantages of our approach.

7.

SIMEX method for censored quantile regression with measurement error

Guangcai Mao Yi Wei 《Communications in Statistics - Simulation and Computation》2017,46(10):7552-7560

Censored quantile regression serves as an important supplement to the Cox proportional hazards model in survival analysis. In addition to being exposed to censoring, some covariates may be subject to measurement error. Ignoring this error leads to substantially biased estimates. The SIMulation-EXtrapolation (SIMEX) method is an effective tool to handle the measurement error issue. We extend the SIMEX approach to censored quantile regression with covariate measurement error. The algorithm is assessed via extensive simulations. A lung cancer study is analyzed to verify the validity of the proposed method.

8.

Variable screening for survival data in the presence of heterogeneous censoring

Jinfeng Xu Wai Keung Li Zhiliang Ying 《Scandinavian Journal of Statistics》2020,47(4):1171-1191

Variable screening for censored survival data is most challenging when both survival and censoring times are correlated with an ultrahigh-dimensional vector of covariates. Existing approaches to handling censoring often make use of inverse probability weighting by assuming independent censoring with both survival time and covariates. This is a convenient but rather restrictive assumption which may be unmet in real applications, especially when the censoring mechanism is complex and the number of covariates is large. To accommodate heterogeneous (covariate-dependent) censoring that is often present in high-dimensional survival data, we propose a Gehan-type rank screening method to select features that are relevant to the survival time. The method is invariant to monotone transformations of the response and of the predictors, and works robustly for a general class of survival models. We establish the sure screening property of the proposed methodology. Simulation studies and a lymphoma data analysis demonstrate its favorable performance and practical utility.

9.

Quantile screening for ultra-high-dimensional heterogeneous data conditional on some variables

Yi Liu 《Journal of Statistical Computation and Simulation》2018,88(2):329-342

In this paper, we propose a conditional quantile independence screening approach for ultra-high-dimensional heterogeneous data given some known, significant and low-dimensional variables. The new method does not require imposing a specific model structure for the response and covariates and can detect additional features that contribute to conditional quantiles of the response given those already-identified important predictors. We also prove that the proposed procedure enjoys the ranking consistency and sure screening properties. Some simulation studies are carried out to examine the performance of the proposed procedure. Finally, we illustrate it by a real data example.

10.

Model-free conditional feature screening for ultra-high dimensional right censored data

Xiaolin Chen 《Journal of Statistical Computation and Simulation》2018,88(12):2425-2446

This paper is concerned with conditional feature screening for ultra-high dimensional right censored data with some previously identified important predictors. A new model-free conditional feature screening approach, conditional correlation rank sure independence screening, is proposed and investigated theoretically. The suggested conditional screening procedure has several desirable merits. First, it is model free, and thus robust to model misspecification. Second, it is robust to heavy-tailed distributions of the response and to potential outliers in the response. Third, it is naturally applicable to complete data when there is no censoring. Through simulation studies, we demonstrate that the proposed approach outperforms the CoxCS of Hong et al. under some circumstances. A real dataset is used to illustrate the usefulness of the proposed conditional screening method.

11.

Linear censored quantile regression: A novel minimum-distance approach

Mickaël De Backer Anouar El Ghouch Ingrid Van Keilegom 《Scandinavian Journal of Statistics》2020,47(4):1275-1306

In this article, we investigate a new procedure for the estimation of a linear quantile regression with possibly right-censored responses. Contrary to the main literature on the subject, we propose in this context to circumvent the formulation of conditional quantiles through the so-called “check” loss function that stems from the influential work of Koenker and Bassett (1978). Instead, our suggestion is to estimate the quantile coefficients by minimizing an alternative measure of distance. In fact, our approach could be described as a generalization, in a parametric regression framework, of the technique that consists in inverting the conditional distribution of the response given the covariates. This is motivated by the knowledge that the main literature for censored data already relies on some nonparametric conditional distribution estimation as well. The ideas of effective dimension reduction are then exploited in order to accommodate higher dimensional settings in this context. Extensive numerical results suggest that such an approach provides a strongly competitive alternative to the classical approaches based on the check function, for both complete and censored observations. From a theoretical perspective, both consistency and asymptotic normality of the proposed estimator for linear regression are obtained under classical regularity conditions. As a by-product, several asymptotic results on some “double-kernel” version of the conditional Kaplan–Meier distribution estimator based on effective dimension reduction, and its corresponding density estimator, are also obtained and may be of interest on their own. A brief application of our procedure to quasar data then serves to further highlight the relevance of the latter for quantile regression estimation with censored data.
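
For readers unfamiliar with the “check” loss that this abstract sets out to avoid, the following minimal sketch shows the standard definition, rho_tau(u) = u (tau - 1{u < 0}), and how minimizing its sum recovers a sample quantile for complete (uncensored) data. It does not implement the minimum-distance estimator of the paper; `check_loss`, `empirical_quantile` and the data are hypothetical names and values chosen for illustration.

```python
def check_loss(u, tau):
    """Koenker-Bassett check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile(values, tau):
    """Minimise the summed check loss over the observed values.

    A minimiser of the summed check loss always lies among the data points,
    so searching over them is enough for this toy setting.
    """
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Toy data (made up), fully observed.
data = [2.0, 7.0, 1.0, 9.0, 4.0]
median = empirical_quantile(data, 0.5)
upper = empirical_quantile(data, 0.9)
print(median, upper)
```

With tau = 0.5 the loss is half the absolute error, which is why the minimizer is the sample median; other values of tau tilt the penalty toward under- or over-prediction.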

12.

Stable feature screening for ultrahigh dimensional data

Peng Lai Fengli Song Yufei Gao 《Journal of the Korean Statistical Society》2019,48(2):221-232

This paper is concerned with stable feature screening for ultrahigh dimensional data. To deal with the ultrahigh dimensional data problem and screen the important features, a set-averaging measurement is proposed. The model averaging technique and the conditional quantile method are used to construct the weighted set-averaging feature screening procedure to identify the relationships between the possible predictors and the response variable. The proposed screening method is model free, stable and possesses the sure screening property under some regularity conditions. Some Monte Carlo simulations and a real data application are conducted to evaluate the performance of the proposed procedure.

13.

Structural identification and variable selection in high-dimensional varying-coefficient models

Yuping Chen Wingkam Fung 《Journal of nonparametric statistics》2017,29(2):258-279

Varying-coefficient models have been widely used to investigate the possible time-dependent effects of covariates when the response variable comes from a normal distribution. Much progress has been made for inference and variable selection in the framework of such models. However, the identification of model structure, that is, how to identify which covariates have time-varying effects and which have fixed effects, remains a challenging and unsolved problem, especially when the dimension of covariates is much larger than the sample size. In this article, we consider the structural identification and variable selection problems in varying-coefficient models for high-dimensional data. Using a modified basis expansion approach and group variable selection methods, we propose a unified procedure to simultaneously identify the model structure, select important variables and estimate the coefficient curves. The unique feature of the proposed approach is that we do not have to specify the model structure in advance; it is therefore more realistic and appropriate for real data analysis. Asymptotic properties of the proposed estimators have been derived under regularity conditions. Furthermore, we evaluate the finite sample performance of the proposed methods with Monte Carlo simulation studies and a real data analysis.

14.

Semiparametric Quantile Regression Analysis of Right‐censored and Length‐biased Failure Time Data with Partially Linear Varying Effects

Xuerong Chen Yeqian Liu Jianguo Sun Yong Zhou 《Scandinavian Journal of Statistics》2016,43(4):921-938

Right‐censored and length‐biased failure time data arise in many fields including cross‐sectional prevalent cohort studies, and their analysis has recently attracted a great deal of attention. It is well‐known that for regression analysis of failure time data, two commonly used approaches are hazard‐based and quantile‐based procedures, and most of the existing methods are the hazard‐based ones. In this paper, we consider quantile regression analysis of right‐censored and length‐biased data and present a semiparametric varying‐coefficient partially linear model. For estimation of regression parameters, a three‐stage procedure that makes use of the inverse probability weighted technique is developed, and the asymptotic properties of the resulting estimators are established. In addition, the approach allows the dependence of the censoring variable on covariates, while most of the existing methods assume the independence between censoring variables and covariates. A simulation study is conducted and suggests that the proposed approach works well in practical situations. Also, an illustrative example is provided.

15.

Model-free feature screening for ultrahigh dimensional censored regression

Tingyou Zhou Liping Zhu 《Statistics and Computing》2017,27(4):947-961

In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is a lack of information about the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies \(p=o\{\exp (an)\}\), where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.

16.

Multiple imputation of censored survival data in the presence of missing covariates using restricted mean survival time

Gurprit Grover 《Journal of applied statistics》2015,42(4):817-827

Missing covariate data with censored outcomes pose a challenge in the analysis of clinical data, especially in small sample settings. Multiple imputation (MI) techniques are popularly used to impute missing covariates, and the data are then analyzed through methods that can handle censoring. Techniques based on MI are also available to impute censored data, but they are rarely used in practice. In the present study, we applied a method based on multiple imputation by chained equations to impute missing values of covariates and also to impute censored outcomes using restricted mean survival time in small sample settings. The complete data were then analyzed using linear regression models. Simulation studies and a real example of CHD data show that the present method produced better estimates and lower standard errors, when applied to data with missing covariate values and censored outcomes, than either the analysis of data with censored outcomes that excludes cases with missing covariates or the analysis that excludes cases with missing covariate values and censored outcomes (complete case analysis).
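
The restricted mean survival time named in the title is the area under the survival curve up to a chosen horizon tau. The sketch below computes it from a Kaplan-Meier estimate on toy data. It only illustrates the quantity itself, not the chained-equations imputation procedure of the study; `kaplan_meier`, `rmst` and the data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if times[i] is an observed event and 0 if it is censored.
    """
    curve, surv = [], 1.0
    for t in sorted(set(t for t, e in zip(times, events) if e)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # last rectangle up to tau


# Toy data (made up): the second subject is censored at time 2.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
curve = kaplan_meier(times, events)
value = rmst(times, events, tau=4.0)
print(curve, value)
```

Here the survival estimate drops to 0.75 at time 1 and to 0.375 at time 3, so the area up to tau = 4 is 1 + 0.75 × 2 + 0.375 × 1 = 2.875.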

17.

A Semiparametric Approach for Accelerated Failure Time Models with Covariates Subject to Measurement Error

Jiajia Zhang Wenqing He Haifen Li 《Communications in Statistics - Simulation and Computation》2013,42(2):329-341

There are relatively few discussions about measurement error in the accelerated failure time (AFT) model, particularly for the semiparametric AFT model. In this article, we propose an adjusted estimation procedure for the semiparametric AFT model with covariates subject to measurement error, based on the profile likelihood approach and the simulation and extrapolation (SIMEX) method. The simulation studies show that the proposed semiparametric SIMEX approach performs well. The proposed approach is applied to a coronary heart disease dataset from the Busselton Health Study for illustration.

18.

A corrected likelihood method for the proportional hazards model with covariates subject to measurement error

Grace Y. Yi Jerald F. Lawless 《Journal of statistical planning and inference》2007

There has been extensive interest in discussing inference methods for survival data when some covariates are subject to measurement error. It is known that standard inferential procedures produce biased estimation if measurement error is not taken into account. With the Cox proportional hazards model, a number of methods have been proposed to correct bias induced by measurement error, where the attention centers on utilizing the partial likelihood function. It is also of interest to understand the impact on estimation of the baseline hazard function in settings with mismeasured covariates. In this paper we employ a weakly parametric form for the baseline hazard function and propose simple unbiased estimating functions for estimation of parameters. The proposed method is easy to implement and it reveals the connection between the naive method ignoring measurement error and the corrected method with measurement error accounted for. Simulation studies are carried out to evaluate the performance of the estimators as well as the impact of ignoring measurement error in covariates. As an illustration we apply the proposed methods to analyze a data set arising from the Busselton Health Study [Knuiman, M.W., Cullen, K.J., Bulsara, M.K., Welborn, T.A., Hobbs, M.S.T., 1994. Mortality trends, 1965 to 1989, in Busselton, the site of repeated health surveys and interventions. Austral. J. Public Health 18, 129–135].

19.

Feature screening in ultrahigh-dimensional additive Cox model

Guangren Yang Sumin Hou Luheng Wang Yanqing Sun 《Journal of Statistical Computation and Simulation》2018,88(6):1117-1133

The additive Cox model is flexible and powerful for modelling the dynamic changes of regression coefficients in the survival analysis. This paper is concerned with feature screening for the additive Cox model with ultrahigh-dimensional covariates. The proposed screening procedure can effectively identify active predictors. That is, with probability tending to one, the selected variable set includes the actual active predictors. In order to carry out the proposed procedure, we propose an effective algorithm and establish the ascent property of the proposed algorithm. We further prove that the proposed procedure possesses the sure screening property. Furthermore, we examine the finite sample performance of the proposed procedure via Monte Carlo simulations, and illustrate the proposed procedure by a real data example.

20.

Regression Analysis of Length‐biased and Right‐censored Failure Time Data with Missing Covariates

Na Hu Xuerong Chen Jianguo Sun 《Scandinavian Journal of Statistics》2015,42(2):438-452

Length‐biased and right‐censored failure time data arise from many fields, and their analysis has recently attracted a great deal of attention. Two examples of the areas that often produce such data are epidemiological studies and cancer screening trials. In this paper, we discuss regression analysis of such data in the presence of missing covariates, for which no established inference procedure seems to exist. For the problem, we consider the data arising from the proportional hazards model and propose two inverse probability weighted estimation procedures. The asymptotic properties of the resulting estimators are established, and the extensive simulation study conducted for the evaluation of the proposed methods suggests that they work well for practical situations.