期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Quantile-adaptive variable screening in ultra-high dimensional varying coefficient models

Junying Zhang Zhiping Lu 《Journal of applied statistics》2016,43(4):643-654

The varying-coefficient model is an important nonparametric statistical model since it allows appreciable flexibility on the structure of fitted model. For ultra-high dimensional heterogeneous data it is very necessary to examine how the effects of covariates vary with exposure variables at different quantile level of interest. In this paper, we extended the marginal screening methods to examine and select variables by ranking a measure of nonparametric marginal contributions of each covariate given the exposure variable. Spline approximations are employed to model marginal effects and select the set of active variables in quantile-adaptive framework. This ensures the sure screening property in quantile-adaptive varying-coefficient model. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data. 相似文献

2.

Feature Screening for Ultrahigh Dimensional Categorical Data With Applications

Danyang Huang Runze Li Hansheng Wang 《商业与经济统计学杂志》2014,32(2):237-244

Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies and illustrate the proposed method by two empirical datasets. 相似文献

3.

How to Make Model‐free Feature Screening Approaches for Full Data Applicable to the Case of Missing Response?

《Scandinavian Journal of Statistics》2018,45(2):324-346

It is quite a challenge to develop model‐free feature screening approaches for missing response problems because the existing standard missing data analysis methods cannot be applied directly to high dimensional case. This paper develops some novel methods by borrowing information of missingness indicators such that any feature screening procedures for ultrahigh‐dimensional covariates with full data can be applied to missing response case. The first method is the so‐called missing indicator imputation screening, which is developed by proving that the set of the active predictors of interest for the response is a subset of the active predictors for the product of the response and missingness indicator under some mild conditions. As an alternative, another method called Venn diagram‐based approach is also developed. The sure screening property is proven for both methods. It is shown that the complete case analysis can also keep the sure screening property of any feature screening approach with sure screening property. 相似文献

4.

Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification

Lyu Ni 《Journal of nonparametric statistics》2016,28(3):515-530

Most feature screening methods for ultrahigh-dimensional classification explicitly or implicitly assume the covariates are continuous. However, in the practice, it is quite common that both categorical and continuous covariates appear in the data, and applicable feature screening method is very limited. To handle this non-trivial situation, we propose an entropy-based feature screening method, which is model free and provides a unified screening procedure for both categorical and continuous covariates. We establish the sure screening and ranking consistency properties of the proposed procedure. We investigate the finite sample performance of the proposed procedure by simulation studies and illustrate the method by a real data analysis. 相似文献

5.

Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models

Fan J Feng Y Song R 《Journal of the American Statistical Association》2011,106(494):544-557

A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under general nonparametric models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods. 相似文献

6.

Variance estimation for sparse ultra-high dimensional varying coefficient models

Zhaoliang Wang Liugen Xue 《统计学通讯:理论与方法》2019,48(5):1270-1283

This paper considers the problem of variance estimation for sparse ultra-high dimensional varying coefficient models. We first use B-spline to approximate the coefficient functions, and discuss the asymptotic behavior of a naive two-stage estimator of error variance. We also reveal that this naive estimator may significantly underestimate the error variance due to the spurious correlations, which are even higher for nonparametric models than linear models. This prompts us to propose an accurate estimator of the error variance by effectively integrating the sure independence screening and the refitted cross-validation techniques. The consistency and the asymptotic normality of the resulting estimator are established under some regularity conditions. The simulation studies are carried out to assess the finite sample performance of the proposed methods. 相似文献

7.

On correlation rank screening for ultra-high dimensional competing risks data

Xiaolin Chen Chenguang Li Tao Zhang Zhenlong Gao 《Journal of applied statistics》2022,49(7):1848

In recent years, numerous feature screening schemes have been developed for ultra-high dimensional standard survival data with only one failure event. Nevertheless, existing literature pays little attention to related investigations for competing risks data, in which subjects suffer from multiple mutually exclusive failures. In this article, we develop a new marginal feature screening for ultra-high dimensional time-to-event data to allow for competing risks. The proposed procedure is model-free, and robust against heavy-tailed distributions and potential outliers for time to the type of failure of interest. Apart from this, it is invariant to any monotone transformation of event time of interest. Under rather mild assumptions, it is shown that the newly suggested approach possesses the ranking consistency and sure independence screening properties. Some numerical studies are conducted to evaluate the finite-sample performance of our method and make a comparison with its competitor, while an application to a real data set is provided to serve as an illustration. 相似文献

8.

Model-free conditional feature screening for ultra-high dimensional right censored data

Xiaolin Chen 《Journal of Statistical Computation and Simulation》2018,88(12):2425-2446

This paper is concerned with the conditional feature screening for ultra-high dimensional right censored data with some previously identified important predictors. A new model-free conditional feature screening approach, conditional correlation rank sure independence screening, has been proposed and investigated theoretically. The suggested conditional screening procedure has several desirable merits. First, it is model free, and thus robust to model misspecification. Second, it has the advantage of robustness of heavy-tailed distributions of the response and the presence of potential outliers in response. Third, it is naturally applicable to complete data when there is no censoring. Through simulation studies, we demonstrate that the proposed approach outperforms the CoxCS of Hong et al. under some circumstances. A real dataset is used to illustrate the usefulness of the proposed conditional screening method. 相似文献

9.

Feature screening for case‐cohort studies with failure time outcome

Jing Zhang Haibo Zhou Yanyan Liu Jianwen Cai 《Scandinavian Journal of Statistics》2021,48(1):349-370

Case‐cohort design has been demonstrated to be an economical and efficient approach in large cohort studies when the measurement of some covariates on all individuals is expensive. Various methods have been proposed for case‐cohort data when the dimension of covariates is smaller than sample size. However, limited work has been done for high‐dimensional case‐cohort data which are frequently collected in large epidemiological studies. In this paper, we propose a variable screening method for ultrahigh‐dimensional case‐cohort data under the framework of proportional model, which allows the covariate dimension increases with sample size at exponential rate. Our procedure enjoys the sure screening property and the ranking consistency under some mild regularity conditions. We further extend this method to an iterative version to handle the scenarios where some covariates are jointly important but are marginally unrelated or weakly correlated to the response. The finite sample performance of the proposed procedure is evaluated via both simulation studies and an application to a real data from the breast cancer study. 相似文献

10.

Statistical inference on partial linear additive models with distortion measurement errors

《Statistical Methodology》2015

We consider statistical inference for partial linear additive models (PLAMs) when the linear covariates are measured with errors and distorted by unknown functions of commonly observable confounding variables. A semiparametric profile least squares estimation procedure is proposed to estimate unknown parameter under unrestricted and restricted conditions. Asymptotic properties for the estimators are established. To test a hypothesis on the parametric components, a test statistic based on the difference between the residual sums of squares under the null and alternative hypotheses is proposed, and we further show that its limiting distribution is a weighted sum of independent standard chi-squared distributions. A bootstrap procedure is further proposed to calculate critical values. Simulation studies are conducted to demonstrate the performance of the proposed procedure and a real example is analyzed for an illustration. 相似文献

11.

Stable feature screening for ultrahigh dimensional data

Peng Lai Fengli Song Yufei Gao 《Journal of the Korean Statistical Society》2019,48(2):221-232

This paper is concerned with the stable feature screening for the ultrahigh dimensional data. To deal with the ultrahigh dimensional data problem and screen the important features, a set-averaging measurement is proposed. The model averaging technique and the conditional quantile method are used to construct the weighted set-averaging feature screening procedure to identify the relationships between the possible predictors and the response variable. The proposed screening method is model free, stable and possesses the sure screening property under some regular conditions. Some Monte Carlo simulations and a real data application are conducted to evaluate the performance of the proposed procedure. 相似文献

12.

Variable screening for ultrahigh dimensional censored quantile regression

Jing Pan Shucong Zhang Yong Zhou 《Journal of Statistical Computation and Simulation》2019,89(3):395-413

Quantile regression is a flexible approach to assessing covariate effects on failure time, which has attracted considerable interest in survival analysis. When the dimension of covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method can work for a general class of survival models, allow for heterogeneity of data and enjoy desirable properties including the sure screening property and the ranking consistency property. Moreover, an iterative version of screening algorithm has also been proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis. 相似文献

13.

Fast Bayesian variable screenings for binary response regressions with small sample size

S.-M. Chang J.-Y. Tzeng 《Journal of Statistical Computation and Simulation》2017,87(14):2708-2723

Screening procedures play an important role in data analysis, especially in high-throughput biological studies where the datasets consist of more covariates than independent subjects. In this article, a Bayesian screening procedure is introduced for the binary response models with logit and probit links. In contrast to many screening rules based on marginal information involving one or a few covariates, the proposed Bayesian procedure simultaneously models all covariates and uses closed-form screening statistics. Specifically, we use the posterior means of the regression coefficients as screening statistics; by imposing a generalized g-prior on the regression coefficients, we derive the analytical form of their posterior means and compute the screening statistics without Markov chain Monte Carlo implementation. We evaluate the utility of the proposed Bayesian screening method using simulations and real data analysis. When the sample size is small, the simulation results suggest improved performance with comparable computational cost. 相似文献

14.

Pivotal variable detection of the covariance matrix and its application to high-dimensional factor models

Junlong Zhao Hongyu Zhao Lixing Zhu 《Statistics and Computing》2018,28(4):775-793

To estimate the high-dimensional covariance matrix, row sparsity is often assumed such that each row has a small number of nonzero elements. However, in some applications, such as factor modeling, there may be many nonzero loadings of the common factors. The corresponding variables are also correlated to one another and the rows are non-sparse or dense. This paper has three main aims. First, a detection method is proposed to identify the rows that may be non-sparse, or at least dense with many nonzero elements. These rows are called dense rows and the corresponding variables are called pivotal variables. Second, to determine the number of rows, a ridge ratio method is suggested, which can be regarded as a sure screening procedure. Third, to handle the estimation of high-dimensional factor models, a two-step procedure is suggested with the above screening as the first step. Simulations are conducted to examine the performance of the new method and a real dataset is analyzed for illustration. 相似文献

15.

Modified SEE variable selection for varying coefficient instrumental variable models

《Statistical Methodology》2013

We consider the problem of variable selection for a class of varying coefficient models with instrumental variables. We focus on the case that some covariates are endogenous variables, and some auxiliary instrumental variables are available. An instrumental variable based variable selection procedure is proposed by using modified smooth-threshold estimating equations (SEEs). The proposed procedure can automatically eliminate the irrelevant covariates by setting the corresponding coefficient functions as zero, and simultaneously estimate the nonzero regression coefficients by solving the smooth-threshold estimating equations. The proposed variable selection procedure avoids the convex optimization problem, and is flexible and easy to implement. Simulation studies are carried out to assess the performance of the proposed variable selection method. 相似文献

16.

A robust variable screening method for high-dimensional data

Tao Wang Lin Zheng Haiyang Liu 《Journal of applied statistics》2017,44(10):1839-1855

In practice, the presence of influential observations may lead to misleading results in variable screening problems. We, therefore, propose a robust variable screening procedure for high-dimensional data analysis in this paper. Our method consists of two steps. The first step is to define a new high-dimensional influence measure and propose a novel influence diagnostic procedure to remove those unusual observations. The second step is to utilize the sure independence screening procedure based on distance correlation to select important variables in high-dimensional regression analysis. The new influence measure and diagnostic procedure that we developed are model free. To confirm the effectiveness of the proposed method, we conduct simulation studies and a real-life data analysis to illustrate the merits of the proposed approach over some competing methods. Both the simulation results and the real-life data analysis demonstrate that the proposed method can greatly control the adverse effect after detecting and removing those unusual observations, and performs better than the competing methods. 相似文献

17.

Quantile screening for ultra-high-dimensional heterogeneous data conditional on some variables

Yi Liu 《Journal of Statistical Computation and Simulation》2018,88(2):329-342

In this paper, we propose a conditional quantile independence screening approach for ultra-high-dimensional heterogeneous data given some known, significant and low-dimensional variables. The new method does not require imposing a specific model structure for the response and covariates and can detect additional features that contribute to conditional quantiles of the response given those already-identified important predictors. We also prove that the proposed procedure enjoys the ranking consistency and sure screening properties. Some simulation studies are carried out to examine the performance of advised procedure. At last, we illustrate it by a real data example. 相似文献

18.

GMM estimation in partial linear models with endogenous covariates causing an over-identified problem

Baicheng Chen Yong Zhou 《统计学通讯:理论与方法》2013,42(11):3168-3184

ABSTRACT

We study partial linear models where the linear covariates are endogenous and cause an over-identified problem. We propose combining the profile principle with local linear approximation and the generalized moment methods (GMM) to estimate the parameters of interest. We show that the profiled GMM estimators are root? n consistent and asymptotically normally distributed. By appropriately choosing the weight matrix, the estimators can attain the efficiency bound. We further consider variable selection by using the moment restrictions imposed on endogenous variables when the dimension of the covariates may be diverging with the sample size, and propose a penalized GMM procedure, which is shown to have the sparsity property. We establish asymptotic normality of the resulting estimators of the nonzero parameters. Simulation studies have been presented to assess the finite-sample performance of the proposed procedure. 相似文献

19.

Variance ratio screening for ultrahigh dimensional discriminant analysis

Fengli Song Baohua Shen Guosheng Cheng 《统计学通讯:理论与方法》2018,47(24):6034-6051

This article is concerned with feature screening for the ultrahigh dimensional discriminant analysis. A variance ratio screening method is proposed and the sure screening property of this screening procedure is proved. The proposed method has some additional desirable features. First, it is model-free which does not require specific discriminant model and can be directly applied to the multi-categories situation. Second, it can effectively screen main effects and interaction effects simultaneously. Third, it is relatively inexpensive in computational cost because of the simple structure. The finite sample properties are performed through the Monte Carlo simulation studies and two real-data analyses. 相似文献

20.

Robust sure independence screening for nonpolynomial dimensional generalized linear models

Abhik Ghosh Erica Ponzi Torkjel Sandanger Magne Thoresen 《Scandinavian Journal of Statistics》2023,50(3):1232-1262

We consider the problem of variable screening in ultra-high-dimensional generalized linear models (GLMs) of nonpolynomial orders. Since the popular SIS approach is extremely unstable in the presence of contamination and noise, we discuss a new robust screening procedure based on the minimum density power divergence estimator (MDPDE) of the marginal regression coefficients. Our proposed screening procedure performs well under pure and contaminated data scenarios. We provide a theoretical motivation for the use of marginal MDPDEs for variable screening from both population as well as sample aspects; in particular, we prove that the marginal MDPDEs are uniformly consistent leading to the sure screening property of our proposed algorithm. Finally, we propose an appropriate MDPDE-based extension for robust conditional screening in GLMs along with the derivation of its sure screening property. Our proposed methods are illustrated through extensive numerical studies along with an interesting real data application. 相似文献