Similar literature
Found 20 similar articles; search time 15 ms.
1.
This article is concerned with feature screening for ultrahigh dimensional discriminant analysis. A variance ratio screening method is proposed, and the sure screening property of this procedure is proved. The proposed method has some additional desirable features. First, it is model-free: it does not require a specific discriminant model and can be applied directly to the multi-category situation. Second, it can effectively screen main effects and interaction effects simultaneously. Third, it is relatively inexpensive in computational cost because of its simple structure. The finite sample properties are examined through Monte Carlo simulation studies and two real-data analyses.

2.
In high-dimensional linear regression, the number of variables exceeds the sample size. In this situation, the traditional variance estimation technique based on ordinary least squares consistently exhibits a high bias, even under a sparsity assumption. One of the major reasons is the high spurious correlation between the unobserved realized noise and several predictors. To alleviate this problem, a refitted cross-validation (RCV) method has been proposed in the literature. However, for a complicated model, RCV has a lower probability of selecting a model that includes the true model in finite samples. This phenomenon may easily result in a large bias in variance estimation. Thus, a model selection method based on the ranks of the frequency of occurrences in six votes from a blocked 3×2 cross-validation is proposed in this study. In practice, the proposed method has a considerably larger probability of including the true model than the RCV method. The variance estimate obtained using the model selected by the proposed method also shows a lower bias and a smaller variance. Furthermore, theoretical analysis proves the asymptotic normality of the proposed variance estimator.
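As a rough illustration of the refitted cross-validation idea that the abstract above takes as its baseline, the sketch below splits the data into two halves, selects variables on one half, refits ordinary least squares on the other half, and averages the two residual variance estimates. It is a minimal sketch under stated assumptions, not the method proposed in the abstract: the marginal-correlation screening step, the fixed model size `k`, and all function names are hypothetical choices made for illustration, and the blocked 3×2 voting scheme is not implemented.

```python
import numpy as np

def _screen(X, y, k):
    """Return indices of the k columns with the largest absolute marginal
    correlation with y (an illustrative stand-in for any selection rule)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Proportional to |corr(X_j, y)|; the norm of yc is a common factor.
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def _refit_variance(X, y, cols):
    """OLS residual variance on the given columns, with an intercept."""
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid / (len(y) - Z.shape[1])

def rcv_variance(X, y, k, seed=0):
    """Two-fold refitted cross-validation estimate of the error variance.

    Variables chosen on one half are refitted on the other half, so the
    spurious correlations picked up during selection do not deflate the
    residual sum of squares used for the estimate.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    a, b = perm[: len(y) // 2], perm[len(y) // 2:]
    v1 = _refit_variance(X[b], y[b], _screen(X[a], y[a], k))
    v2 = _refit_variance(X[a], y[a], _screen(X[b], y[b], k))
    return 0.5 * (v1 + v2)
```

The model size `k` must stay well below half the sample size so that each refit has positive residual degrees of freedom.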

3.
Quantile regression is a flexible approach to assessing covariate effects on failure time, which has attracted considerable interest in survival analysis. When the dimension of covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method can work for a general class of survival models, allows for heterogeneity of data, and enjoys desirable properties including the sure screening property and the ranking consistency property. Moreover, an iterative version of the screening algorithm is also proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis.

4.
In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is a lack of information about the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies \(p=o\{\exp (an)\}\), where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.

5.
Variance estimation is an important topic in nonparametric regression. In this paper, we propose a pairwise regression method for estimating the residual variance. Specifically, we regress the squared difference between observations on the squared distance between design points, and then estimate the residual variance as the intercept. Unlike most existing difference-based estimators that require a smooth regression function, our method applies to regression models with jump discontinuities. Our method also applies to situations where the design points are unequally spaced. Finally, we conduct extensive simulation studies to evaluate the finite-sample performance of the proposed method and compare it with some existing competitors.
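The pairwise regression idea described above can be sketched in a few lines. This is an illustrative reading of the abstract rather than the authors' implementation: the use of half squared differences (so that the intercept targets the variance itself instead of twice the variance), the restriction to pairs within a hypothetical `bandwidth`, and the unweighted least squares fit are all assumptions.

```python
import numpy as np

def pairwise_variance(x, y, bandwidth):
    """Estimate the residual variance as the intercept of a regression of
    half squared response differences on squared design-point distances.

    For nearby points, E[(y_i - y_j)^2 / 2] is roughly
    sigma^2 + c * (x_i - x_j)^2, so the fitted intercept estimates sigma^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)      # all pairs with i < j
    dist2 = (x[i] - x[j]) ** 2
    half_sq = 0.5 * (y[i] - y[j]) ** 2
    keep = dist2 <= bandwidth ** 2           # assumed: only nearby pairs
    if keep.sum() < 2:
        raise ValueError("bandwidth too small: fewer than two pairs")
    design = np.column_stack([np.ones(keep.sum()), dist2[keep]])
    coef, *_ = np.linalg.lstsq(design, half_sq[keep], rcond=None)
    return coef[0]                           # intercept
```

Because the pairs are formed from the observed design points directly, the sketch makes no use of equal spacing.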

6.
This paper considers the problem of variance estimation for sparse ultra-high dimensional varying coefficient models. We first use B-splines to approximate the coefficient functions, and discuss the asymptotic behavior of a naive two-stage estimator of the error variance. We also reveal that this naive estimator may significantly underestimate the error variance due to spurious correlations, which are even higher for nonparametric models than for linear models. This prompts us to propose an accurate estimator of the error variance by effectively integrating the sure independence screening and the refitted cross-validation techniques. The consistency and the asymptotic normality of the resulting estimator are established under some regularity conditions. Simulation studies are carried out to assess the finite sample performance of the proposed methods.

7.
We propose a new modified (biased) cross-validation method for adaptively determining the bandwidth in a nonparametric density estimation setup. It is shown that the method provides consistent minimizers. Some simulation results are reported which compare the small sample behavior of the new and the classical cross-validation selectors.

8.
Multiple regression when regressors are measured on two different sized experimental units involves a nested error structure. This nested error structure consists of two variance components. Sufficient conditions are presented under which UMVU estimators of these variance components exist. When these conditions are not met, two alternative estimators for the two variance components are considered and compared when possible.

This paper considers multiple regression models when regressor variables are associated with different sized experimental units, resulting in a nested error structure. Nested error structures occur because of restrictions placed on randomizations. This results in experiments similar to split-plot type experiments, which involve two different sizes of experimental units. Data resulting from these types of experiments consist of measurements made on larger sized experimental units as well as measurements made on smaller sized experimental units. Split-plot type experiments occur when certain combinations of the treatment factors are randomly assigned to larger sized experimental units, after which these units are split or divided and other combinations of the treatment factors are randomly assigned to the split units.

9.
We consider the problem of estimating the noise variance in homoscedastic nonparametric regression models. For low dimensional covariates \(t \in \mathbb{R}^d\), \(d = 1, 2\), difference-based estimators have been investigated in a series of papers. For a given length of such an estimator, difference schemes which minimize the asymptotic mean-squared error can be computed for \(d = 1\) and \(d = 2\). However, from numerical studies it is known that for finite sample sizes the performance of these estimators may be deficient owing to a large finite sample bias. We provide theoretical support for these findings. In particular, we show that with increasing dimension d this becomes more drastic. If \(d \ge 4\), these estimators even fail to be consistent. A different class of estimators is discussed which allow better control of the bias and remain consistent when \(d \ge 4\). These estimators are compared numerically with kernel-type estimators (which are asymptotically efficient), and some guidance is given about when their use becomes necessary.

10.
This paper is concerned with stable feature screening for ultrahigh dimensional data. To deal with the ultrahigh dimensional data problem and screen the important features, a set-averaging measurement is proposed. The model averaging technique and the conditional quantile method are used to construct the weighted set-averaging feature screening procedure to identify the relationships between the possible predictors and the response variable. The proposed screening method is model-free and stable, and possesses the sure screening property under some regularity conditions. Some Monte Carlo simulations and a real data application are conducted to evaluate the performance of the proposed procedure.

11.
We consider variance estimation when a stratified single-stage cluster sample is selected in the first phase and a stratified simple random element sample is selected in the second phase. We propose explicit formulas for (asymptotically) unbiased variance estimators for the double expansion estimator and the regression estimator. We perform a small simulation study to investigate the performance of the proposed variance estimators. In our simulation study, the proposed variance estimator showed better or comparable performance to the Jackknife variance estimator. We also extend the results to a two-phase sampling design in which a stratified pps with replacement cluster sample is selected in the first phase.

12.
Small area estimation techniques have received considerable attention during the last decades due to their important applications in survey studies. Mixed linear models and reduced rank regression analysis are jointly used when considering small area estimation. Estimates of parameters are presented as well as prediction of random effects and unobserved area measurements.

13.
In this paper, a class of estimators of a finite population variance is proposed under an adaptive cluster sampling design in the presence of information on an auxiliary variable. We obtain expressions for the mean square error and bias of the developed estimators, and their performance is evaluated on a Poisson clustered process and a real data set. The simulation study evaluates the efficiency of the suggested estimators for an adaptive cluster sampling (ACS) design and the Isaki (1983) estimator of the variance for SRSWOR over the sample variance for SRSWOR.

14.
In this paper, we consider sure independence feature screening for ultrahigh dimensional discriminant analysis. We propose a new method named robust rank screening based on the conditional expectation of the rank of the predictor's samples. We also establish the sure screening property for the proposed procedure under simple assumptions. The new procedure has some additional desirable characteristics. First, it is robust against heavy-tailed distributions, potential outliers and the sample shortage for some categories. Second, it is model-free without any specification of a regression model and directly applicable to the situation with many categories. Third, it is simple in theoretical derivation due to the boundedness of the resulting statistics. Fourth, it is relatively inexpensive in computational cost because of the simple structure of the screening index. Monte Carlo simulations and real data examples are used to demonstrate the finite sample performance.

15.
Longitudinal or clustered response data arise in many applications such as biostatistics, epidemiology and environmental studies. The repeated responses cannot in general be assumed to be independent. One method of analysing such data is by using the generalized estimating equations (GEE) approach. The current GEE method for estimating regression effects in longitudinal data focuses on the modelling of the working correlation matrix assuming a known variance function. However, correct choice of the correlation structure may not necessarily improve estimation efficiency for the regression parameters if the variance function is misspecified [Wang YG, Lin X. Effects of variance-function misspecification in analysis of longitudinal data. Biometrics. 2005;61:413–421]. In this connection two problems arise: finding a correct variance function and estimating the parameters of the chosen variance function. In this paper, we study the problem of estimating the parameters of the variance function assuming that the form of the variance function is known and then the effect of a misspecified variance function on the estimates of the regression parameters. We propose a GEE approach to estimate the parameters of the variance function. This estimation approach borrows the idea of Davidian and Carroll [Variance function estimation. J Amer Statist Assoc. 1987;82:1079–1091] by solving a nonlinear regression problem where residuals are regarded as the responses and the variance function is regarded as the regression function. A limited simulation study shows that the proposed method performs at least as well as the modified pseudo-likelihood approach developed by Wang and Zhao [A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics. 2007;63:681–689]. Both these methods perform better than the GEE approach.

16.
We present a general procedure for eliciting a randomized response (RR) from selected persons in order to estimate the total of a sensitive variable related to a finite survey population. We consider two estimators along with variance estimators, treating the case of sampling with probabilities proportional to (known) size measures (PPS) with replacement (WR), draw analogies with multi-stage sampling, and note their relative efficacies.

17.
The predictive loss of Bayesian models can be estimated using a sample from the full-data posterior by evaluating the Watanabe-Akaike information criterion (WAIC) or using an importance sampling (ISCVL) approximation to leave-one-out cross-validation loss. With hierarchical models the loss can be specified at different levels of the hierarchy, and in the published literature, it is routine for these estimators to use the conditional likelihood provided by the lowest level of model hierarchy. However, the regularity conditions underlying these estimators may not hold at this level, and the behaviour of conditional-level WAIC as an estimator of conditional-level predictive loss must be determined on a case-by-case basis. Conditional-level ISCVL does not target conditional-level predictive loss and instead is an estimator of marginal-level predictive loss. Using examples for analysis of over-dispersed count data, it is shown that conditional-level WAIC does not provide a reliable estimator of its target loss, and simulations show that it can favour the incorrect model. Moreover, conditional-level ISCVL is numerically unstable compared to marginal-level ISCVL. It is recommended that WAIC and ISCVL be evaluated using the marginalized likelihood where practicable and that the reliability of these estimators always be checked using appropriate diagnostics.
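For readers unfamiliar with how WAIC is evaluated from a posterior sample, the sketch below computes it from a matrix of pointwise log-likelihood values using the commonly cited variance-based form of the penalty. It is a minimal sketch, not code from the paper: the function name and the `(draws, observations)` input layout are assumptions. Whether the result is a conditional-level or a marginal-level WAIC depends entirely on which likelihood the caller used to fill the matrix.

```python
import numpy as np

def waic(log_lik):
    """Compute WAIC on the deviance scale from pointwise log-likelihoods.

    log_lik: array of shape (S, n) holding log p(y_i | theta_s) for
    S posterior draws and n observations (assumed layout).
    Returns (waic_value, effective_number_of_parameters).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # Log pointwise predictive density: log of the posterior mean of the
    # likelihood, computed stably by factoring out the column maximum.
    col_max = log_lik.max(axis=0)
    lppd_i = col_max + np.log(np.exp(log_lik - col_max).mean(axis=0))
    # Penalty: posterior variance of the log-likelihood, per observation.
    p_i = log_lik.var(axis=0, ddof=1)
    elpd = lppd_i.sum() - p_i.sum()
    return -2.0 * elpd, p_i.sum()
```

Passing log-likelihoods with the random effects integrated out yields the marginal-level quantity that the abstract recommends where practicable.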

18.
19.
Two versions of Yates-Grundy type variance estimators are usually employed for large samples when estimating a survey population total by a generalized regression (Greg, in brief) predictor motivated by consideration of a linear regression model. Two alternative modifications of them are developed so that the limiting values of the design expectations of the model expectations of the variance estimators 'match' respectively (I) the model expectations of the Taylor approximation of the design variance of the Greg predictor and (II) the limiting value of the design expectation of the model expectation of the squared difference between the Greg predictor and the population total. The exercise is extended to yield the modifications needed when only randomized response (RR) is available rather than direct response (DR), as when one encounters sensitive issues demanding protection of privacy. A comparative study based on simulation is presented for illustration.

20.
The cross-validation of principal components is a problem that occurs in many applications of statistics. The naive approach of omitting each observation in turn and repeating the principal component calculations is computationally costly. In this paper we present an efficient approach to leave-one-out cross-validation of principal components. This approach exploits the regular nature of leave-one-out principal component eigenvalue downdating. We derive influence statistics and consider the application to principal component regression.
