Similar Articles
20 similar articles found (search time: 31 ms)
1.
This paper presents a comparative study of the performance properties of one unbiased and two Stein-type estimators for combining the estimates of coefficients in a linear regression model when data sets are available from replicated experiments conducted at possibly different stations.
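The abstract does not specify either Stein-type estimator. As a generic illustration of the Stein-type idea only, the sketch below shrinks per-station estimates of one coefficient toward their pooled mean with the positive-part James-Stein rule. It assumes the estimates are independent with a known common variance `sigma2`; the function name `js_shrink_to_mean` and the data are hypothetical, and this is not either of the estimators compared in the paper.

```python
from statistics import fmean


def js_shrink_to_mean(estimates, sigma2):
    """Positive-part James-Stein shrinkage of per-station estimates toward their mean.

    estimates: independent estimates of the same coefficient, one per station (k >= 4).
    sigma2: common sampling variance of each estimate, assumed known.
    """
    k = len(estimates)
    if k < 4:
        raise ValueError("shrinking toward the pooled mean needs at least 4 estimates")
    pooled = fmean(estimates)
    spread = sum((b - pooled) ** 2 for b in estimates)
    if spread == 0:
        return [pooled] * k
    # Shrinkage factor, truncated at zero (the "positive-part" rule).
    factor = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return [pooled + factor * (b - pooled) for b in estimates]


# Hypothetical slope estimates from five stations.
print(js_shrink_to_mean([1.8, 2.3, 2.0, 2.6, 1.5], sigma2=0.05))
```

Shrinking toward the estimated pooled mean uses up one degree of freedom, which is why the factor uses k - 3 rather than k - 2 and why at least four stations are required.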

2.
High-dimensional data arise frequently in modern applications such as biology, chemometrics, economics, neuroscience and other scientific fields. Common features of high-dimensional data are that many of the predictors may not be significant and that the predictors are often highly correlated. Generalized linear models, as a generalization of linear models, also suffer from the collinearity problem. In this paper, combining a nonconvex penalty with ridge regression, we propose the weighted elastic-net for variable selection in high-dimensional generalized linear models and establish the theoretical properties of the proposed method when the number of parameters diverges. The finite-sample behavior of the proposed method is illustrated with simulation studies and a real data example.
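A minimal sketch of the penalty structure, assuming a Gaussian linear model with centred data rather than a generalized linear model: coordinate descent on a weighted elastic-net objective, where the per-coefficient L1 weights `weights` are taken as given. One way such weights can arise from a nonconvex penalty is a local linear approximation around an initial estimate, but that step, the function names and the data are illustrative assumptions, not the paper's procedure.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_elastic_net(X, y, lam1, lam2, weights, n_iter=500, tol=1e-10):
    """Coordinate descent for the (Gaussian) weighted elastic-net objective

        (1/(2n)) * ||y - X b||^2 + lam1 * sum_j w_j |b_j| + (lam2/2) * sum_j b_j^2

    X: list of n rows with p columns (no intercept; y and X assumed centred).
    weights: per-coefficient L1 weights w_j >= 0 (assumed given).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b for the current b (b = 0 initially)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) * sum_i x_ij * (partial residual that excludes predictor j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam1 * weights[j]) / (col_sq[j] + lam2)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep resid = y - X b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Hypothetical centred data.
X = [[1.0, 0.5, -0.2], [-1.0, 0.4, 0.1], [0.5, -1.0, 0.3], [-0.5, 0.1, -0.2]]
y = [1.2, -0.9, 0.1, -0.4]
print(weighted_elastic_net(X, y, lam1=0.05, lam2=0.1, weights=[1.0, 1.0, 2.0]))
```

The L1 part sets small coefficients exactly to zero, while the ridge part adds `lam2` to each coordinate's curvature, which keeps the updates stable when predictors are highly correlated.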

3.
A method called FICYREG for estimating regression coefficients is introduced. It is a generalization of the James-Stein estimator to the multivariate regression problem. When suitably represented, FICYREG emerges as a rule in which the canonical variates and canonical correlations play an intrinsic role. By exploiting these objects, FICYREG achieves stability against the influence of the “noise” present in problems where the responses are correlated, so that some of the response vector's canonical variates are essentially independent of all the others, including the predictors. The least squares (LS) estimator, by contrast, is highly sensitive to this noise. The use of FICYREG is illustrated with an example, and its performance is compared with that of the LS estimator under a quadratic loss function. Both fixed and random predictors are considered. Overall, FICYREG outperforms the LS estimator.

4.
We describe a method for estimating the coefficients in a logistic regression model when the predictors are subject to measurement error and an instrumental variable is available. The proposed method is based on the theory of factor scores from factor analysis. Two versions of the proposed method, a simple one and an extended one, are compared through simulation studies with the methods described by Carroll, Ruppert and Stefanski (1995). We conclude that the simple version performs as well as the methods of Carroll et al. (1995), and that the extended version performs better with respect to MSE, owing to a reduction in bias.

5.
ABSTRACT

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that, in many processes, data sets can routinely contain 10% or more outliers. Unfortunately, such outliers typically render the OLS parameter estimates useless. OLS diagnostic quantities and graphical plots can reliably identify a few outliers, but they lose power significantly as the dimension and the number of outliers increase. Although there have been recent advances in methods that detect multiple outliers, improvements are still needed in regression estimators that fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of the quantity and configuration of the outliers. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and an improved measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, gives researchers and practitioners a tool for the model-building process that protects against the severe impact of multiple outliers.
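To make the idea of downweighting outliers concrete, here is a standard Huber M-estimator for a single predictor, fitted by iteratively reweighted least squares. This is a textbook baseline, not the proposed compound estimator; the tuning constant `c = 1.345`, the residual scale (median absolute residual divided by 0.6745) and the function name are assumptions of the sketch. Because the weights depend only on residuals, an estimator like this offers little protection against high-leverage outliers, which is the weakness the abstract highlights.

```python
from statistics import median


def huber_regression(x, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of (intercept, slope) by iteratively reweighted least squares.

    The residual scale is re-estimated at each step as median(|r|) / 0.6745.
    """
    def wls(w):
        # Weighted least squares for one predictor plus intercept.
        sw = sum(w)
        xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        return yb - slope * xb, slope

    a, b = wls([1.0] * len(x))  # start from ordinary least squares
    for _ in range(n_iter):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        scale = median(abs(ri) for ri in r) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 inside [-c*scale, c*scale], c*scale/|r| outside.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        a_new, b_new = wls(w)
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b


x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[5] = 100.0  # one gross outlier in the response, at an average x value
print(huber_regression(x, y))
```

Moving the contaminated point to an extreme `x` value instead would let it pull the fit with a small residual, so its weight would stay near 1; that is the high-leverage case the proposed estimator is designed to handle.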

6.
Non-parametric Quantile Regression with Censored Data (times cited: 1; self-citations: 0; citations by others: 1)
Abstract.  Censored regression models have received a great deal of attention in both the theoretical and applied statistics literature. Here, we consider a model in which the response variable is censored but not the covariates. We propose a new estimator of the conditional quantiles based on the local linear method, and give an algorithm for its numerical implementation. We study its asymptotic properties and evaluate its performance on simulated data sets.

7.
Estimation in the presence of censoring is an important problem. In the linear model, the Buckley-James method proceeds iteratively, estimating the censored values and then re-estimating the regression coefficients. A large-scale Monte Carlo simulation technique has been developed to test the performance of the Buckley-James (B-J) estimator. One hundred seventy-two randomly generated data sets, each with three thousand replications, based on four failure distributions, four censoring patterns, three sample sizes and four censoring rates, have been investigated, and the results are presented. It is found that, except under Type II censoring, the B-J estimator is essentially unbiased, even when data sets with small sample sizes are subjected to a high censoring rate. The variance formula suggested by Buckley and James (1979) is shown to be sensitive to the failure distribution. If the censoring rate is kept constant along the covariate line, the sample variance of the estimator appears to be insensitive to the censoring pattern for a selected failure distribution. Oscillation of the convergence values associated with the B-J estimator is illustrated and thoroughly discussed.
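The iteration described above can be sketched for one predictor as follows. Assumptions of this sketch: a censored response is imputed as its fitted value plus the Kaplan-Meier conditional mean of the residuals lying beyond its own residual; the largest residual is treated as uncensored so that the Kaplan-Meier masses sum to one; the function names are hypothetical. Because the abstract notes that B-J iterates can oscillate, the loop simply returns the last iterate if it does not settle within `n_iter` steps.

```python
def _ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return yb - slope * xb, slope


def _km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each uncensored residual (0 elsewhere)."""
    n = len(resid)
    # Sort by residual, uncensored before censored at tied values.
    order = sorted(range(n), key=lambda i: (resid[i], 1 - delta[i]))
    d = list(delta)
    d[order[-1]] = 1  # treat the largest residual as uncensored
    surv, mass = 1.0, [0.0] * n
    for pos, i in enumerate(order):
        if d[i]:
            mass[i] = surv / (n - pos)  # n - pos residuals are still at risk
            surv -= mass[i]
    return mass


def buckley_james(x, y, delta, n_iter=50, tol=1e-8):
    """Buckley-James (intercept, slope) for y = a + b*x + e with right-censored y.

    delta[i] = 1 if y[i] is observed, 0 if it is a right-censoring time.
    """
    a, b = _ols(x, y)  # start: naive fit treating censored values as observed
    for _ in range(n_iter):
        resid = [yi - b * xi for xi, yi in zip(x, y)]
        mass = _km_jumps(resid, delta)
        y_star = []
        for i in range(len(y)):
            if delta[i]:
                y_star.append(y[i])
                continue
            # E[e | e > e_i] under the Kaplan-Meier estimate of the residual law.
            tail = [(resid[k], mass[k]) for k in range(len(y))
                    if resid[k] > resid[i] and mass[k] > 0]
            tot = sum(m for _, m in tail)
            cond = sum(e * m for e, m in tail) / tot if tot > 0 else resid[i]
            y_star.append(b * x[i] + cond)
        a_new, b_new = _ols(x, y_star)
        if abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b  # not converged: the iterates may be oscillating


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.0, 15.0]
delta = [1, 1, 1, 0, 1, 1, 0, 1]  # two right-censored responses
print(buckley_james(x, y, delta))
```

With no censoring every `y_star` equals `y`, so the procedure reduces to ordinary least squares after one pass.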

8.
Summary. We present a technique for extending generalized linear models to the situation where some of the predictor variables are observations from a curve or function. The technique is particularly useful when only fragments of each curve have been observed. We demonstrate, on both simulated and real data sets, how this approach can be used to perform linear, logistic and censored regression with functional predictors. In addition, we show how functional principal components can be used to gain insight into the relationship between the response and functional predictors. Finally, we extend the methodology to apply generalized linear models and principal components to standard missing data problems.

9.
A new regularization method for regression models is proposed. The criterion to be minimized contains a penalty term that explicitly links the strength of penalization to the correlation between predictors. Like the elastic net, the method encourages a grouping effect, whereby strongly correlated predictors tend to be in or out of the model together. A boosted version of the penalized estimator, based on a new boosting method, allows variable selection. Real-world data and simulations show that the method compares well with competing regularization techniques. In settings where the number of predictors is smaller than the number of observations, it frequently performs better than its competitors; in high-dimensional settings, prediction measures favor the elastic net, while accuracy of estimation and stability of variable selection favor the newly proposed method.
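The abstract does not state the penalty. One penalty with the described behaviour, assumed here for illustration (check the paper for the exact form), sums (b_i - b_j)^2 / (1 - r_ij) + (b_i + b_j)^2 / (1 + r_ij) over all predictor pairs, where r_ij is the correlation between predictors i and j. The sketch evaluates it to show the grouping effect: when r_ij is close to 1, unequal coefficients on the two predictors are penalized heavily. The function name is hypothetical.

```python
def correlation_penalty(beta, corr):
    """Correlation-based penalty (assumed form):

        sum_{i<j} [ (b_i - b_j)^2 / (1 - r_ij) + (b_i + b_j)^2 / (1 + r_ij) ]

    corr: p x p correlation matrix with |r_ij| < 1 off the diagonal.
    """
    p = len(beta)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            r = corr[i][j]
            total += (beta[i] - beta[j]) ** 2 / (1 - r) + (beta[i] + beta[j]) ** 2 / (1 + r)
    return total


R = [[1.0, 0.9], [0.9, 1.0]]
print(correlation_penalty([1.0, 1.0], R))  # equal coefficients: small penalty
print(correlation_penalty([2.0, 0.0], R))  # same sum, split unevenly: large penalty
```

With all correlations equal to zero the penalty reduces to 2(p - 1) times the sum of squared coefficients, that is, a ridge penalty.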

10.
Much of the small‐area estimation literature focuses on population totals and means. However, users of survey data are often interested in the finite‐population distribution of a survey variable and in the measures (e.g. medians, quartiles, percentiles) that characterize the shape of this distribution at the small‐area level. In this paper we propose a model‐based direct estimator (MBDE, Chandra and Chambers) of the small‐area distribution function. The MBDE is defined as a weighted sum of sample data from the area of interest, with weights derived from the calibrated spline‐based estimate of the finite‐population distribution function introduced by Harms and Duchesne, under an appropriately specified regression model with random area effects. We also discuss estimation of the mean squared error of the MBDE. Monte Carlo simulations based on both simulated and real data sets show that the proposed MBDE and its associated mean squared error estimator perform well when compared with alternative estimators of the area‐specific finite‐population distribution function.
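The MBDE is described as a weighted sum of sample data. The sketch below shows only that final step: given area-specific weights (taken as inputs here; deriving them from the Harms-Duchesne calibrated spline estimate under a random-effects model is not attempted), it evaluates the weighted distribution function and reads off a quantile such as the median. Function names, weights and data are hypothetical.

```python
def weighted_cdf(values, weights, t):
    """F_hat(t) = sum_j w_j * 1{y_j <= t} / sum_j w_j  (weights assumed given)."""
    total = sum(weights)
    return sum(w for y, w in zip(values, weights) if y <= t) / total


def weighted_quantile(values, weights, q):
    """Smallest sample value y with F_hat(y) >= q (q = 0.5 gives a median)."""
    total = sum(weights)
    cum = 0.0
    for y, w in sorted(zip(values, weights)):
        cum += w / total
        if cum >= q:
            return y
    return max(values)  # guard against floating-point shortfall when q is near 1


# Hypothetical sample values from one small area and their weights.
sample = [12.0, 15.5, 9.8, 20.1]
w = [0.2, 0.3, 0.4, 0.1]
print(weighted_cdf(sample, w, 13.0), weighted_quantile(sample, w, 0.5))
```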

11.
12.
13.
Biao Zhang. Statistics, 2016, 50(5): 1173-1194
Missing covariate data occur frequently in regression analysis. We study methods for estimating the regression coefficients in an assumed conditional mean function when some covariates are completely observed but others are missing for some subjects. We adopt the semiparametric perspective of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866] on regression analyses with missing covariates, in which they pioneered the use of two working models: the working propensity score model and the working conditional score model. A recent approach to missing covariate data analysis is the empirical likelihood method of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503], which effectively combines unbiased estimating equations. In this paper, we consider an alternative likelihood approach based on the full likelihood of the observed data. This full likelihood-based method yields estimators of the vector of regression coefficients that are (a) asymptotically equivalent to those of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the working propensity score model is correctly specified, and (b) doubly robust, like the augmented inverse probability weighting (AIPW) estimators of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Thus, the proposed full likelihood-based estimators improve on the efficiency of the AIPW estimators when the working propensity score model is correct but the working conditional score model is possibly incorrect, and also improve on the empirical likelihood estimators of Qin, Zhang and Leung [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the reverse is true, that is, when the working conditional score model is correct but the working propensity score model is possibly incorrect. In addition, we consider a regression method for estimating the regression coefficients when the working conditional score model is correctly specified; the asymptotic variance of the resulting estimator is no greater than the semiparametric variance bound characterized by the theory of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Finally, we compare the finite-sample performance of various estimators in a simulation study.
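AIPW estimators are the doubly robust benchmark named in the abstract. The sketch below shows the AIPW idea in its simplest setting, estimating the mean of a variable that is missing for some subjects, rather than regression coefficients with missing covariates. `propensity` and `outcome_pred` stand in for fitted values from the two working models and are assumed inputs; the function name and data are hypothetical.

```python
def aipw_mean(y, observed, propensity, outcome_pred):
    """AIPW estimate of E[Y] when some Y values are missing.

    y[i] is used only when observed[i] is True (pass None otherwise).
    propensity[i]: working-model probability that y[i] is observed (must be > 0).
    outcome_pred[i]: working-model prediction of E[Y | covariates] for subject i.
    """
    n = len(observed)
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, propensity, outcome_pred):
        r = 1.0 if ri else 0.0
        ipw = yi / pi if ri else 0.0  # inverse probability weighted term
        total += ipw - (r - pi) / pi * mi  # augmentation term
    return total / n


y = [1.3, None, 2.9, None, 2.2]
observed = [True, False, True, False, True]
propensity = [0.6, 0.4, 0.7, 0.5, 0.6]
outcome_pred = [1.2, 2.1, 3.0, 3.8, 2.0]
print(aipw_mean(y, observed, propensity, outcome_pred))
```

If `outcome_pred` equals the true conditional mean, the augmentation term cancels the weighting error in expectation whatever the propensities are; if the propensities are correct, the augmentation term has mean zero whatever the predictions are. That is the double robustness property.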

14.
Summary.  Spline-based approaches to non-parametric and semiparametric regression, as well as to regression of scalar outcomes on functional predictors, entail choosing a parameter controlling the extent to which roughness of the fitted function is penalized. We demonstrate that the equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, share a similar form that allows us to prove several results which are common to both, and to derive a condition under which they yield identical values. These ideas are illustrated by application of functional principal component regression, a method for regressing scalars on functions, to two chemometric data sets.

15.
It is well known in the literature on multicollinearity that one of its major consequences for the ordinary least squares estimator is large sampling variances, which in turn may lead to the inappropriate exclusion of otherwise significant coefficients from the model. To circumvent this problem, two accepted estimation procedures are often suggested: the restricted least squares method and the ridge regression method. While the former reduces the sampling variance of the estimator, the latter ensures a smaller mean square error for the estimator. In this paper we propose a new estimator based on a criterion that combines the ideas underlying these two estimators. The standard properties of this new estimator are studied. It is also shown that this estimator is superior to both the restricted least squares and the ordinary ridge regression estimators under the mean square error criterion for the estimator of the regression coefficients when the restrictions are indeed correct. The conditions for the superiority of this estimator over the other two are also derived for the situation in which the restrictions are not correct.
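Of the two ingredients named above, ridge regression is the simpler to show. This sketch computes the ordinary ridge estimator (X'X + kI)^{-1} X'y with a small dense solver so that it runs without third-party libraries. It does not implement the paper's combined estimator; `k` is a user-chosen biasing constant, and the function names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, k):
    """Ridge estimate (X'X + k I)^{-1} X'y; k = 0 gives ordinary least squares."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, nearly collinear predictors.
X = [[1.0, 1.02], [2.0, 1.97], [3.0, 3.05], [4.0, 3.96]]
y = [2.1, 3.9, 6.2, 7.9]
for k in (0.0, 0.1, 1.0):
    print(k, ridge(X, y, k))
```

Adding `k` to the diagonal of X'X keeps the system well conditioned when the predictors are nearly collinear, trading a little bias for a smaller variance.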

16.
Summary. Many geophysical regression problems require the analysis of large (more than 10^4 values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator, which downweights data corresponding to extreme regression residuals, with the removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this purpose, the exact distribution of the hat matrix diagonal elements p_ii for complex multivariate Gaussian predictor data is shown to be β(p_ii; m, N − m), where N is the number of data points and m is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study, which exhibit severe outlier and leverage point contamination, are used to illustrate the estimator's performance. The examples also demonstrate the utility of examining both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
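The leverage screen can be illustrated in the real-valued, single-predictor case, where the hat matrix diagonal has a closed form. The abstract's exact beta distribution applies to complex multivariate Gaussian predictors; this sketch instead uses a simple rule-of-thumb cutoff (a multiple of the average leverage m/N), so the cutoff, the `multiple` parameter and the function names are assumptions.

```python
def hat_diagonal(x):
    """Leverages h_ii for the design [1, x_i] (intercept plus one real predictor).

    For this design h_ii = 1/N + (x_i - mean)^2 / sum_j (x_j - mean)^2,
    and the h_ii sum to m = 2 (the number of parameters).
    """
    n = len(x)
    xb = sum(x) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    return [1.0 / n + (xi - xb) ** 2 / sxx for xi in x]


def flag_leverage(x, multiple=3.0):
    """Indices whose leverage exceeds `multiple` times the average leverage m/N.

    A rule-of-thumb cutoff, used here in place of a beta-distribution quantile.
    """
    h = hat_diagonal(x)
    cutoff = multiple * 2 / len(x)
    return [i for i, hi in enumerate(h) if hi > cutoff]


x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 50]
print(flag_leverage(x))  # flags index 9 (the point at x = 50)
```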

17.
The authors consider variance estimation for the generalized regression estimator in a two‐phase context when the first‐phase sample has been restratified using information gathered from the first‐phase sample. Simple computational expressions for variance estimation are provided for the double expansion estimator and the reweighted expansion estimator of Kott & Stukel (1997). These estimators are compared using data from the Canadian Retail Commodity Survey.

18.
In this article, we study the linearized ridge regression (LRR) estimator in a linear regression model, motivated by the work of Liu (1993). The LRR estimator and two types of generalized Liu estimators are investigated under the PRESS criterion. A method for obtaining the optimal generalized ridge regression (GRR) estimator is derived from the optimal LRR estimator. We use the Hald data as a numerical example and then carry out a simulation study to illustrate the main results. We conclude that the idea of transforming the GRR estimator, a complicated function of the biasing parameters, into a linearized version deserves more attention in future work.
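For reference, the Liu (1993) estimator that motivates the LRR approach can be written as (X'X + I)^{-1}(X'y + d b_OLS), with biasing parameter d; d = 1 returns ordinary least squares. The sketch below computes it directly with a small dense solver. It does not implement the LRR estimator or the PRESS-based choice of d studied in the paper, and the function names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def liu(X, y, d):
    """Liu estimator (X'X + I)^{-1} (X'y + d * b_OLS)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    b_ols = solve(xtx, xty)
    a = [[xtx[i][j] + (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)]
    rhs = [xty[i] + d * b_ols[i] for i in range(p)]
    return solve(a, rhs)


# Hypothetical, nearly collinear predictors.
X = [[1.0, 1.02], [2.0, 1.97], [3.0, 3.05], [4.0, 3.96]]
y = [2.1, 3.9, 6.2, 7.9]
for d in (0.0, 0.5, 1.0):
    print(d, liu(X, y, d))
```

Because the estimator is linear in d, choosing d reduces to a one-dimensional search, which is part of what makes linearized versions of ridge-type estimators attractive.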

19.
In this article, we consider the Stein-type approach to estimating the regression parameter in a multiple regression model in the presence of multicollinearity. A Stein-type two-parameter estimator is proposed for the case in which the regression parameter is suspected to be restricted to a subspace. The bias and quadratic risk of the proposed estimator are derived and compared with those of the two-parameter estimator (TPE), the restricted TPE and the preliminary test TPE. Conditions for the superiority of the proposed estimator are obtained. Finally, a real data example is provided to illustrate some of the theoretical results.

20.
This paper presents a comprehensive comparison of well-known partially adaptive estimators (PAEs) in terms of their efficiency in estimating regression parameters. The aim is to identify, via Monte Carlo simulation, the best estimators of regression parameters when the error terms follow normal, Laplace, Student's t, normal mixture, lognormal and gamma distributions. The simulation results identify efficient PAEs for symmetric leptokurtic and skewed leptokurtic regression errors. These estimators are also compared in regression applications, where, using certain standard error estimators, it is shown that PAEs can reduce the standard error of the slope parameter estimate relative to ordinary least squares.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417