To bootstrap a regression problem, either pairs of response and explanatory variables or residuals can be resampled, according to whether we believe that the explanatory variables are random or fixed. In the latter case, different residuals have been proposed in the literature, including the ordinary residuals (Efron 1979), standardized residuals (Bickel & Freedman 1983) and Studentized residuals (Weber 1984). Freedman (1981) has shown that the bootstrap from ordinary residuals is asymptotically valid when the number of cases increases and the number of variables is fixed. Bickel & Freedman (1983) have shown the asymptotic validity for ordinary residuals when the number of variables and the number of cases both increase, provided that the ratio of the two converges to zero at an appropriate rate. In this paper, the authors introduce the use of BLUS (Best Linear Unbiased with Scalar covariance matrix) residuals in bootstrapping regression models. The main advantage of the BLUS residuals, introduced in Theil (1965), is that they are uncorrelated. The main disadvantage is that only n − p residuals can be computed for a regression problem with n cases and p variables. The asymptotic results of Freedman (1981) and Bickel & Freedman (1983) for the ordinary (and standardized) residuals are generalized to the BLUS residuals. A small simulation study shows that even though only n − p residuals are available, in small samples bootstrapping BLUS residuals can be as good as, and sometimes better than, bootstrapping from standardized or Studentized residuals.
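The fixed-design case described above (resampling residuals rather than pairs) can be sketched as follows. This is a minimal illustration using ordinary residuals from a simple linear regression, not the BLUS procedure of the paper; the function names `fit_line` and `residual_bootstrap_slopes` and the parameters `n_boot` and `seed` are assumptions made for the example.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def residual_bootstrap_slopes(x, y, n_boot=200, seed=0):
    """Bootstrap the slope by resampling centred ordinary residuals (x held fixed)."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_r = sum(resid) / len(resid)
    centred = [r - mean_r for r in resid]  # harmless when the model has an intercept
    slopes = []
    for _ in range(n_boot):
        # New responses: fitted values plus residuals drawn with replacement
        y_star = [fi + rng.choice(centred) for fi in fitted]
        slopes.append(fit_line(x, y_star)[1])
    return slopes
```

The spread of the returned slopes approximates the sampling variability of the estimated slope. Swapping in standardized, Studentized or BLUS residuals would change only the line that builds `resid`.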

Five widely used test statistics for detecting outliers and influential observations were studied using the Monte Carlo method. The test statistic based on Studentized residuals, with critical values given by Tietjen, Moore and Beckman (1973), appears to be the best procedure for detecting a single outlier in simple linear regression.

A well-known method for obtaining conservative simultaneous confidence intervals for the K parameters in a linear regression model, or for K linear contrasts, is based on the percentage points of the Studentized maximum modulus distribution. From an inequality due to Sidak, conservative yet uniformly shorter confidence intervals would be possible if the percentage points of a particular form of the multivariate t distribution were available. The purpose of this paper is to provide the required percentage points. For K<8 the resulting confidence intervals can be substantially shorter.

In this paper we examine the properties of four types of residual vectors, arising from fitting a linear regression model to a set of data by least squares. The four types of residuals are (i) the Stepwise residuals (Hedayat and Robson, 1970), (ii) the Recursive residuals (Brown, Durbin, and Evans, 1975), (iii) the Sequentially Adjusted residuals (to be defined herein), and (iv) the BLUS residuals (Theil, 1965, 1971). We also study the relationships among the four residual vectors. It is found that, for any given sequence of observations, (i) the first three sets of residuals are identical, (ii) each of the first three sets, being identical, is a member of Theil's (1965, 1971) family of residuals; specifically, they are Linear Unbiased with a Scalar covariance matrix (LUS) but not Best Linear Unbiased with a Scalar covariance matrix (BLUS). We find the explicit form of the transformation matrix and show that the first three sets of residual vectors can be written as an orthogonal transformation of the BLUS residual vector. These and other properties may prove to be useful in the statistical analysis of residuals.

The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.

In this study, we develop the adjusted deviance residuals for the gamma regression model (GRM) by following Cordeiro's (2004) method. These adjusted deviance residuals under the GRM are used for influence diagnostics. A comparative analysis is carried out between our proposed method of the adjusted deviance residuals and an existing method for influence diagnostics. These results are illustrated by a simulation study and using a real data set. They are presented for different values of dispersion and sample sizes and indicate the significant role of the GRM inferences.

We introduce and discuss three important regression diagnostics: leverage, Studentized residuals, and DFFITS. We then develop two approaches to bounded-influence robust regression based on these diagnostics. The methods are illustrated on a data set using a simple MINITAB program.
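The three diagnostics named above can be computed directly for a simple linear regression. The sketch below is an illustration in Python rather than the MINITAB program of the paper. It assumes the externally Studentized (case-deleted) form of the residual, which is the one usually paired with DFFITS; the function name `regression_diagnostics` is hypothetical.

```python
import math

def regression_diagnostics(x, y):
    """Leverage, Studentized residual and DFFITS for each case of y = a + b*x (OLS)."""
    n, p = len(x), 2  # p counts the intercept and the slope; requires n > 3
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    rows = []
    for xi, e in zip(x, resid):
        # Leverage: diagonal element of the hat matrix
        h = 1.0 / n + (xi - xbar) ** 2 / sxx
        # Error variance estimated with this case deleted
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        # Externally Studentized residual (assumed variant)
        t = e / math.sqrt(s2_del * (1.0 - h))
        dffits = t * math.sqrt(h / (1.0 - h))
        rows.append((h, t, dffits))
    return rows
```

Each returned tuple is `(leverage, studentized_residual, dffits)` for one case. The leverages sum to the number of fitted parameters, which gives a quick sanity check on the computation.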

Outliers in multilevel data (total citations: 2; self-citations: 0; citations by others: 2)
This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large-scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

Alternative methods of estimating properties of unknown distributions include the bootstrap and the smoothed bootstrap. In the standard bootstrap setting, Johns (1988) introduced an importance resampling procedure that results in more accurate approximation to the bootstrap estimate of a distribution function or a quantile. With a suitable “exponential tilting” similar to that used by Johns, we derived a smoothed version of importance resampling in the framework of the smoothed bootstrap. Smoothed importance resampling procedures were developed for the estimation of distribution functions of the Studentized mean, the Studentized variance, and the correlation coefficient. The implementation of these procedures is presented via simulation results that concentrate on the problem of estimation of distribution functions of the Studentized mean and Studentized variance for different sample sizes and various pre-specified smoothing bandwidths for normal data; additional simulations were conducted for the estimation of quantiles of the distribution of the Studentized mean under an optimal smoothing bandwidth when the original data were simulated from three different parent populations: lognormal, t(3) and t(10). These results suggest that in cases where it is advantageous to use the smoothed bootstrap rather than the standard bootstrap, the amount of resampling necessary might be substantially reduced by the use of importance resampling methods, and that the efficiency gains depend on the bandwidth used in the kernel density estimation.

The use of martingale residuals has been proposed for model checking and also to get a non-parametric estimate of the effect of an explanatory variable. We apply this approach to an epidemiological problem which presents two characteristics: the data are left truncated due to delayed entry in the cohort; the data are grouped into geographical units (parishes). This grouping suggests a natural way of smoothing the graph of residuals which is to compute the sum of the residuals for each parish. It is also natural to present a graph with standardized residuals. We derive the variances of the estimated residuals for left truncated data which allows computing the standardized residuals. This method is applied to the study of dementia in a cohort of old people, and to the possible effect of the concentration of aluminum and silica in drinking water on the risk of developing dementia.

The heterogeneity of error variance often causes a huge interpretive problem in linear regression analysis. Before taking any remedial measures we first need to detect this problem. A large number of diagnostic plots are now available in the literature for detecting heteroscedasticity of error variances. Among them the ‘residuals’ and ‘fits’ (R–F) plot is very popular and commonly used. In the R–F plot residuals are plotted against the fitted responses, where both these components are obtained using the ordinary least squares (OLS) method. It is now evident that the OLS fits and residuals suffer a huge setback in the presence of unusual observations and hence the R–F plot may not exhibit the real scenario. The deletion residuals based on a data set free from all unusual cases should estimate the true errors in a better way than the OLS residuals. In this paper we propose ‘deletion residuals’ and the ‘deletion fits’ (DR–DF) plot for the detection of the heterogeneity of error variances in a linear regression model to get a more convincing and reliable graphical display. Examples show that this plot locates unusual observations more clearly than the R–F plot. The advantage of using deletion residuals in the detection of heteroscedasticity of error variance is investigated through Monte Carlo simulations under a variety of situations.

This paper considers residuals for time series regression. Despite much literature on visual diagnostics for uncorrelated data, there is little on the autocorrelated case. To examine various aspects of the fitted time series regression model, three residuals are considered. The fitted regression model can be checked using orthogonal residuals; the time series error model can be analysed using marginal residuals; and the white noise error component can be tested using conditional residuals. When used together, these residuals allow identification of outliers, model mis‐specification and mean shifts. Due to the sensitivity of conditional residuals to model mis‐specification, it is suggested that the orthogonal and marginal residuals be examined first.

Correspondence analysis is a versatile statistical technique that allows the user to graphically identify the association that may exist between variables of a contingency table. For two categorical variables, the classical approach involves applying singular value decomposition to the Pearson residuals of the table. These residuals allow one to use a simple test to determine those cells that deviate from what is expected under independence. However, the assumptions concerning these residuals are not always satisfied, and so such results can lead to questionable conclusions. One may instead consider an adjustment of the Pearson residual, which is known to have properties associated with the standard normal distribution. This paper explores the application of these adjusted residuals to correspondence analysis and determines how they impact upon the configuration of points in the graphical display.
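The two kinds of residual contrasted above can be illustrated for a small two-way table. The sketch below assumes that the "adjustment" is the commonly used Haberman-style adjusted residual, which divides each Pearson residual by its estimated standard deviation under independence; the abstract does not state the formula, so this choice and the function name `pearson_and_adjusted_residuals` are assumptions.

```python
import math

def pearson_and_adjusted_residuals(table):
    """Return (pearson, adjusted) residual matrices for a two-way table of counts."""
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    pearson, adjusted = [], []
    for i, row in enumerate(table):
        p_row, a_row = [], []
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_tot[i] * col_tot[j] / total
            r = (count - expected) / math.sqrt(expected)
            # Assumed adjustment: rescale by the estimated standard deviation of r
            scale = math.sqrt((1 - row_tot[i] / total) * (1 - col_tot[j] / total))
            p_row.append(r)
            a_row.append(r / scale)
        pearson.append(p_row)
        adjusted.append(a_row)
    return pearson, adjusted
```

The squared Pearson residuals sum to the usual chi-square statistic for independence. Because the scaling factor is below one, each adjusted residual is at least as large in absolute value as its Pearson counterpart, which is why the two can flag different cells.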

Recursive residuals and their relationship to the recursive estimation of regression parameters have been developed for univariate regression models. Such residuals and estimates have been used to test the constancy of regression over time. The current paper extends this work to the multivariate regression model.

The maximum absolute studentized residual is commonly used for testing for a single outlier in a linear regression model. This test statistic, however, is seldom discussed in a nonlinear regression setting. We simulate the critical values for the tests under various nonlinear models. The associated critical values are found to be very close to one another. Moreover, they are very well approximated using the critical values obtained from F-distributions based on the Bonferroni equations in linear models. The results are promising even in samples of size 6.

In binary regression, imbalanced data result from the presence of values equal to zero (or one) in a proportion that is significantly greater than the corresponding real values of one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them to the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct bias in the estimation of regression coefficients and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application for imbalanced data, identifying the best model among the various ones proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.

Goodness-of-fit tests for logistic regression models using extreme residuals are considered. Approximations to the moments of the Pearson residuals are given for model fits made by maximum likelihood, minimum chi-square and weighted least squares and used to define modified residuals. Approximations to the critical values of the extreme statistics based on the ordinary and modified Pearson residuals are developed and assessed for the case of a single explanatory variable.

The authors propose a bootstrap procedure which estimates the distribution of an estimating function by resampling its terms using bootstrap techniques. Studentized versions of this so‐called estimating function (EF) bootstrap yield methods which are invariant under reparametrizations. This approach often has a substantial advantage, both in computation and accuracy, over more traditional bootstrap methods and it applies to a wide class of practical problems where the data are independent but not necessarily identically distributed. The methods allow for simultaneous estimation of vector parameters and their components. The authors use simulations to compare the EF bootstrap with competing methods in several examples including the common means problem and nonlinear regression. They also prove asymptotic results showing that the studentized EF bootstrap yields higher order approximations for the whole vector parameter in a wide class of problems.

The aim of this paper is to introduce a new method which corrects residual variances for butterfly distributed residuals (BDR). Distribution theory, confidence intervals, and tests of hypotheses are valid and meaningful only if the standard regression assumptions are satisfied. Heteroskedasticity is one of the violations of these assumptions and BDR is another type of heteroskedasticity. This study presents an alternative approach to correcting the BDR type of heteroskedasticity by weighting re-estimated absolute residuals (WRAR). After giving brief information about heteroskedasticity and the BDR type of heteroskedasticity, WRAR is introduced. WRAR and the usual variance stabilizing techniques are compared on multiple and simple regression models.

In this article, we introduce two monitoring schemes to (sequentially) detect structural changes in generalized linear models and develop asymptotic theories for them. The first method is based on cumulative sums (CUSUM) of weighted residuals, in which the unknown in-control parameters have been replaced by their maximum likelihood (ML) estimates from the training sample, whereas the second scheme makes use of moving sums (MOSUM) of weighted residuals. We characterize the limit distribution of the test statistic and show that these tests are consistent. Moreover, we also obtain and tabulate the asymptotic critical values of the tests. Finally, we study the speed of detection under different conditions. The methods are illustrated and compared in several simulations.
