
1.

Deletion residuals in the detection of heterogeneity of variances in linear regression

A. H.M. Rahmatullah Imon 《Journal of applied statistics》2009,36(3):347-358

The heterogeneity of error variance often causes a huge interpretive problem in linear regression analysis. Before taking any remedial measures we first need to detect this problem. A large number of diagnostic plots are now available in the literature for detecting heteroscedasticity of error variances. Among them the ‘residuals’ and ‘fits’ (R–F) plot is very popular and commonly used. In the R–F plot residuals are plotted against the fitted responses, where both these components are obtained using the ordinary least squares (OLS) method. It is now evident that the OLS fits and residuals suffer a huge setback in the presence of unusual observations and hence the R–F plot may not exhibit the real scenario. The deletion residuals based on a data set free from all unusual cases should estimate the true errors in a better way than the OLS residuals. In this paper we propose ‘deletion residuals’ and the ‘deletion fits’ (DR–DF) plot for the detection of the heterogeneity of error variances in a linear regression model to get a more convincing and reliable graphical display. Examples show that this plot locates unusual observations more clearly than the R–F plot. The advantage of using deletion residuals in the detection of heteroscedasticity of error variance is investigated through Monte Carlo simulations under a variety of situations.
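
The abstract does not give the exact construction of the deletion residuals, so the following is only a minimal sketch of the general idea, assuming that the suspect cases are already known: the model is fitted by OLS to the retained cases only, and fits and residuals are then computed for every case. The function name `deletion_residuals` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def deletion_residuals(X, y, suspect):
    """Fit OLS without the suspect cases; return fits and residuals for ALL cases."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(y), dtype=bool)
    keep[np.asarray(list(suspect), dtype=int)] = False
    Xd = np.column_stack([np.ones(len(y)), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
    fits = Xd @ beta
    return fits, y - fits

# Toy data (an assumption for illustration): a straight line with one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=30)
y[-1] += 40.0

fits_ols, res_ols = deletion_residuals(x, y, suspect=[])    # ordinary R-F quantities
fits_del, res_del = deletion_residuals(x, y, suspect=[29])  # DR-DF style quantities
```

Plotting `res_del` against `fits_del` would give a display in the spirit of the DR–DF plot. In this toy example the outlier's deletion residual is larger in magnitude than its OLS residual, because the OLS fit is pulled towards the outlying case.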

2.

Properties of Added Variable Plots in Cox's Regression Model

Lindkvist M 《Lifetime data analysis》2000,6(1):23-38

The added variable plot is useful for examining the effect of a covariate in regression models. The plot provides information regarding the inclusion of a covariate, and is useful in identifying influential observations on the parameter estimates. Hall et al. (1996) proposed a plot for Cox's proportional hazards model derived by regarding the Cox model as a generalized linear model. This paper proves and discusses properties of this plot. These properties make the plot a valuable tool in model evaluation. Quantities considered include parameter estimates, residuals, leverage, case influence measures and correspondence to previously proposed residuals and diagnostics.

3.

Assessing the adequacy of Weibull survival models: a simulated envelope approach

Yun Zhao Kelvin K.W. Yau Geoffrey J. McLachlan 《Journal of applied statistics》2011,38(10):2089-2097

The Weibull proportional hazards model is commonly used for analysing survival data. However, formal tests of model adequacy are still lacking. It is well known that residual-based goodness-of-fit measures are inappropriate for censored data. In this paper, a graphical diagnostic plot of Cox–Snell residuals with a simulated envelope added is proposed to assess the adequacy of Weibull survival models. Both single component and two-component mixture models with random effects are considered for recurrent failure time data. The effectiveness of the diagnostic method is illustrated using simulated data sets and data on recurrent urinary tract infections of elderly women.

4.

Diagnostic plots in beta-regression models

Li-Chu Chien 《Journal of applied statistics》2011,38(8):1607-1622

Two diagnostic plots for selecting explanatory variables are introduced to assess the accuracy of a generalized beta-linear model. The added variable plot is developed to examine the need for adding a new explanatory variable to the model. The constructed variable plot is developed to identify the nonlinearity of the explanatory variable in the model. The two diagnostic procedures are also useful for detecting unusual observations that may substantially affect the regression. Simulation studies and analysis of two practical examples are conducted to illustrate the performances of the proposed plots.

5.

A robust diagnostic plot for explanatory variables under model mis-specification

Li-Chu Chien 《Journal of applied statistics》2011,38(1):113-126

A typical added variable plot is a commonly used plot in assessing the accuracy of a normal linear model. This plot is often used to evaluate the effect of adding an explanatory variable into the model and to detect possibly high leverage points or influential observations on the added variable. However, this type of plot is generally in doubt once the normal distributional assumptions are violated. In this article, we extend the robust likelihood technique introduced by Royall and Tsou [11] to propose a robust added variable plot. The validity of this diagnostic plot requires no knowledge of the true underlying distributions so long as their second moments exist. The usefulness of the robust graphical approach is demonstrated through a few illustrations and simulations.
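
For orientation, the coordinates of the classical (non-robust) added variable plot in a normal linear model can be sketched as follows. This is the standard textbook construction, not the robust version proposed in the article; the function name `added_variable_coords` and the simulated data are hypothetical.

```python
import numpy as np

def added_variable_coords(X, z, y):
    """Residuals of the candidate z and of the response y after regressing each on X."""
    Xd = np.column_stack([np.ones(len(y)), X])        # current model, with intercept

    def resid(v):
        b, *_ = np.linalg.lstsq(Xd, v, rcond=None)
        return v - Xd @ b

    return resid(z), resid(y)

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
z = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * z + rng.normal(scale=0.3, size=n)

ez, ey = added_variable_coords(x1, z, y)   # horizontal and vertical plot coordinates
slope = (ez @ ey) / (ez @ ez)              # slope of the line through the plot

# Coefficient of z in the full model, for comparison.
full, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x1, z]), y, rcond=None)
```

By the Frisch–Waugh–Lovell theorem, `slope` equals the coefficient of `z` in the full least squares fit (`full[2]`), which is why points that stand apart in the plot indicate cases with leverage or influence on that coefficient.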

6.

Added variable plots for linear regression with censored data

Peter J. Smith Lalith W. Peiris 《Communications in Statistics - Theory and Methods》2013,42(8):1987-2000

In linear regression the structure of the hat matrix plays an important part in regression diagnostics. In this note we investigate the properties of the hat matrix for regression with censored responses in the presence of one or more explanatory variables observed without censoring. The censored points in the scatterplot are renovated to positions had they been observed without censoring in a renovation process based on Buckley-James censored regression estimators. This allows natural links to be established with the structure of ordinary least squares estimators. In particular, we show that the renovated hat matrix may be partitioned in a manner which assists in deciding whether further explanatory variables should be added to the linear model. The added variable plot for regression with censored data is developed as a diagnostic tool for this decision process.

7.

Residual analysis for spatial point processes (with discussion)

A. Baddeley R. Turner J. Møller M. Hazelton 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(5):617-666

Summary. We define residuals for point process models fitted to spatial point pattern data, and we propose diagnostic plots based on them. The residuals apply to any point process model that has a conditional intensity; the model may exhibit spatial heterogeneity, interpoint interaction and dependence on spatial covariates. Some existing ad hoc methods for model checking (quadrat counts, scan statistic, kernel smoothed intensity and Berman's diagnostic) are recovered as special cases. Diagnostic tools are developed systematically, by using an analogy between our spatial residuals and the usual residuals for (non-spatial) generalized linear models. The conditional intensity λ plays the role of the mean response. This makes it possible to adapt existing knowledge about model validation for generalized linear models to the spatial point process context, giving recommendations for diagnostic plots. A plot of smoothed residuals against spatial location, or against a spatial covariate, is effective in diagnosing spatial trend or covariate effects. Q–Q plots of the residuals are effective in diagnosing interpoint interaction.

8.

Identification of Multiple Outliers in Logistic Regression

A. H. M. Rahmatullah Imon Ali S. Hadi 《Communications in Statistics - Theory and Methods》2013,42(11):1697-1709

The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.
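
As background, the ordinary (single-case) standardized Pearson residuals mentioned in the abstract can be sketched as below. This is the usual textbook definition, r_i = (y_i - p_i) / sqrt(p_i (1 - p_i) (1 - h_ii)), and not the group-deletion generalization proposed in the article, whose details the abstract does not give. The helper names, the plain Newton-Raphson fit and the simulated data are assumptions made for illustration.

```python
import numpy as np

def fit_logistic(Xd, y, iters=25):
    """Newton-Raphson (IRLS) fit of a logistic regression; Xd includes the intercept."""
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (y - p))
    return beta

def standardized_pearson(Xd, y, beta):
    """Standardized Pearson residuals and leverages h_ii at the fitted beta."""
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(Xd.T @ (Xd * w[:, None]))
    h = w * np.einsum("ij,jk,ik->i", Xd, cov, Xd)     # diagonal of the weighted hat matrix
    return (y - p) / np.sqrt(w * (1.0 - h)), h

# Simulated binary data (an assumption for illustration).
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
Xd = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))).astype(float)

beta_hat = fit_logistic(Xd, y)
r, h = standardized_pearson(Xd, y, beta_hat)
```

The leverages sum to the number of fitted parameters (here two), which is a convenient sanity check. Because every residual is computed from a fit that includes all cases, a group of outliers can distort the fit enough to hide one another, which is the masking problem the article addresses.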

9.

Multiple and conditional deletion diagnostics for general linear models

Ingrid A. Baade Anthony N. Pettitt 《Communications in Statistics - Theory and Methods》2013,42(8):1899-1910

In this paper we develop multiple case deletion statistics for the general linear model so that a residual vector and a leverage matrix are identified which have roles analogous to residuals and leverage for ordinary least squares models. We extend the notion of the conditional deletion diagnostic to general linear models. The residuals, leverage and deletion diagnostics are illustrated with data modelled by a linear growth curve.

10.

Truncated location-scale non linear regression models

Carolina Costa Mota Paraíba Carlos Alberto Ribeiro Diniz Aline de Holanda Nunes Maia Lineu Neiva Rodrigues 《Communications in Statistics - Theory and Methods》2017,46(15):7355-7374

We present a class of truncated non linear regression models for location and scale where the truncated nature of the data is incorporated into the statistical model by assuming that the response variable follows a truncated distribution. The location parameter of the response variable is assumed to be modeled by a continuous non linear function of covariates and unknown parameters. In addition, the proposed model also allows for the scale parameter of the responses to be characterized by a continuous function of the covariates and unknown parameters. Three particular cases of the proposed models are presented by considering the response variable to follow a truncated normal, truncated skew normal, and truncated beta distribution. These truncated non linear regression models are constructed assuming fixed known truncation limits and model parameters are estimated by direct maximization of the log-likelihood using a non linear optimization algorithm. Standardized residuals and diagnostic metrics based on case deletion are considered to verify the adequacy of the model and to detect outliers and influential observations. Results based on simulated data are presented to assess the frequentist properties of estimates, and a real data set on soil-water retention from the Buriti Vermelho River Basin database is analyzed using the proposed methodology.

11.

A multiple-case deletion approach for detecting influential points in high-dimensional regression

Tao Wang Qun Li Qingpei Zang 《Communications in Statistics - Simulation and Computation》2013,42(7):2065-2082

In high-dimensional regression, the presence of influential observations may lead to inaccurate analysis results so that it is a prime and important issue to detect these unusual points before statistical regression analysis. Most of the traditional approaches are, however, based on single-case diagnostics, and they may fail due to the presence of multiple influential observations that suffer from masking effects. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure contains two stages. Firstly, we propose a multiple-case deletion technique, and obtain an approximate clean subset of the data that is presumably free of influential observations. To enhance efficiency, in the second stage, we refine the detection rule. Monte Carlo simulation studies and a real-life data analysis investigate the effective performance of the proposed procedure.

12.

Pena's statistic for the Liu regression

Muhammad Kashif Muhammad Amanullah 《Journal of Statistical Computation and Simulation》2018,88(13):2473-2488

In fitting a regression model, one or more observations may have substantial effects on estimators. These unusual observations are precisely detected by a new diagnostic measure, Pena's statistic. In this article, we introduce a type of Pena's statistic for each point in Liu regression. Using the forecast change property, we simplify the Pena's statistic in a numerical sense. It is found that the simplified Pena's statistic behaves quite well as far as detection of influential observations is concerned. We express Pena's statistic in terms of the Liu leverages and residuals. The normality of this statistic is also discussed and it is demonstrated that it can identify a subset of high Liu leverage outliers. For numerical evaluation, simulated studies are given and a real data set has been analysed for illustration.

13.

Location-scale mixed models and goodness-of-fit assessment applied to insect ecology

R. A. Moral J. Hinde E. M. M. Ortega C. G. B. Demétrio W. A. C. Godoy 《Journal of applied statistics》2020,47(10):1776

Survival models have been extensively used to analyse time-until-event data. There is a range of extended models that incorporate different aspects, such as overdispersion/frailty, mixtures, and flexible response functions through semi-parametric models. In this work, we show how a useful tool to assess goodness-of-fit, the half-normal plot of residuals with a simulated envelope, implemented in the hnp package in R, can be used in a location-scale modelling context. We fitted a range of survival models to time-until-event data, where the event was an insect predator attacking a larva in a biological control experiment. We started with the Weibull model and then fitted the exponentiated-Weibull location-scale model with regressors both for the location and scale parameters. We performed variable selection for each model and, by producing half-normal plots with simulated envelopes for the deviance residuals of the model fits, we found that the exponentiated-Weibull fitted the data better. We then included a random effect in the exponentiated-Weibull model to accommodate correlated observations. Finally, we discuss possible implications of the results found in the case study.

14.

Feasibility as a mechanism for model identification and validation

Corrine F. Elliott Joshua W. Lambert Arnold J. Stromberg Pei Wang Ting Zeng Katherine L. Thompson 《Journal of applied statistics》2021,48(11):2022

As new technologies permit the generation of hitherto unprecedented volumes of data (e.g. genome-wide association study data), researchers struggle to keep up with the added complexity and time commitment required for its analysis. For this reason, model selection commonly relies on machine learning and data-reduction techniques, which tend to afford models with obscure interpretations. Even in cases with straightforward explanatory variables, the so-called ‘best’ model produced by a given model-selection technique may fail to capture information of vital importance to the domain-specific questions at hand. Herein we propose a new concept for model selection, feasibility, for use in identifying multiple models that are in some sense optimal and may unite to provide a wider range of information relevant to the topic of interest, including (but not limited to) interaction terms. We further provide an R package and associated Shiny Applications for use in identifying or validating feasible models, the performance of which we demonstrate on both simulated and real-life data.

15.

A class of residuals for outlier identification in zero adjusted regression models

Gustavo H. A. Pereira Juliana Scudilio Manoel Santos-Neto Denise A. Botter Mônica C. Sandoval 《Journal of applied statistics》2020,47(10):1833

Zero adjusted regression models are used to fit variables that are discrete at zero and continuous at some interval of the positive real numbers. Diagnostic analysis in these models is usually performed using the randomized quantile residual, which is useful for checking the overall adequacy of a zero adjusted regression model. However, it may fail to identify some outliers. In this work, we introduce a class of residuals for outlier identification in zero adjusted regression models. Monte Carlo simulation studies and two applications suggest that one of the residuals of the class introduced here has good properties and detects outliers that are not identified by the randomized quantile residual.

16.

The Added Variable Plot for a Time Series of Counts

J.L. Hay & A.N. Pettitt 《Australian & New Zealand Journal of Statistics》1998,40(1):31-42

The issue of modelling non-Gaussian time series data is one that has been examined by several authors in recent years. Zeger (1988) introduced a parameter-driven model for a time series of counts as well as a more general observation-driven model for non-Gaussian data (Zeger & Qaqish, 1988). This paper examines the application of the added variable plot to these two models. This plot is useful for determining the strength of relationships and the detection of influential or outlying observations.

17.

DELETE-2 AND DELETE-3 JACKKNIFE PROCEDURES FOR UNMASKING IN REGRESSION

Michael A. Martin Steven Roberts Letian Zheng 《Australian & New Zealand Journal of Statistics》2010,52(1):45-60

Single-case deletion regression diagnostics have been used widely to discover unusual data points, but such approaches can fail in the presence of multiple unusual data points and as a result of masking. We propose a new approach to the use of single-case deletion diagnostics that involves applying these diagnostics to delete-2 and delete-3 jackknife replicates of the data, and considering the percentage of times among these replicates that points are flagged as unusual as an indicator of their influence. By considering replicates that exclude certain collections of points, subtle masking effects can be uncovered.
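
The delete-2 idea can be sketched roughly as follows. The abstract does not say which single-case diagnostic or cutoff the authors use, so this sketch assumes externally studentized residuals with a cutoff of 2; the function names and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations

def studentized_residuals(Xd, y):
    """Externally studentized OLS residuals (a single-case deletion diagnostic)."""
    n, p = Xd.shape
    H = Xd @ np.linalg.pinv(Xd)                       # hat matrix
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    s2_i = (s2 * (n - p) - e**2 / (1 - h)) / (n - p - 1)   # variance without case i
    return e / np.sqrt(s2_i * (1 - h))

def delete2_flag_rates(x, y, cutoff=2.0):
    """Share of delete-2 replicates containing each point in which it is flagged."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), x])
    flagged = np.zeros(n)
    present = np.zeros(n)
    for pair in combinations(range(n), 2):
        keep = np.setdiff1d(np.arange(n), pair)       # replicate without the pair
        t = studentized_residuals(Xd[keep], y[keep])
        present[keep] += 1
        flagged[keep] += np.abs(t) > cutoff
    return flagged / present

# Toy data (an assumption for illustration): one clear outlier at index 10.
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 20)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=20)
y[10] += 10.0

rates = delete2_flag_rates(x, y)
```

In this toy example the outlier is flagged in every replicate that contains it, while the clean points are flagged rarely if at all. The single outlier is used only to keep the result easy to check; the method is aimed at data where several unusual points mask one another, and a delete-3 version would loop over triples in the same way.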

18.

Checking Normality and Homoscedasticity in the General Linear Model Using Diagnostic Plots

A. Schützenmeister U. Jensen 《Communications in Statistics - Simulation and Computation》2013,42(2):141-154

Inference for the general linear model makes several assumptions, including independence of errors, normality, and homogeneity of variance. Departure from the latter two of these assumptions may indicate the need for data transformation or removal of outlying observations. Informal procedures such as diagnostic plots of residuals are frequently used to assess the validity of these assumptions or to identify possible outliers. A simulation-based approach is proposed, which facilitates the interpretation of various diagnostic plots by adding simultaneous tolerance bounds. Several tests exist for normality or homoscedasticity in simple random samples. These tests are often applied to residuals from a linear model fit. The resulting procedures are approximate in that correlation among residuals is ignored. The simulation-based approach accounts for the correlation structure of residuals in the linear model and allows simultaneously checking for possible outliers, non normality, and heteroscedasticity, and it does not rely on formal testing.

[Supplementary materials are available for this article. Go to the publisher's online edition of Communications in Statistics - Simulation and Computation® for the following three supplemental resources: a Word file containing figures illustrating the mode of operation for the bisectional algorithm, QQ-plots, and a residual plot for the mussels data.]

19.

Deviance residuals in generalised log-gamma regression models with censored observations

《Journal of Statistical Computation and Simulation》2012,82(8):747-764

In this article, we compare three residuals based on the deviance component in generalised log-gamma regression models with censored observations. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. For all cases studied, the empirical distributions of the proposed residuals are in general symmetric around zero, but only a martingale-type residual presented negligible kurtosis for the majority of the cases studied. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for the martingale-type residual in generalised log-gamma regression models with censored data. A lifetime data set is analysed under log-gamma regression models and a model checking based on the martingale-type residual is performed.

20.

The log-odd log-logistic Weibull regression model: modelling, estimation, influence diagnostics and residual analysis

《Journal of Statistical Computation and Simulation》2012,82(8):1516-1538

In applications of survival analysis, the failure rate function may frequently present a unimodal shape. In such cases, the log-normal and log-logistic distributions are used. In this paper, we shall be concerned only with parametric forms, so a location-scale regression model based on the odd log-logistic Weibull distribution is proposed for modelling data with a decreasing, increasing, unimodal and bathtub failure rate function as an alternative to the log-Weibull regression model. For censored data, we consider a classic method to estimate the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Further, for different parameter settings, sample sizes and censoring percentages, various simulations are performed. In addition, the empirical distribution of some modified residuals is determined and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the new regression model applied to censored data. We analyse a real data set using the log-odd log-logistic Weibull regression model.