1.

Residual analysis for spatial point processes (with discussion)

A. Baddeley R. Turner J. Møller M. Hazelton 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(5):617-666

Summary. We define residuals for point process models fitted to spatial point pattern data, and we propose diagnostic plots based on them. The residuals apply to any point process model that has a conditional intensity; the model may exhibit spatial heterogeneity, interpoint interaction and dependence on spatial covariates. Some existing ad hoc methods for model checking (quadrat counts, scan statistic, kernel smoothed intensity and Berman's diagnostic) are recovered as special cases. Diagnostic tools are developed systematically, by using an analogy between our spatial residuals and the usual residuals for (non-spatial) generalized linear models. The conditional intensity λ plays the role of the mean response. This makes it possible to adapt existing knowledge about model validation for generalized linear models to the spatial point process context, giving recommendations for diagnostic plots. A plot of smoothed residuals against spatial location, or against a spatial covariate, is effective in diagnosing spatial trend or covariate effects. Q–Q plots of the residuals are effective in diagnosing interpoint interaction.

2.

Diagnostic checks for integer-valued autoregressive models using expected residuals

Yousung Park Hee-Young Kim 《Statistical Papers》2012,53(4):951-970

Integer-valued time series models make use of thinning operators for coherency in the nature of count data. However, the thinning operators make residuals unobservable and are the main difficulty in developing diagnostic tools for autocorrelated count data. In this regard, we introduce a new residual, which takes the form of predictive distribution functions, to assess probabilistic forecasts, and this new residual is supplemented by a modified version of the usual residuals. Under integer-valued autoregressive (INAR) models, the properties of these two residuals are investigated and used to evaluate the predictive performance and model adequacy of the INAR models. We compare our residuals with the existing residuals through simulation studies and apply our method to select an appropriate INAR model for an over-dispersed real data set.
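To make the role of the thinning operator concrete, here is a minimal sketch of an INAR(1) recursion with binomial thinning and Poisson innovations. The function names, the Poisson innovation choice and the parameter values are illustrative assumptions, not taken from the paper. The residual computed here is the ordinary one-step residual, not the authors' predictive-distribution residual.

```python
import math
import random


def binomial_thinning(alpha, count, rng):
    """Return alpha o count: the number of `count` units that survive,
    each kept independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_inar1(alpha, lam, length, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    series = [poisson_draw(lam, rng)]
    for _ in range(length - 1):
        survivors = binomial_thinning(alpha, series[-1], rng)
        series.append(survivors + poisson_draw(lam, rng))
    return series


def one_step_residuals(series, alpha, lam):
    """Ordinary residuals X_t - E[X_t | X_{t-1}] = X_t - alpha*X_{t-1} - lam."""
    return [x - alpha * prev - lam for prev, x in zip(series, series[1:])]


# Assumed parameter values, chosen only for illustration.
counts = simulate_inar1(alpha=0.5, lam=2.0, length=200)
residuals = one_step_residuals(counts, alpha=0.5, lam=2.0)
```

Only the total `survivors + innovation` is observed at each step, so the two components cannot be separated from the data. This is the difficulty the abstract attributes to thinning operators.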

3.

Spatial–temporal model for wind speed in Lithuania

Jūratė Šaltytė Benth Laura Šaltytė 《Journal of applied statistics》2011,38(6):1151-1168

In this paper, we propose a spatial–temporal model for the wind speed (WS). We first estimate the model at the single spatial meteorological station independently of spatial correlations. The temporal model contains seasonality, a higher-order autoregressive component and a variance describing the remaining heteroskedasticity in residuals. We then model spatial dependencies by a Gaussian random field. The model is estimated on daily WS records from 18 meteorological stations in Lithuania. The validation procedure based on out-of-sample observations shows that the proposed model is reliable and can be used for various practical applications.

4.

Diagnostic Measures for Generalized Linear Models with Missing Covariates

HONGTU ZHU JOSEPH G. IBRAHIM XIAOYAN SHI 《Scandinavian Journal of Statistics》2009,36(4):686-712

Abstract. In this paper, we carry out an in-depth investigation of diagnostic measures for assessing the influence of observations and model misspecification in the presence of missing covariate data for generalized linear models. Our diagnostic measures include case-deletion measures and conditional residuals. We use the conditional residuals to construct goodness-of-fit statistics for testing possible misspecifications in model assumptions, including the sampling distribution. We develop specific strategies for incorporating missing data into goodness-of-fit statistics in order to increase the power of detecting model misspecification. A resampling method is proposed to approximate the p-value of the goodness-of-fit statistics. Simulation studies are conducted to evaluate our methods and a real data set is analysed to illustrate the use of our various diagnostic measures.

5.

An evaluation of bootstrap methods for outlier detection in least squares regression

Michael A. Martin Steven Roberts 《Journal of applied statistics》2006,33(7):703-720

Outlier detection is a critical part of data analysis, and the use of Studentized residuals from regression models fit using least squares is a very common approach to identifying discordant observations in linear regression problems. In this paper we propose a bootstrap approach to constructing critical points for use in outlier detection in the context of least-squares Studentized residuals, and find that this approach allows naturally for mild departures in model assumptions such as non-Normal error distributions. We illustrate our methodology through both a real data example and simulated data.
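The general idea of a bootstrap critical point for Studentized residuals can be sketched as follows. This is a plain residual bootstrap for simple linear regression, written under assumptions of my own (internally Studentized residuals, the maximum absolute residual as the statistic, 500 replicates, simulated data). It is not the specific procedure evaluated in the paper, and all function names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


def studentized_residuals(x, y):
    """Internally Studentized residuals e_i / (s * sqrt(1 - h_i))."""
    n = len(x)
    a, b = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in raw) / (n - 2))
    if s == 0.0:  # perfect fit: no residual variation to standardize
        return [0.0] * n
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(raw, leverage)]


def bootstrap_critical_point(x, y, level=0.95, replicates=500, seed=0):
    """Bootstrap quantile of max |Studentized residual| under the fitted line."""
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    raw = [yi - fi for yi, fi in zip(y, fitted)]
    maxima = []
    for _ in range(replicates):
        # Resample raw residuals with replacement and rebuild a response.
        y_star = [fi + rng.choice(raw) for fi in fitted]
        maxima.append(max(abs(r) for r in studentized_residuals(x, y_star)))
    maxima.sort()
    index = min(replicates - 1, int(math.ceil(level * replicates)) - 1)
    return maxima[index]


# Simulated data (an assumption for illustration only).
data_rng = random.Random(42)
x = [float(i) for i in range(20)]
y = [1.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]
critical = bootstrap_critical_point(x, y)
flagged = [i for i, r in enumerate(studentized_residuals(x, y)) if abs(r) > critical]
```

Because the reference distribution is built from the observed residuals rather than from a Normal assumption, the critical point adapts to the shape of the error distribution, which is the motivation the abstract gives.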

6.

Diagnostic checks for discrete data regression models using posterior predictive simulations

A. Gelman Y. Goegebeur F. Tuerlinckx & I. Van Mechelen 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(2):247-268

Model checking with discrete data regressions can be difficult because the usual methods such as residual plots have complicated reference distributions that depend on the parameters in the model. Posterior predictive checks have been proposed as a Bayesian way to average the results of goodness-of-fit tests in the presence of uncertainty in estimation of the parameters. We try this approach using a variety of discrepancy variables for generalized linear models fitted to a historical data set on behavioural learning. We then discuss the general applicability of our findings in the context of a recent applied example on which we have worked. We find that the following discrepancy variables work well, in the sense of being easy to interpret and sensitive to important model failures: structured displays of the entire data set, general discrepancy variables based on plots of binned or smoothed residuals versus predictors and specific discrepancy variables created on the basis of the particular concerns arising in an application. Plots of binned residuals are especially easy to use because their predictive distributions under the model are sufficiently simple that model checks can often be made implicitly. The following discrepancy variables did not work well: scatterplots of latent residuals defined from an underlying continuous model and quantile–quantile plots of these residuals.

7.

Multilevel modelling of the geographical distributions of diseases

Langford IH Leyland AH Rasbash J Goldstein H 《Journal of the Royal Statistical Society. Series C, Applied statistics》1999,48(2):253-268

Multilevel modelling is used on problems arising from the analysis of spatially distributed health data. We use three applications to demonstrate the use of multilevel modelling in this area. The first concerns small area all-cause mortality rates from Glasgow where spatial autocorrelation between residuals is examined. The second analysis is of prostate cancer cases in Scottish counties where we use a range of models to examine whether the incidence is higher in more rural areas. The third develops a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model. We discuss some of the issues surrounding the use of complex spatial models and the potential for future developments.

8.

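Variable selection for spatial autoregressive models with autoregressive error terms under non-normal distributions

Wang Zhouwei (王周伟) Tao Zhipeng (陶志鹏) Zhang Yuanqing (张元庆) 《统计与信息论坛 (Statistics & Information Forum)》2016,(11):27-32

Variable selection is introduced into spatial econometric models, and the variable selection problem for the spatial autoregressive model with autoregressive error terms is discussed. Under the condition that the residuals are independent and identically distributed but non-normal, a spatial information criterion is proposed by maximizing information entropy, and it is proved to be consistent for variable selection in this model. Simulation results show that the spatial information criterion identifies both individual coefficients and the full set of coefficients well, and that it has a considerable advantage over the classical Akaike criterion. The spatial information criterion is therefore a more effective variable selection method.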

9.

Spatial–temporal interpolation of non methane hydrocarbon levels in Kuwait

S. A. Alawadhi F. A. Alawadhi 《Communications in Statistics - Theory and Methods》2017,46(6):2764-2779

This article handles the prediction of hourly concentrations of non methane hydrocarbon (NMHC) pollutants at 15 unmonitored sites in Kuwait using the data recorded from 6 monitored stations at successive time points. The trend model depends on hourly meteorological variables and seasonal effects. The stochastic component of the trend model, which has spatiotemporal features, is modeled as an autoregressive temporal process. A spatial predictive distribution for residuals of the AR model is developed for the unmonitored sites. By transforming the predicted residuals back to the original data scales, we impute Kuwait’s hourly NMHC field.

10.

Identification of local clusters for count data: a model-based Moran's I test

Tonglin Zhang Ge Lin 《Journal of applied statistics》2008,35(3):293-306

We set out I_DR as a loglinear-model-based Moran's I test for Poisson count data that resembles the Moran's I residual test for Gaussian data. We evaluate its type I and type II error probabilities via simulations, and demonstrate its utility via a case study. When population sizes are heterogeneous, I_DR is effective in detecting local clusters by local association terms with an acceptable type I error probability. When used in conjunction with local spatial association terms in loglinear models, I_DR can also indicate the existence of a first-order global cluster that can hardly be removed by local spatial association terms. In this situation, I_DR should not be directly applied for local cluster detection. In the case study of St. Louis homicides, we bridge loglinear model methods for parameter estimation to exploratory data analysis, so that a uniform association term can be defined with spatially varied contributions among spatial neighbors. The method makes use of exploratory tools such as Moran's I scatter plots and residual plots to evaluate the magnitude of deviance residuals, and it is effective in modelling the shape, the elevation and the magnitude of a local cluster in the model-based test.

11.

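Rethinking the selection of spatial regression models

Jiang Lei (姜磊) 《统计与信息论坛 (Statistics & Information Forum)》2016,(10):10-16

Spatial econometrics has two most basic models: the spatial lag model and the spatial error model. This paper reconsiders the choice between these two spatial regression models and reaches the following conclusions. Moran's I can be used to judge whether the residuals of a fitted regression model exhibit spatial dependence. In empirical analysis, using the Lagrange multiplier test to judge which of the two models is better is the most common practice; however, this test rests only on statistical inference and ignores theoretical foundations, so it may lead to selecting the wrong model. In empirical analysis the spatial error model is often selectively forgotten, although it is more widely applicable than the spatial lag model. Most empirical analyses lack a discussion of spatial regression model specification; Anselin proposed three statistics, and if the model is correctly specified they should follow the ordering Wald statistic > Log likelihood statistic > LM statistic.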
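The Moran's I check on regression residuals mentioned in the abstract above can be sketched as follows. The statistic is the standard global Moran's I, I = (n / S0) · Σ_ij w_ij z_i z_j / Σ_i z_i², with z the centred values and S0 the sum of all weights. The function name, the four-site chain of neighbours and the residual values are assumptions made purely for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # centred values
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical layout: four sites on a line, neighbours share an edge.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_residuals = [1.0, 2.0, 3.0, 4.0]        # neighbours are similar
alternating_residuals = [1.0, -1.0, 1.0, -1.0] # neighbours are opposite

print(morans_i(smooth_residuals, chain))       # positive: spatial clustering
print(morans_i(alternating_residuals, chain))  # negative: spatial alternation
```

A value near zero would be consistent with no spatial dependence in the residuals. Judging significance requires a reference distribution (asymptotic or resampling-based), which this sketch does not provide.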

12.

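A simulation analysis of the power of Bootstrap Moran tests for spatial econometric lag models

Ou Bianling (欧变玲) Long Zhihe (龙志和) Lin Guangping (林光平) 《统计研究 (Statistical Research)》2010,27(9):91-96

When the error terms are not independent and identically distributed, the asymptotic test based on Moran's I statistic cannot effectively determine whether spatial relationships exist among the 2SLS residuals of the spatial econometric lag model. This paper uses two residual-based Bootstrap methods to diagnose spatial correlation in the residuals of the spatial econometric lag model. Extensive Monte Carlo simulation results show that, in terms of power, whether or not the error terms are independent and identically distributed, the Bootstrap Moran test has better finite-sample properties than the asymptotic test and tests for spatial correlation more effectively. In particular, when the sample size is small and the spatial connectivity density is high, the power of the Bootstrap Moran test is significantly greater than that of the asymptotic test.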

13.

Generalized linear spatial models in epidemiology: A case study of zoonotic cutaneous leishmaniasis in Tunisia

K. Ben-Ahmed A. Bouratbine 《Journal of applied statistics》2010,37(1):159-170

Generalized linear spatial models (GLSM) are used here to study spatial characteristics of zoonotic cutaneous leishmaniasis (ZCL) in Tunisia. The response variable stands for the number of affected individuals by district during the period 2001–2002. The model covariates are: climates (temperature and rainfall), humidity and surrounding vegetation status. As the environmental and weather data are not available for all the studied districts, Kriging based on linear interpolation was used to estimate the missing data. To account for unexplained spatial variation in the model, we include a stationary Gaussian process S with a powered exponential spatial correlation function. Moran coefficient, DIC criterion and residuals variograms are used to show the high goodness-of-fit of the GLSM. When compared with the statistical tools used in the previous ZCL studies, the optimal GLSM found here yields a better assessment of the impact of the risk factors, a better prediction of ZCL evolution and a better comprehension of the disease transmission. The statistical results show the progressive increase in the number of affected individuals in zones with high temperature, low rainfall and high surrounding vegetation index. Relative humidity does not seem to affect the distribution of the disease in Tunisia. The results of the statistical analyses stress the important risk of misleading epidemiological conclusions when non-spatial models are used to analyse spatially structured data.

14.

Avoiding 'data snooping' in multilevel and mixed effects models

David Afshartous Michael Wolf 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(4):1035-1059

Summary. Multilevel or mixed effects models are commonly applied to hierarchical data. The level 2 residuals, which are otherwise known as random effects, are often of both substantive and diagnostic interest. Substantively, they are frequently used for institutional comparisons or rankings. Diagnostically, they are used to assess the model assumptions at the group level. Inference on the level 2 residuals, however, typically does not account for 'data snooping', i.e. for the harmful effects of carrying out a multitude of hypothesis tests at the same time. We provide a very general framework that encompasses both of the following inference problems: inference on the 'absolute' level 2 residuals to determine which are significantly different from 0, and inference on any prespecified number of pairwise comparisons. Thus, the user has the choice of testing the comparisons of interest. As our methods are flexible with respect to the estimation method that is invoked, the user may choose the desired estimation method accordingly. We demonstrate the methods with the London education authority data, the wafer data and the National Educational Longitudinal Study data.

15.

Residual-based specification of the random-effects distribution for cluster data

Samuel Soubeyrand Joël Chadœuf Ivan Sache Christian Lannou 《Statistical Methodology》2006,3(4):464-482

We propose a method for specifying the distribution of random effects included in a model for cluster data. The class of models we consider includes mixed models and frailty models whose random effects and explanatory variables are constant within clusters. The method is based on cluster residuals obtained by assuming that the random effects are equal between clusters. We exhibit an asymptotic relationship between the cluster residuals and variations of the random effects as the number of observations increases and the variance of the random effects decreases. The asymptotic relationship is used to specify the random-effects distribution. The method is applied to a frailty model and a model used to describe the spread of plant diseases.

16.

Non parametric space-time modeling of SO2 in presence of many missing data

Bruno Scarpa 《Statistical Methods and Applications》2005,14(1):67-82

Given pollution measurements from a network of monitoring sites in the area of a city and over an extended period of time, an important problem is to identify the spatial and temporal structure of the data. In this paper we focus on the identification and estimation of a statistical non parametric model to analyse the SO₂ in the city of Padua, where data are collected by some fixed stations and some mobile stations moving without any specific rule in different new locations. The impact of the use of mobile stations is that for each location there are times when data were not collected. Assuming temporal stationarity and spatial isotropy for the residuals of an additive model for the logarithm of SO₂ concentration, we estimate the semivariogram using a kernel-type estimator. Attempts are made to avoid the assumption of spatial isotropy. Bootstrap confidence bands are obtained for the spatial component of the additive model that is a deterministic function which defines the spatial structure. Finally, an example is proposed to design an optimal network for the mobile monitoring stations in a fixed future time, given all the information available.

17.

The “wrong skewness” problem in stochastic frontier models: A new approach

Christian M. Hafner Hans Manner Léopold Simar 《Econometric Reviews》2018,37(4):380-400

Stochastic frontier models are widely used to measure, e.g., technical efficiencies of firms. The classical stochastic frontier model often suffers from the empirical artefact that the residuals of the production function may have a positive skewness, whereas a negative one is expected under the model, which leads to estimated full efficiencies of all firms. We propose a new approach to the problem by generalizing the distribution used for the inefficiency variable. This generalized stochastic frontier model allows the sample data to have the wrong skewness while estimating well-defined and nondegenerate efficiency measures. We discuss the statistical properties of the model, and we discuss a test for the symmetry of the error term (no inefficiency). We provide a simulation study to show that our model delivers estimators of efficiency with smaller bias than those of the classical model even if the population skewness has the correct sign. Finally, we apply the model to data of the U.S. textile industry for 1958–2005 and show that for a number of years our model suggests technical efficiencies well below the frontier while the classical one estimates no inefficiency in those years.

18.

Residuals in the Extended Growth Curve Model

JEMILA SEID HAMID DIETRICH VON ROSEN 《Scandinavian Journal of Statistics》2006,33(1):121-138

Abstract. The Extended Growth Curve model is considered. It turns out that the estimated mean of the model is the projection of the observations on the space generated by the design matrices which turns out to be the sum of two tensor product spaces. The orthogonal complement of this space is decomposed into four orthogonal spaces and residuals are defined by projecting the observation matrix on the resulting components. The residuals are interpreted and some remarks are given as to why we should not use ordinary residuals, what kind of information our residuals give and how this information might be used to validate model assumptions and detect outliers and influential observations. It is shown that the residuals are symmetrically distributed around zero and are uncorrelated with each other. The covariance between the residuals and the estimated model as well as the dispersion matrices for the residuals are also given.

19.

A log-linear regression model for the odd Weibull distribution with censored data

Edwin M.M. Ortega Gauss M. Cordeiro Elizabeth M. Hashimoto Kahadawala Cooray 《Journal of applied statistics》2014,41(9):1859-1880

We introduce the log-odd Weibull regression model based on the odd Weibull distribution (Cooray, 2006). We derive some mathematical properties of the log-transformed distribution. The new regression model represents a parametric family of models that includes as sub-models some widely known regression models that can be applied to censored survival data. We employ a frequentist analysis and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess global influence. Further, for different parameter settings, sample sizes and censoring percentages, some simulations are performed. In addition, the empirical distribution of some modified residuals is given and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to check the model assumptions. The extended regression model is very useful for the analysis of real data.

20.

Monitoring the parameter changes in general ARIMA time series models

Yuzhi Cai Neville Davies 《Journal of applied statistics》2003,30(9):983-1001

We propose methods for monitoring the residuals of a fitted ARIMA or an autoregressive fractionally integrated moving average (ARFIMA) model in order to detect changes of the parameters in that model. We extend the procedures of Box & Ramirez (1992) and Ramirez (1992) and allow the differencing parameter, d, to be fractional or integer. Test statistics are approximated by Wiener processes. We carry out simulations and also apply our method to several real time series. The results show that our method is effective for monitoring all parameters in ARFIMA models.