Similar literature
 20 similar articles found (search time: 31 ms)
1.
Modelling count data with overdispersion and spatial effects   Total citations: 1 (self-citations: 1, citations by others: 0)
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson (GP) distribution are addressed where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. On the other hand, extra spatial variability in the data is taken into account by adding correlated spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. (Stat Comput 12(4):353–367, 2002). In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. (J R Stat Soc B 64(4):583–640, 2002) and using proper scoring rules; see, for example, Gneiting and Raftery (Technical Report no. 463, University of Washington, 2004). We observe a rather high degree of overdispersion in the data which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives little or no improvement, spatial Poisson models with spatially correlated or uncorrelated random effects are to be preferred over all other models according to the considered criteria.

2.
Real count time series often exhibit underdispersion or overdispersion. In this paper, we develop two extensions of the first-order integer-valued autoregressive process with Poisson innovations, based on binomial thinning, for modeling integer-valued time series with equidispersion, underdispersion, and overdispersion. The main properties of the models are derived. The methods of conditional maximum likelihood, Yule–Walker, and conditional least squares are used for estimating the parameters, and their asymptotic properties are established. We also use a test based on our processes to check whether a given count time series is overdispersed or underdispersed. The proposed models are fitted to time series of the weekly number of syphilis cases and monthly counts of family violence, illustrating their capabilities on challenging overdispersed and underdispersed count data.
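As a rough illustration of the baseline that this abstract extends, the sketch below simulates a plain Poisson INAR(1) process, X_t = α∘X_{t-1} + ε_t, where ∘ denotes binomial thinning. It is not the authors' extended models. The function names, parameter values, and the use of Knuth's method for Poisson draws are assumptions made for illustration only.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def binomial_thinning(rng, alpha, x):
    """Return alpha o x: survivors among x units, each kept with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # Start from the stationary marginal, which is Poisson(lam / (1 - alpha)).
    x = poisson_draw(rng, lam / (1.0 - alpha))
    series = []
    for _ in range(n):
        x = binomial_thinning(rng, alpha, x) + poisson_draw(rng, lam)
        series.append(x)
    return series


def dispersion_index(series):
    """Sample variance divided by sample mean (close to 1 under equidispersion)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / (n - 1)
    return var / mean


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    xs = simulate_inar1(alpha=0.5, lam=2.0, n=5000, seed=1)
    print("mean:", sum(xs) / len(xs))
    print("dispersion index:", dispersion_index(xs))
```

Because the Poisson INAR(1) marginal is itself Poisson, the dispersion index of a long simulated path should sit near 1. An index clearly above or below 1 in real data is the situation the extended models are meant to handle.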

3.
In this paper we propose a new stationary first‐order non‐negative integer valued autoregressive process with geometric marginals based on a generalised version of the negative binomial thinning operator. In this manner we obtain another process that we refer to as a generalised stationary integer‐valued autoregressive process of the first order with geometric marginals. This new process will enable one to tackle the problem of overdispersion inherent in the analysis of integer‐valued time series data, and contains the new geometric process as a particular case. In addition, various properties of the new process, such as conditional distribution, autocorrelation structure and innovation structure, are derived. We discuss conditional maximum likelihood estimation of the model parameters. We evaluate the performance of the conditional maximum likelihood estimators by a Monte Carlo study. The proposed process is fitted to time series of the number of weekly sales (economics) and the weekly number of syphilis cases (medicine), illustrating its capabilities in challenging cases of highly overdispersed count data.

4.
We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows the incorporation of stability selection, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate. The model is fitted repeatedly to subsampled data, and variables with high selection frequencies are extracted. To apply stability selection to boosted GAMLSS, we develop a new “noncyclical” fitting algorithm that incorporates an additional selection step of the best-fitting distribution parameter in each iteration. This new algorithm has the additional advantage that optimizing the tuning parameters of boosting is reduced from a multi-dimensional to a one-dimensional problem with vastly decreased complexity. The performance of the novel algorithm is evaluated in an extensive simulation study. We apply this new algorithm to a study to estimate abundance of common eider in Massachusetts, USA, featuring excess zeros, overdispersion, nonlinearity and spatiotemporal structures. Eider abundance is estimated via boosted GAMLSS, allowing both mean and overdispersion to be regressed on covariates. Stability selection is used to obtain a sparse set of stable predictors.

5.
On the use of corrections for overdispersion   Total citations: 3 (self-citations: 0, citations by others: 3)
In studying fluctuations in the size of a black grouse (Tetrao tetrix) population, an autoregressive model using climatic conditions appears to follow the change quite well. However, the deviance of the model is considerably larger than its number of degrees of freedom. A widely used statistical rule of thumb holds that overdispersion is present in such situations, but model selection based on a direct likelihood approach can produce opposing results. Two further examples, of binomial and of Poisson data, have models with deviances that are almost twice the degrees of freedom and yet various overdispersion models do not fit better than the standard model for independent data. This can arise because the rule of thumb only considers a point estimate of dispersion, without regard for any measure of its precision. A reasonable criterion for detecting overdispersion is that the deviance be at least twice the number of degrees of freedom, the familiar Akaike information criterion, but the actual presence of overdispersion should then be checked by some appropriate modelling procedure.
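The rule of thumb discussed in this abstract compares the deviance with the residual degrees of freedom. A minimal sketch of that comparison for Poisson counts follows. The data are made up, the intercept-only fit is an assumption chosen to keep the example short, and the function names are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)), taking 0*log(0) as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = -(yi - mi)
        if yi > 0:
            term += yi * math.log(yi / mi)
        total += term
    return 2.0 * total


def deviance_ratio(y, mu, n_params):
    """Deviance divided by residual degrees of freedom (n minus fitted parameters)."""
    df = len(y) - n_params
    return poisson_deviance(y, mu) / df


if __name__ == "__main__":
    counts = [0, 1, 2, 3, 10, 0, 0, 7]  # made-up counts, for illustration only
    # Intercept-only Poisson model: every fitted mean equals the sample mean.
    fitted = [sum(counts) / len(counts)] * len(counts)
    ratio = deviance_ratio(counts, fitted, n_params=1)
    print("deviance / df:", ratio)
    print("at least twice the df:", ratio >= 2.0)
```

As the abstract stresses, this ratio is only a point estimate of dispersion. A value above the threshold is a prompt to fit and compare an explicit overdispersion model, not proof that one is needed.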

6.
In this article, we propose a class of logarithmic autoregressive conditional duration (ACD)-type models that accommodates overdispersion, intermittent dynamics, multiple regimes, and asymmetries in financial durations. In particular, our functional coefficient logarithmic autoregressive conditional duration (FC-LACD) model relies on a smooth-transition autoregressive specification. The motivation lies in the fact that the latter yields a universal approximation if one lets the number of regimes grow without bound. After establishing sufficient conditions for strict stationarity, we address model identifiability as well as the asymptotic properties of the quasi-maximum likelihood (QML) estimator for the FC-LACD model with a fixed number of regimes. In addition, we discuss how to consistently estimate a semiparametric variant of the FC-LACD model that takes the number of regimes to infinity. An empirical illustration indicates that our functional coefficient model is flexible enough to model IBM price durations.

7.
Estimation in conditional first order autoregression with discrete support   Total citations: 1 (self-citations: 0, citations by others: 1)
We consider estimation in the class of first order conditional linear autoregressive models with discrete support that are routinely used to model time series of counts. Various groups of estimators proposed in the literature are discussed: moment-based estimators; regression-based estimators; and likelihood-based estimators. Some of these have been used previously and others not. In particular, we address the performance of new types of generalized method of moments estimators and propose an exact maximum likelihood procedure valid for a Poisson marginal model using backcasting. The small sample properties of all estimators are comprehensively analyzed using simulation. Three situations are considered using data generated with: a fixed autoregressive parameter and equidispersed Poisson innovations; negative binomial innovations; and, additionally, a random autoregressive coefficient. The first set of experiments indicates that bias correction methods, not hitherto used in this context to our knowledge, are sometimes needed and that likelihood-based estimators, as might be expected, perform well. The second two scenarios are representative of overdispersion. Methods designed specifically for the Poisson context now perform uniformly badly, but simple, bias-corrected, Yule-Walker and least squares estimators perform well in all cases.

8.
The time series of counts observed in practice often exhibit overdispersion. The INGARCH(p, q) models are able to describe integer-valued processes with overdispersion. Known properties of these models, however, are nearly exclusively restricted to the special case p = q = 1. In this article, we derive a set of equations from which the variance and the autocorrelation function of the general case can be obtained. We investigate the purely autoregressive INGARCH(p, 0) models and show that they are closely related to the standard AR(p) models. For p = 1, we determine the marginal distribution in terms of its cumulants. A real-data example highlights potential fields of application of the INGARCH(p, 0) models.
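To make the purely autoregressive case concrete, the sketch below simulates the simplest member, INGARCH(1, 0), in which X_t given the past is Poisson with conditional mean β0 + α1·X_{t-1}. The parameter names `beta0` and `alpha1`, the burn-in length, and the Poisson sampler are assumptions made for illustration and are not taken from the article.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_ingarch10(beta0, alpha1, n, seed=0, burn_in=200):
    """Simulate X_t | past ~ Poisson(beta0 + alpha1 * X_{t-1}), with 0 <= alpha1 < 1."""
    rng = random.Random(seed)
    x = 0  # arbitrary start; the burn-in period is discarded
    series = []
    for t in range(n + burn_in):
        x = poisson_draw(rng, beta0 + alpha1 * x)
        if t >= burn_in:
            series.append(x)
    return series


def mean_and_variance(series):
    """Return the sample mean and the unbiased sample variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / (n - 1)
    return mean, var


if __name__ == "__main__":
    # Illustrative parameter values only.
    xs = simulate_ingarch10(beta0=2.0, alpha1=0.5, n=5000, seed=1)
    mean, var = mean_and_variance(xs)
    print("sample mean:", mean, "sample variance:", var)
    print("dispersion index:", var / mean)
```

For this first-order case the stationary mean is β0/(1 − α1) and the variance is that mean divided by (1 − α1²), so the simulated variance should exceed the mean whenever α1 > 0. This is the overdispersion the abstract refers to.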

9.
This paper develops Bayesian inference of extreme value models with a flexible time-dependent latent structure. The generalized extreme value distribution is utilized to incorporate state variables that follow an autoregressive moving average (ARMA) process with Gumbel-distributed innovations. The time-dependent extreme value distribution is combined with heavy-tailed error terms. An efficient Markov chain Monte Carlo algorithm is proposed using a state-space representation with a finite mixture of normal distributions to approximate the Gumbel distribution. The methodology is illustrated by simulated data and two different sets of real data. Monthly minima of daily returns of a stock price index and monthly maxima of hourly electricity demand are fitted to the proposed model and used for model comparison. Estimation results show the usefulness of the proposed model and methodology, and provide evidence that the latent autoregressive process and heavy-tailed errors play an important role in describing the monthly series of minimum stock returns and maximum electricity demand.

10.
In this paper, we propose a new generalized alpha-skew-T (GAST) distribution for generalized autoregressive conditional heteroskedasticity (GARCH) models in modelling daily Value-at-Risk (VaR). Some mathematical properties of the proposed distribution are derived, including the density function, moments and stochastic representation. The maximum likelihood method for estimating the parameters is discussed via a simulation study. Then, a real data application to the S&P-500 index is performed to investigate the performance of GARCH models specified under the GAST innovation distribution relative to normal, Student's-t and Skew-T models in terms of VaR accuracy. Backtesting methodology is used to compare the out-of-sample performance of the VaR models. The results show that GARCH models with the GAST innovation distribution outperform the others and generate the most conservative VaR forecasts for all confidence levels and for both long and short positions.

11.
We propose autoregressive moving average (ARMA) and generalized autoregressive conditional heteroscedastic (GARCH) models driven by asymmetric Laplace (AL) noise. The AL distribution plays, in the geometric-stable class, the analogous role played by the normal in the alpha-stable class, and has shown promise in the modelling of certain types of financial and engineering data. In the case of an ARMA model we derive the marginal distribution of the process, as well as its bivariate distribution when separated by a finite number of lags. The calculation of exact confidence bands for minimum mean-squared error linear predictors is shown to be straightforward. Conditional maximum likelihood-based inference is advocated, and corresponding asymptotic results are discussed. The models are particularly suited for processes that are skewed, peaked, and leptokurtic, but which appear to have some higher order moments. A case study of a fund of real estate returns reveals that AL noise models tend to deliver a superior fit with substantially fewer parameters than normal noise counterparts, and provide both a competitive fit and a greater degree of numerical stability with respect to other skewed distributions.

12.
The generalized Poisson (GP) regression is an increasingly popular approach for modeling overdispersed as well as underdispersed count data. Several parameterizations have been proposed for the GP regression, and the two well-known models, the GP-1 and the GP-2, have been applied. The GP-P regression, which has been recently proposed, has the advantage of nesting the GP-1 and the GP-2 parametrically, besides allowing statistical tests of the GP-1 and the GP-2 against a more general alternative. In several cases, count data have more zero outcomes than are expected under the Poisson. This zero-inflation phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed. However, if the data continue to suggest additional overdispersion, the zero-inflated negative binomial (ZINB-1 and ZINB-2) and the zero-inflated generalized Poisson (ZIGP-1 and ZIGP-2) regression models have been considered as alternatives. This article proposes a functional form of the ZIGP which mixes a distribution degenerate at zero with a GP-P distribution. The suggested model has the advantage of nesting the ZIP and the two well-known ZIGP (ZIGP-1 and ZIGP-2) regression models, besides allowing statistical tests of the ZIGP-1 and the ZIGP-2 against a more general alternative. The ZIP and the functional form of the ZIGP regression models are fitted, compared and tested on two sets of count data: the Malaysian insurance claim data and the German healthcare data.
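The mixing construction described here (a point mass at zero combined with a GP distribution) can be sketched numerically. The code below uses Consul's basic generalized Poisson parameterization with parameters `theta` and `lam`, which is an assumption made for illustration. It is not the GP-1, GP-2 or GP-P regression parameterization of the article, and it is restricted to 0 ≤ lam < 1, so it covers equidispersion and overdispersion only.

```python
import math


def gp_log_pmf(y, theta, lam):
    """log P(Y = y) for Consul's generalized Poisson, theta > 0 and 0 <= lam < 1."""
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))


def gp_pmf(y, theta, lam):
    """Generalized Poisson probability of the count y."""
    return math.exp(gp_log_pmf(y, theta, lam))


def zigp_pmf(y, pi0, theta, lam):
    """Zero-inflated GP: extra mass pi0 at zero, GP with weight 1 - pi0."""
    base = (1.0 - pi0) * gp_pmf(y, theta, lam)
    return pi0 + base if y == 0 else base


def summarize(pmf, upper=400):
    """Return (total mass, mean, variance) of a pmf summed over 0..upper-1."""
    probs = [pmf(y) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    theta, lam, pi0 = 2.0, 0.3, 0.2  # illustrative values only
    print("GP:  ", summarize(lambda y: gp_pmf(y, theta, lam)))
    print("ZIGP:", summarize(lambda y: zigp_pmf(y, pi0, theta, lam)))
```

In this parameterization the GP mean is theta/(1 − lam) and the variance is theta/(1 − lam)³, so any lam > 0 gives variance above the mean. Setting lam = 0 recovers the Poisson, and in the zero-inflated version it recovers the ZIP, which mirrors the nesting idea in the abstract.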

13.
Non-Gaussian outcomes are often modeled using members of the so-called exponential family. The Poisson model for count data falls within this tradition. The family in general, and the Poisson model in particular, are at the same time convenient since mathematically elegant, but in need of extension since often somewhat restrictive. Two of the main rationales for existing extensions are (1) the occurrence of overdispersion, in the sense that the variability in the data is not adequately captured by the model's prescribed mean-variance link, and (2) the accommodation of data hierarchies owing to, for example, repeatedly measuring the outcome on the same subject, recording information from various members of the same family, etc. There is a variety of overdispersion models for count data, such as, for example, the negative-binomial model. Hierarchies are often accommodated through the inclusion of subject-specific, random effects. Though not always, one conventionally assumes such random effects to be normally distributed. While both of these issues may occur simultaneously, models accommodating them at once are less than common. This paper proposes a generalized linear model, accommodating overdispersion and clustering through two separate sets of random effects, of gamma and normal type, respectively. This is in line with the proposal by Booth et al. (Stat Model 3:179-181, 2003). The model extends both classical overdispersion models for count data (Breslow, Appl Stat 33:38-44, 1984), in particular the negative binomial model, as well as the generalized linear mixed model (Breslow and Clayton, J Am Stat Assoc 88:9-25, 1993). Apart from model formulation, we briefly discuss several estimation options, and then settle for maximum likelihood estimation with both fully analytic integration as well as a hybrid of analytic and numerical integration. The latter is implemented in the SAS procedure NLMIXED. The methodology is applied to data from a study in epileptic seizures.

14.
Overdispersion is a common phenomenon in Poisson modeling. The generalized Poisson (GP) regression model accommodates both overdispersion and underdispersion in count data modeling, and is an increasingly popular platform for modeling overdispersed count data. The Poisson model is one of the special cases in the collection of models which may be specified by GP regression. Thus, we may derive a test of overdispersion which compares the equi-dispersion Poisson model within the context of the more general GP regression model. The score test has an advantage over the likelihood ratio test (LRT) and over the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis (the Poisson model). Herein, we propose a score test for overdispersion based on the GP model (specifically the GP-2 model) and compare the power of the test with the LRT and Wald tests. A simulation study indicates the proposed score test based on asymptotic standard normal distribution is more appropriate in practical applications.

15.
In this paper, we introduce a new first-order generalized Poisson integer-valued autoregressive process, for modeling integer-valued time series exhibiting a piecewise structure and overdispersion. Basic probabilistic and statistical properties of this model are discussed. Conditional least squares and conditional maximum likelihood estimators are derived. The asymptotic properties of the estimators are established. Moreover, two special cases of the process are discussed. Finally, some numerical results of the estimates and a real data example are presented.

16.
This paper derives models to analyse cannabis offence count series from New South Wales, Australia. The data display substantial overdispersion, as well as underdispersion for a subset, trend movement and population heterogeneity. To describe the trend dynamic in the data, the Poisson geometric process model is first adopted and is then extended to the generalized Poisson geometric process model to capture both over- and underdispersion. By further incorporating a mixture effect, the model accommodates population heterogeneity and enables classification of homogeneous units. The model is implemented using Markov chain Monte Carlo algorithms via the user-friendly WinBUGS software and its performance is evaluated through a simulation study.

17.
We provide methods to robustly estimate the parameters of stationary ergodic short-memory time series models in the potential presence of additive low-frequency contamination. The types of contamination covered include level shifts (changes in mean) and monotone or smooth time trends, both of which have been shown to bias parameter estimates toward regions of persistence in a variety of contexts. The estimators presented here minimize trimmed frequency domain quasi-maximum likelihood (FDQML) objective functions without requiring specification of the low-frequency contaminating component. When proper sample size-dependent trimmings are used, the FDQML estimators are consistent and asymptotically normal, asymptotically eliminating the presence of any spurious persistence. These asymptotic results also hold in the absence of additive low-frequency contamination, enabling the practitioner to robustly estimate model parameters without prior knowledge of whether contamination is present. Popular time series models that fit into the framework of this article include autoregressive moving average (ARMA), stochastic volatility, generalized autoregressive conditional heteroscedasticity (GARCH), and autoregressive conditional heteroscedasticity (ARCH) models. We explore the finite sample properties of the trimmed FDQML estimators of the parameters of some of these models, providing practical guidance on trimming choice. Empirical estimation results suggest that a large portion of the apparent persistence in certain volatility time series may indeed be spurious. Supplementary materials for this article are available online.

18.
The zero-inflated negative binomial (ZINB) model is used to account for commonly occurring overdispersion detected in data that are initially analyzed under the zero-inflated Poisson (ZIP) model. Tests for overdispersion (Wald test, likelihood ratio test [LRT], and score test) based on ZINB model for use in ZIP regression models have been developed. Due to similarity to the ZINB model, we consider the zero-inflated generalized Poisson (ZIGP) model as an alternate model for overdispersed zero-inflated count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes score tests for overdispersion based on the ZIGP model and illustrates that the derived score statistics are exactly the same as the score statistics under the ZINB model. A simulation study indicates the proposed score statistics are preferred to other tests for higher empirical power. In practice, based on the approximate mean–variance relationship in the data, the ZINB or ZIGP model can be considered, and a formal score test based on asymptotic standard normal distribution can be employed for assessing overdispersion in the ZIP model. We provide an example to illustrate the procedures for data analysis.

19.
The use of nonlinear models in analyzing time series data is becoming increasingly popular. This paper considers a broad class of nonlinear autoregressive models where the autoregressive part is additive and the terms are nonlinear functions of the past data. Also, the innovation distribution is supported on the non-negative reals and satisfies a tail regularity condition. The linear parameters of the autoregression are estimated using a linear programming recipe which yields much more accurate estimates than traditional methods such as conditional least squares. The limiting distribution of the linear programming estimators is obtained. Simulation studies validate the asymptotic results and reveal excellent small sample properties of the linear programming estimator.

20.
Most regression problems in practice require flexible semiparametric forms of the predictor for modelling the dependence of responses on covariates. Moreover, it is often necessary to add random effects accounting for overdispersion caused by unobserved heterogeneity or for correlation in longitudinal or spatial data. We present a unified approach for Bayesian inference via Markov chain Monte Carlo simulation in generalized additive and semiparametric mixed models. Different types of covariates, such as the usual covariates with fixed effects, metrical covariates with non-linear effects, unstructured random effects, trend and seasonal components in longitudinal data and spatial covariates, are all treated within the same general framework by assigning appropriate Markov random field priors with different forms and degrees of smoothness. We applied the approach in several case-studies and consulting cases, showing that the methods are also computationally feasible in problems with many covariates and large data sets. In this paper, we choose two typical applications.
