Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Real count data time series often exhibit underdispersion or overdispersion. In this paper, we develop two extensions of the first-order integer-valued autoregressive process with Poisson innovations, based on binomial thinning, for modeling integer-valued time series with equidispersion, underdispersion, and overdispersion. The main properties of the models are derived. The methods of conditional maximum likelihood, Yule–Walker, and conditional least squares are used for estimating the parameters, and their asymptotic properties are established. We also use a test based on our processes for checking whether the count time series considered is overdispersed or underdispersed. The proposed models are fitted to time series of the weekly number of syphilis cases and monthly counts of family violence, illustrating their capability in handling overdispersed and underdispersed count data.
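
The base process that this abstract extends can be sketched in a few lines. The following is only a minimal illustration of the standard Poisson INAR(1) recursion X_t = alpha ∘ X_{t-1} + eps_t with binomial thinning, not the authors' two extensions; the function names, the parameter names `alpha` and `lam`, and the seed are assumptions made for this sketch.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def binomial_thin(rng, x, alpha):
    """Binomial thinning: each of the x units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate n steps of X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # Start from the stationary marginal, which is Poisson(lam / (1 - alpha)).
    x = poisson_draw(rng, lam / (1.0 - alpha))
    path = []
    for _ in range(n):
        x = binomial_thin(rng, x, alpha) + poisson_draw(rng, lam)
        path.append(x)
    return path


def dispersion_index(xs):
    """Sample variance divided by sample mean (1 means equidispersion)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return var / mean


if __name__ == "__main__":
    series = simulate_inar1(5000, alpha=0.5, lam=2.0, seed=1)
    print("mean:", sum(series) / len(series))
    print("dispersion index:", dispersion_index(series))
```

With Poisson innovations the stationary marginal is itself Poisson, so the dispersion index should sit near 1; this is the equidispersed baseline against which under- or overdispersed extensions are compared.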

2.
The Poisson distribution is a benchmark for modeling count data. Its equidispersion constraint, however, does not accurately represent real data. Most real datasets express overdispersion; hence attention in the statistics community focuses on associated issues. More examples are surfacing, however, that display underdispersion, warranting the need to highlight this phenomenon and bring more attention to those models that can better describe such data structures. This work addresses various sources of data underdispersion and surveys several distributions that can model underdispersed data, comparing their performance on applied datasets.

3.
Summary.  Data comprising colony counts, or a binary variable representing fertile (or sterile) samples, as a dilution series of the containing medium are analysed by using extended Poisson process modelling. These models form a class of flexible probability distributions that are widely applicable to count and grouped binary data. Standard distributions such as Poisson and binomial, and those representing overdispersion and underdispersion relative to these distributions can be expressed within this class. For all the models in the class, likelihoods can be obtained. These models have not been widely used because of the perceived difficulty of performing the calculations and the lack of associated software. Exact calculation of the probabilities that are involved can be time consuming although accurate approximations that use considerably less computational time are available. Although dilution series data are the focus here, the models are applicable to any count or binary data. A benefit of the approach is the ability to draw likelihood-based inferences from the data.

4.
In this paper, we establish several connections of the Poisson weight function to overdispersion and underdispersion. Specifically, we establish that the logconvexity (logconcavity) of the mean weight function is a necessary and sufficient condition for overdispersion (underdispersion) when the Poisson weight function does not depend on the original Poisson parameter. We also discuss some properties of the weighted Poisson distributions (WPD). We then introduce a notion of pointwise duality between two WPDs and discuss some associated properties. Next, we present some illustrative examples and provide a discussion on various Poisson weight functions used in practice. Finally, some concluding remarks are made.

5.
Event counts are response variables with non-negative integer values representing the number of times that an event occurs within a fixed domain such as a time interval, a geographical area or a cell of a contingency table. Analysis of counts by Gaussian regression models ignores the discreteness, asymmetry and heteroscedasticity and is inefficient, providing unrealistic standard errors or possibly negative predictions of the expected number of events. The Poisson regression is the standard model for count data with underlying assumptions on the generating process which may be implausible in many applications. Statisticians have long recognized the limitation of imposing equidispersion under the Poisson regression model. A typical situation is when the conditional variance exceeds the conditional mean, in which case models allowing for overdispersion are routinely used. Less reported is the case of underdispersion, with fewer modeling alternatives and assessments available in the literature. One such alternative, the Gamma-count model, is adopted here in the analysis of an agronomic experiment designed to investigate the effect of levels of defoliation on different phenological states upon the number of cotton bolls. Data set and code for analysis are available as online supplements. Results show improvements over the Poisson model and the semi-parametric quasi-Poisson model in capturing the observed variability in the data. Estimating rather than assuming the underlying variance process leads to important insights into the process.

6.
Count data analysis techniques have been developed in biological and medical research areas. In particular, zero-inflated versions of parametric count distributions have been used to model excessive zeros that are often present in these assays. The most common count distributions for analyzing such data are Poisson and negative binomial. However, a Poisson distribution can only handle equidispersed data and a negative binomial distribution can only cope with overdispersion. In contrast, a Conway–Maxwell–Poisson (CMP) distribution [4] can handle a wide range of dispersion. We show, with an illustrative data set on next-generation sequencing of maize hybrids, that both underdispersion and overdispersion can be present in genomic data. Furthermore, the maize data set consists of clustered observations and, therefore, we develop inference procedures for a zero-inflated CMP regression that incorporates a cluster-specific random effect term. Unlike the Gaussian models, the underlying likelihood is computationally challenging. We use a numerical approximation via a Gaussian quadrature to circumvent this issue. A test for checking zero-inflation has also been developed in our setting. Finite sample properties of our estimators and test have been investigated by extensive simulations. Finally, the statistical methodology has been applied to analyze the maize data mentioned before.

7.
Overdispersion is a common phenomenon in Poisson modeling. The generalized Poisson (GP) regression model accommodates both overdispersion and underdispersion in count data modeling, and is an increasingly popular platform for modeling overdispersed count data. The Poisson model is one of the special cases in the collection of models which may be specified by GP regression. Thus, we may derive a test of overdispersion which compares the equi-dispersion Poisson model within the context of the more general GP regression model. The score test has an advantage over the likelihood ratio test (LRT) and over the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis (the Poisson model). Herein, we propose a score test for overdispersion based on the GP model (specifically the GP-2 model) and compare the power of the test with the LRT and Wald tests. A simulation study indicates the proposed score test based on asymptotic standard normal distribution is more appropriate in practical applications.

8.
We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero‐inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.

9.
We consider a generalization of a standard test for overdispersion (underdispersion) of possibly Poisson data. Under the null hypothesis, observed counts are increments of Poisson processes. Particular applications are to a random sample of identically distributed processes and a single observed process. The test has intuitive appeal beyond the specific alternatives considered.

10.
The Poisson distribution is a simple and popular model for count-data random variables, but it suffers from the equidispersion requirement, which is often not met in practice. While models for overdispersed counts have been discussed intensively in the literature, the opposite phenomenon, underdispersion, has received only little attention, especially in a time series context. We start with a detailed survey of distribution models allowing for underdispersion, discuss their properties and highlight possible disadvantages. After having identified two model families with attractive properties as well as only two model parameters, we combine these models with the INAR(1) model (integer-valued autoregressive), which is particularly well suited to obtain autocorrelated counts with underdispersion. Properties of the resulting stationary INAR(1) models and approaches for parameter estimation are considered, as well as possible extensions to higher order autoregressions. Three real-data examples illustrate the application of the models in practice.

11.
Alberto Luceño. Statistics, 2013, 47(3): 261-267
This article analyses the broad family of discrete probability distributions generated by relating Prob(y) to Prob(y−1), …, Prob(y−n), for some n≥1, through a recursive equation. This family contains the binomial, negative binomial and Poisson distributions as well as the Katz family of distributions. In addition, the suggested family contains some convolutions of Poisson distributions and other generalized distributions, which provide models for Poisson overdispersion or underdispersion.

12.
The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.

13.
The complex triparametric Pearson (CTP) distribution is a flexible model belonging to the Gaussian hypergeometric family that can account for over- and underdispersion. However, despite its good properties, not much attention has been paid to it. We therefore revive the CTP, comparing it with some well-known distributions that cope with overdispersion (negative binomial, generalized Poisson and univariate generalized Waring) as well as underdispersion (Conway–Maxwell–Poisson (CMP) and hyper-Poisson (HP)). We carry out a simulation study that reveals the performance of the CTP and shows that it has its own space among count data models. In this sense, we also explore some overdispersed datasets which seem to be more appropriately modelled by the CTP than by other usual models. Moreover, we include two underdispersed examples to illustrate that the CTP can provide similar fits to the CMP or HP (sometimes even more accurate) without the computational problems of these models.

14.
A new distribution for non-negative integers, or counts, is developed. It is based on the assumption that the waiting times separating consecutive events are independently and identically gamma distributed. Thus, the structural process generating the counts may exhibit duration dependence. In this framework, the frequently observed phenomenon of overdispersion, that is, a variance that exceeds the mean, is caused by a decreasing hazard function of the gamma distributed waiting times, while an increasing hazard leads to underdispersion at the level of the counts. A Monte Carlo simulation and an application to fertility data illustrate the performance of the new distribution.
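
The link between the hazard of the waiting times and the dispersion of the counts can be checked by simulation. The sketch below is an illustration under stated assumptions, not the authors' Monte Carlo design: waiting times are gamma distributed with mean 1, the clock starts at an event at time 0, and the names `gamma_count`, `shape` and `horizon` are hypothetical. A gamma shape above 1 gives an increasing hazard, a shape below 1 a decreasing hazard, and shape 1 recovers the Poisson process.

```python
import random


def gamma_count(rng, shape, horizon):
    """Count events in [0, horizon] when waiting times are i.i.d. Gamma(shape) with mean 1."""
    elapsed, events = 0.0, 0
    while True:
        # random.gammavariate takes (shape, scale); scale = 1/shape fixes the mean at 1.
        elapsed += rng.gammavariate(shape, 1.0 / shape)
        if elapsed > horizon:
            return events
        events += 1


def simulated_dispersion(shape, horizon=10.0, reps=5000, seed=0):
    """Variance-to-mean ratio of the simulated counts."""
    rng = random.Random(seed)
    counts = [gamma_count(rng, shape, horizon) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return var / mean


if __name__ == "__main__":
    for shape in (0.5, 1.0, 3.0):
        print("shape", shape, "dispersion index", simulated_dispersion(shape))
```

Running the loop for the three shapes should show a ratio above 1 for the decreasing hazard, near 1 for the exponential case, and below 1 for the increasing hazard, which is the pattern the abstract describes.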

15.
Summary.  A useful discrete distribution (the Conway–Maxwell–Poisson distribution) is revived and its statistical and probabilistic properties are introduced and explored. This distribution is a two-parameter extension of the Poisson distribution that generalizes some well-known discrete distributions (Poisson, Bernoulli and geometric). It also leads to the generalization of distributions derived from these discrete distributions (i.e. the binomial and negative binomial distributions). We describe three methods for estimating the parameters of the Conway–Maxwell–Poisson distribution. The first is a fast simple weighted least squares method, which leads to estimates that are sufficiently accurate for practical purposes. The second method, using maximum likelihood, can be used to refine the initial estimates. This method requires iterations and is more computationally intensive. The third estimation method is Bayesian. Using the conjugate prior, the posterior density of the parameters of the Conway–Maxwell–Poisson distribution is easily computed. It is a flexible distribution that can account for overdispersion or underdispersion that is commonly encountered in count data. We also explore two sets of real world data demonstrating the flexibility and elegance of the Conway–Maxwell–Poisson distribution in fitting count data which do not seem to follow the Poisson distribution.
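
For readers who want to see how the second parameter controls dispersion, here is a minimal sketch of the Conway–Maxwell–Poisson probability mass function, P(Y = y) proportional to lam^y / (y!)^nu. It is not one of the three estimation methods from the abstract; the truncation point `max_y` and the function names are assumptions for this illustration, and the truncated sum is only reasonable when the tail beyond `max_y` is negligible.

```python
import math


def cmp_pmf(lam, nu, max_y=200):
    """Truncated CMP probabilities for y = 0..max_y, normalised in log space for stability."""
    log_weights = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def cmp_dispersion(lam, nu, max_y=200):
    """Variance-to-mean ratio computed from the truncated probabilities."""
    probs = cmp_pmf(lam, nu, max_y)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return var / mean


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        print("nu", nu, "dispersion index", cmp_dispersion(3.0, nu))
```

Setting nu = 1 recovers the Poisson distribution, nu above 1 yields underdispersion, and nu below 1 yields overdispersion.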

16.
The generalized Charlier series distribution includes the binomial distribution, and the noncentral negative binomial distribution extends the negative binomial distribution. The present article proposes a family of counting distributions, which contains both the generalized Charlier series and extended noncentral negative binomial distributions. Compound and mixture formulations of the proposed distribution are given. The probability mass function is expressible in terms of the confluent hypergeometric function as well as the Gauss hypergeometric function. Recursive formulae for probability mass function have been studied by Panjer, Sundt and Jewell, Schröter, Sundt, and Kitano et al. in the context of insurance risk. This article explores horizontal, vertical, triangular, and diagonal recursions. Recursive formulae as well as exact expressions for descending factorial moments are studied. The proposed distribution allows overdispersion or underdispersion relative to a Poisson distribution. An illustrative example of data fitting is given.

17.
This paper derives models to analyse Cannabis offences count series from New South Wales, Australia. The data display substantial overdispersion as well as underdispersion for a subset, trend movement and population heterogeneity. To describe the trend dynamic in the data, the Poisson geometric process model is first adopted and is extended to the generalized Poisson geometric process model to capture both over- and underdispersion. By further incorporating mixture effect, the model accommodates population heterogeneity and enables classification of homogeneous units. The model is implemented using Markov chain Monte Carlo algorithms via the user-friendly WinBUGS software and its performance is evaluated through a simulation study.

18.
Modelling count data with overdispersion and spatial effects   (Total citations: 1; self-citations: 1; citations by others: 0)
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson (GP) distribution are addressed where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. On the other hand, extra spatial variability in the data is taken into account by adding correlated spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. in Stat Comput 12(4):353–367, (2002). In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. in J R Stat Soc B64(4):583–640, (2002) and using proper scoring rules, see for example Gneiting and Raftery in Technical Report no. 463, University of Washington, (2004). We observe a rather high degree of overdispersion in the data which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives no or only little improvement, spatial Poisson models with spatially correlated or uncorrelated random effects are to be preferred over all other models according to the considered criteria.

19.
Global regression assumes that a single model adequately describes all parts of a study region. However, the heterogeneity in the data may be sufficiently strong that relationships between variables cannot be spatially constant. In addition, the factors involved are often sufficiently complex that it is difficult to identify them in the form of explanatory variables. As a result, Geographically Weighted Regression (GWR) was introduced as a tool for the modeling of non-stationary spatial data. Using kernel functions, the GWR methodology allows the model parameters to vary spatially and produces non-parametric surfaces of their estimates. To model count data with overdispersion, it is more appropriate to use a negative binomial distribution instead of a Poisson distribution. Therefore, we propose the Geographically Weighted Negative Binomial Regression (GWNBR) method for the modeling of data with overdispersion. The results obtained using simulated and real data show the superiority of this method for the modeling of non-stationary count data with overdispersion compared with competing models, such as global regressions (e.g., Poisson and negative binomial) and Geographically Weighted Poisson Regression (GWPR). Moreover, we illustrate that these competing models are special cases of the more robust model GWNBR.

20.
In modeling defect counts collected from an established manufacturing process, there is usually a relatively large number of zeros (non-defects). The commonly used models such as Poisson or Geometric distributions can underestimate the zero-defect probability and hence make it difficult to identify significant covariate effects to improve production quality. This article introduces a flexible class of zero inflated models which includes other familiar models, such as the Zero Inflated Poisson (ZIP) models, as special cases. A Bayesian estimation method is developed as an alternative to traditionally used maximum likelihood based methods to analyze such data. Simulation studies show that the proposed method has better finite sample performance than the classical method with tighter interval estimates and better coverage probabilities. A real-life data set is analyzed to illustrate the practicability of the proposed method, which is easily implemented using WinBUGS.
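
The claim that a plain Poisson model can underestimate the zero-defect probability is easy to verify for the ZIP special case. The sketch below covers only that special case, not the broader class or the Bayesian estimation method the article proposes; `pi_zero` (the extra probability mass at zero) and the function names are assumptions for this illustration.

```python
import math


def zip_pmf(y, lam, pi_zero):
    """Zero-inflated Poisson: a point mass at 0 (weight pi_zero) mixed with Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def poisson_zero_prob_same_mean(lam, pi_zero):
    """P(Y = 0) under a plain Poisson model matched to the ZIP mean (1 - pi_zero) * lam."""
    return math.exp(-(1.0 - pi_zero) * lam)


if __name__ == "__main__":
    lam, pi_zero = 2.0, 0.3
    print("ZIP P(0):", zip_pmf(0, lam, pi_zero))
    print("Poisson P(0), same mean:", poisson_zero_prob_same_mean(lam, pi_zero))
```

For any `pi_zero` strictly between 0 and 1 the ZIP zero probability exceeds that of the mean-matched Poisson, which is why fitting a plain Poisson to zero-heavy defect counts understates the share of defect-free units.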

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号