Similar literature
1.
Inflated data and over-dispersion are two common problems when modeling count data with traditional Poisson regression models. In this study, we propose a latent class inflated Poisson (LCIP) regression model to account for the unobserved heterogeneity that leads to inflation and over-dispersion. The performance of the model estimation is evaluated through simulation studies. We illustrate the usefulness of introducing a latent class variable by analyzing Behavioral Risk Factor Surveillance System (BRFSS) data, which contain several inflated values and are characterized by over-dispersion. The proposed model fits the inflated counts better than the standard Poisson regression and zero-inflated Poisson regression models.
KEYWORDS: Inflated data, latent class, heterogeneity, Poisson regression, over-dispersion
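A latent class is one way unobserved heterogeneity enters count data, and its effect on dispersion is easy to check numerically. The sketch below is not the LCIP model itself: it assumes a plain two-class Poisson mixture with hypothetical weights and rates, and shows that mixing equi-dispersed classes yields a marginal variance well above the marginal mean.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability, computed on the log scale for numerical stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def mixture_moments(weights, rates, y_max=200):
    """Mean and variance of a finite Poisson mixture by direct summation over 0..y_max."""
    probs = [sum(w * poisson_pmf(y, lam) for w, lam in zip(weights, rates))
             for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical latent classes: 70% of subjects with rate 1, 30% with rate 8
weights, rates = [0.7, 0.3], [1.0, 8.0]
mean, var = mixture_moments(weights, rates)

# Closed form for a mixed Poisson: Var(Y) = E[lambda] + Var(lambda)
mean_lam = sum(w * lam for w, lam in zip(weights, rates))
var_lam = sum(w * (lam - mean_lam) ** 2 for w, lam in zip(weights, rates))
print(f"mean={mean:.3f}  var={var:.3f}  closed-form var={mean_lam + var_lam:.3f}")
```

The closed form follows from the law of total variance, Var(Y) = E[Var(Y | λ)] + Var(E[Y | λ]) = E[λ] + Var(λ), so any non-degenerate mixing over λ produces over-dispersion.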

2.
The bivariate negative binomial regression (BNBR) and bivariate Poisson log-normal regression (BPLR) models have been used to describe over-dispersed count data. In this paper, a new bivariate generalized Poisson regression (BGPR) model is defined. An advantage of the new model over the BNBR and BPLR models is that the BGPR can model bivariate count data with either over-dispersion or under-dispersion. We carry out a simulation study to compare the three regression models when the true data-generating process exhibits over-dispersion, and observe that the BGPR model performs better than both the BNBR and the BPLR models.
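Over- versus under-dispersion in the generalized Poisson is governed by a single dispersion parameter. As an illustration (an assumed parametrization, not necessarily the one used for the BGPR), the sketch below uses the mean/dispersion form of the univariate generalized Poisson, with mean μ and variance μ(1 + αμ)², and confirms these moments by summation. Only α ≥ 0 is handled (over-dispersion, with α = 0 giving the Poisson); α < 0 gives under-dispersion but requires truncating the support, and the bivariate structure is not shown.

```python
import math


def gp_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of the generalized Poisson in mean/dispersion form
    (mean mu, variance mu * (1 + alpha*mu)**2); valid here for alpha >= 0."""
    a = 1 + alpha * mu
    return (y * math.log(mu / a) + (y - 1) * math.log(1 + alpha * y)
            - math.lgamma(y + 1) - mu * (1 + alpha * y) / a)


def gp_moments(mu, alpha, y_max=400):
    """Total probability, mean and variance by summation over 0..y_max."""
    probs = [math.exp(gp_logpmf(y, mu, alpha)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var


total, mean, var = gp_moments(mu=3.0, alpha=0.2)
# The variance should be close to 3 * (1 + 0.2 * 3)**2 = 7.68
print(f"total={total:.6f} mean={mean:.4f} var={var:.4f}")
```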

3.
In this paper we consider spatial regression models for count data. As possible response distributions, we examine not only the Poisson distribution but also the generalized Poisson, which can model over-dispersion, the negative binomial, and the zero-inflated Poisson, which allows for excess zeros. We add random spatial effects to model spatial dependency, and develop and implement MCMC algorithms in R for Bayesian estimation. The corresponding R library ‘spatcounts’ is available on CRAN. In an application, the models are used to analyze the number of benefits received per patient in a German private health insurance company. Since the deviance information criterion is only appropriate for exponential family models, we additionally use the Vuong and Clarke tests with a Schwarz correction to compare possibly non-nested models, and illustrate how they can be used in a Bayesian context.
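The Vuong and Clarke tests compare non-nested models through per-observation log-likelihood differences. Below is a minimal frequentist sketch, assuming the pointwise log-likelihood contributions of two fitted models are already available (the lists are hypothetical), with the Schwarz correction (k1 - k2) log(n)/2 applied to the summed differences. How the paper adapts these tests to posterior output is not reproduced here.

```python
import math
import statistics


def vuong_test(ll1, ll2, k1, k2):
    """Vuong z-statistic for non-nested models with a Schwarz (BIC-type) correction.

    ll1, ll2: per-observation log-likelihood contributions of models 1 and 2;
    k1, k2: numbers of parameters. z > 0 favours model 1.
    """
    n = len(ll1)
    m = [a - b for a, b in zip(ll1, ll2)]
    correction = (k1 - k2) * math.log(n) / 2
    z = (sum(m) - correction) / (math.sqrt(n) * statistics.pstdev(m))
    return z, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value


def clarke_test(ll1, ll2, k1, k2):
    """Clarke's distribution-free sign test with the Schwarz correction spread
    evenly over observations. Returns (B, two-sided p-value); large B favours model 1."""
    n = len(ll1)
    shift = (k1 - k2) * math.log(n) / (2 * n)
    b = sum(1 for a, c in zip(ll1, ll2) if a - c - shift > 0)

    def cdf(k):  # P(B <= k) for B ~ Binomial(n, 1/2), the null distribution
        return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

    return b, min(1.0, 2 * min(cdf(b), 1 - cdf(b - 1)))


# Hypothetical pointwise log-likelihoods of two fitted models
ll_model1 = [-1.0, -1.2, -0.8, -1.1, -0.9, -1.0, -1.3, -0.7]
ll_model2 = [-1.3, -1.4, -1.0, -1.6, -1.0, -1.2, -1.5, -1.1]
z, p_vuong = vuong_test(ll_model1, ll_model2, k1=2, k2=2)
b, p_clarke = clarke_test(ll_model1, ll_model2, k1=2, k2=2)
print(f"Vuong z={z:.3f} (p={p_vuong:.4f}); Clarke B={b} (p={p_clarke:.4f})")
```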

4.
Distance sampling and capture–recapture are the two most widely used wildlife abundance estimation methods. Capture–recapture methods have only recently incorporated models for spatial distribution, and there is an increasing tendency for distance sampling methods to incorporate spatial models rather than rely on partly design-based spatial inference. In this overview we show how spatial models are central to modern distance sampling and that spatial capture–recapture models arise as an extension of distance sampling methods. Depending on the type of data recorded, they can be viewed as particular kinds of hierarchical binary regression, Poisson regression, survival or time-to-event models, with individuals’ locations as latent variables and a spatial model as the latent variable distribution. Incorporating spatial models in these two methods provides new opportunities for drawing explicitly spatial inferences. Areas of likely future development include more sophisticated spatial and spatio-temporal modelling of individuals’ locations and movements, new methods for integrating spatial capture–recapture with other kinds of ecological survey data, and methods for dealing with the recapture uncertainty that often arises when “capture” consists of detection by a remote device such as a camera trap or microphone.
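As a reference point for the conventional, non-spatial building block, here is a sketch of line-transect distance sampling with a half-normal detection function g(x) = exp(-x²/(2σ²)), assumed untruncated so that the maximum-likelihood estimate has the closed form σ̂² = mean(x²). Density is estimated as D̂ = n/(2Lμ̂), where μ̂ is the effective strip half-width. The distances and transect length are hypothetical, and none of the spatial or capture–recapture extensions discussed in the overview are included.

```python
import math


def half_normal_sigma_mle(distances):
    """ML estimate of sigma for an untruncated half-normal detection function
    g(x) = exp(-x**2 / (2 sigma**2)): sigma_hat**2 = mean of x**2."""
    return math.sqrt(sum(x * x for x in distances) / len(distances))


def effective_strip_half_width(sigma, w=math.inf):
    """mu = integral of g(x) from 0 to w for the half-normal; w is the truncation distance."""
    if math.isinf(w):
        return sigma * math.sqrt(math.pi / 2)
    return sigma * math.sqrt(math.pi / 2) * math.erf(w / (sigma * math.sqrt(2)))


def line_transect_density(distances, total_line_length):
    """Conventional estimator D_hat = n / (2 * L * mu_hat); units follow the inputs."""
    sigma = half_normal_sigma_mle(distances)
    return len(distances) / (2 * total_line_length * effective_strip_half_width(sigma))


# Hypothetical perpendicular distances (km) of 8 detections along 5 km of transect
dists = [0.10, 0.50, 0.30, 1.20, 0.80, 0.20, 0.60, 0.40]
print(f"D_hat = {line_transect_density(dists, total_line_length=5.0):.3f} animals per km^2")
```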

5.
In this article, a new mixed Poisson distribution is introduced. It is obtained through a mixing process, with the Poisson as the mixed distribution and the transmuted exponential as the mixing distribution. Distributional properties such as unimodality, moments, over-dispersion and infinite divisibility are studied. Three methods, namely the method of moments, the method of moments and proportion, and maximum likelihood, are used for parameter estimation. Further, an actuarial application in the context of the aggregate claim distribution is presented. Finally, to show the applicability and superiority of the proposed model, we discuss count data and count regression modeling and compare the model with some well-established models.
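Assuming the transmuted exponential is defined by the quadratic rank transmutation F = (1 + p)G - pG² of G(x) = 1 - exp(-θx), its density is θe^(-θx)[(1 - p) + 2pe^(-θx)], and the Poisson mixture has the closed form P(Y = y) = (1 - p)θ/(1 + θ)^(y+1) + 2pθ/(1 + 2θ)^(y+1). The paper's exact parametrization may differ. The sketch checks the closed form against numerical integration of the mixing integral and confirms over-dispersion for hypothetical θ and p.

```python
import math


def te_density(x, theta, p):
    """Transmuted exponential density, assuming F = (1+p)G - p*G**2 with
    G(x) = 1 - exp(-theta x), theta > 0, |p| <= 1."""
    e = math.exp(-theta * x)
    return theta * e * ((1 - p) + 2 * p * e)


def pte_pmf(y, theta, p):
    """Closed-form mixed Poisson pmf: integral of Poisson(y; x) * te_density(x) dx."""
    return ((1 - p) * theta / (1 + theta) ** (y + 1)
            + 2 * p * theta / (1 + 2 * theta) ** (y + 1))


def pte_pmf_quadrature(y, theta, p, upper=80.0, steps=40_000):
    """Midpoint-rule evaluation of the same mixing integral, as a check."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        poisson = math.exp(y * math.log(x) - x - math.lgamma(y + 1))
        total += poisson * te_density(x, theta, p)
    return total * h


theta, p = 0.5, 0.4                      # hypothetical parameter values
probs = [pte_pmf(y, theta, p) for y in range(300)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(f"sum={sum(probs):.10f} mean={mean:.4f} var={var:.4f}")
print(f"P(Y=2): closed form {pte_pmf(2, theta, p):.6f}, "
      f"quadrature {pte_pmf_quadrature(2, theta, p):.6f}")
```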

6.
While excess zeros are often thought to cause data over-dispersion (i.e. the variance exceeding the mean), this implication is not absolute. One should instead consider a flexible class of distributions that can address data dispersion along with excess zeros. This work develops a zero-inflated sum-of-Conway-Maxwell-Poissons (ZISCMP) regression as a flexible tool for analyzing count data that exhibit significant dispersion and contain excess zeros. This class of models contains several zero-inflated regressions as special cases, including zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated binomial (ZIB), and zero-inflated Conway-Maxwell-Poisson (ZICMP). Through simulated and real data examples, we demonstrate the flexibility and usefulness of the class. We further use it to analyze shark species data from Australia's Great Barrier Reef to assess the environmental impact of human action on the numbers of various shark species.
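The point that excess zeros need not imply over-dispersion can be checked with the zero-inflated binomial, one of the special cases listed above. With hypothetical values n = 10, p = 0.9 and inflation probability π = 0.05, the zero probability rises from about 10⁻¹⁰ to above 0.05, yet the variance, (1 - π)np(1 - p) + π(1 - π)(np)² = 4.7025, stays below the mean (1 - π)np = 8.55.

```python
import math


def zib_pmf(y, n, p, pi):
    """Zero-inflated binomial: structural zero with probability pi, else Binomial(n, p)."""
    base = math.comb(n, y) * p ** y * (1 - p) ** (n - y)
    return (pi if y == 0 else 0.0) + (1 - pi) * base


n, p, pi = 10, 0.9, 0.05                 # hypothetical values
probs = [zib_pmf(y, n, p, pi) for y in range(n + 1)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
plain_binomial_p0 = (1 - p) ** n

print(f"P(0): {probs[0]:.4f} vs plain binomial {plain_binomial_p0:.1e}")
print(f"mean={mean:.4f} var={var:.4f} under-dispersed: {var < mean}")
```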

7.
Poisson regression is the most well-known method for modeling count data. When data display over-dispersion, thereby violating the underlying equi-dispersion assumption of Poisson regression, the common solution is to use negative-binomial regression. We show, however, that count data that appear to be equi- or over-dispersed may actually stem from a mixture of populations with different dispersion levels. To detect and model such a mixture, we introduce a generalization of the Conway-Maxwell-Poisson (COM-Poisson) regression model that allows for group-level dispersion. We illustrate mixed dispersion effects and the proposed methodology via semi-authentic data.
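The COM-Poisson handles both directions of dispersion through a single parameter ν, with P(Y = y) proportional to λ^y/(y!)^ν. A minimal sketch, truncating the normalizing series at a fixed upper bound (an assumption that is harmless for the moderate λ used here), shows that ν = 1 reproduces the Poisson, ν < 1 gives over-dispersion and ν > 1 gives under-dispersion. The group-level dispersion model proposed in the paper is not reproduced.

```python
import math


def com_poisson_probs(lam, nu, y_max=500):
    """COM-Poisson pmf P(Y=y) proportional to lam**y / (y!)**nu, normalised over 0..y_max.
    nu = 1 gives the Poisson; nu < 1 over-dispersion; nu > 1 under-dispersion."""
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(y_max + 1)]
    top = max(log_terms)                           # log-sum-exp shift for stability
    weights = [math.exp(t - top) for t in log_terms]
    z = sum(weights)
    return [w / z for w in weights]


def mean_var(probs):
    mean = sum(y * q for y, q in enumerate(probs))
    return mean, sum((y - mean) ** 2 * q for y, q in enumerate(probs))


for nu in (0.5, 1.0, 2.0):
    m, v = mean_var(com_poisson_probs(lam=3.0, nu=nu))
    print(f"nu={nu}: mean={m:.3f} var={v:.3f} var/mean={v / m:.3f}")
```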

8.
Count data arise in many applied research areas, and in recent years there has been considerable interest in models for such data. In modelling count data, it is common to face a large frequency of zeros. The data are regarded as zero-inflated when the frequency of observed zeros is larger than expected under a theoretical distribution such as the Poisson, the standard model for count data. Analysing such data with the simple Poisson model may lead to over-dispersion. Several classes of mixture models have been proposed for zero-inflated data, but they do not apply when inflation occurs at some other point in addition to zero. For these cases a doubly-inflated Poisson model has been suggested, but it can only be used for cross-sectional data and cannot account for correlations between observations. Correlated count data, however, are widely encountered, especially in the health and medical fields. The present study introduces a doubly-inflated Poisson model with random effects for correlated doubly-inflated data. The good performance of the proposed method is then shown in several simulation scenarios. Finally, the proposed model is applied to a dental study.
KEYWORDS: Count data, doubly-inflated, Poisson regression, zero-inflated, correlated data
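For orientation, the cross-sectional doubly-inflated Poisson that the paper extends places extra mass at zero and at a second point k on top of a Poisson component. Below is a sketch with hypothetical parameters (k = 4 is arbitrary); the random effects that the paper adds for correlated data are omitted. The mean is p_k·k + (1 - p_0 - p_k)λ.

```python
import math


def dip_pmf(y, lam, p0, pk, k):
    """Doubly-inflated Poisson: extra mass p0 at 0 and pk at k, Poisson(lam) otherwise."""
    base = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    extra = (p0 if y == 0 else 0.0) + (pk if y == k else 0.0)
    return extra + (1 - p0 - pk) * base


lam, p0, pk, k = 2.0, 0.25, 0.10, 4      # hypothetical; k is the second inflation point
probs = [dip_pmf(y, lam, p0, pk, k) for y in range(100)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(f"sum={sum(probs):.10f} mean={mean:.4f} "
      f"(closed form {pk * k + (1 - p0 - pk) * lam:.4f}) var={var:.4f}")
```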

9.
We develop and exemplify application of new classes of dynamic models for time series of nonnegative counts. Our novel univariate models combine dynamic generalized linear models for binary and conditionally Poisson time series, with dynamic random effects for over-dispersion. These models estimate dynamic regression coefficients in both binary and nonzero count components. Sequential Bayesian analysis allows fast, parallel analysis of sets of decoupled time series. New multivariate models then enable information sharing in contexts when data at a more highly aggregated level provide more incisive inferences on shared patterns such as trends and seasonality. A novel multiscale approach—one new example of the concept of decouple/recouple in time series—enables information sharing across series. This incorporates cross-series linkages while insulating parallel estimation of univariate models, and hence enables scalability in the number of series. The major motivating context is supermarket sales forecasting. Detailed examples drawn from a case study in multistep forecasting of sales of a number of related items showcase forecasting of multiple series, with discussion of forecast accuracy metrics, comparisons with existing methods, and broader questions of probabilistic forecast assessment.
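The combination of a binary series and a conditionally Poisson series can be read, at a single time point, as a two-part distribution. The sketch below assumes that y = 0 when the binary indicator is 0 and y = 1 + Poisson(μ) otherwise, with hypothetical one-step values π and μ; the dynamic evolution, random effects and multiscale information sharing are not modelled. It checks the implied forecast mean π(1 + μ) and variance πμ + π(1 - π)(1 + μ)².

```python
import math


def combined_pmf(y, pi, mu):
    """P(y) for: z ~ Bernoulli(pi); y = 0 if z = 0, else y = 1 + Poisson(mu)."""
    if y == 0:
        return 1 - pi
    # Shifted Poisson: P(y) = pi * Poisson(y - 1; mu), and lgamma(y) = log((y - 1)!)
    return pi * math.exp((y - 1) * math.log(mu) - mu - math.lgamma(y))


pi, mu = 0.6, 2.5                        # hypothetical one-step-ahead component values
probs = [combined_pmf(y, pi, mu) for y in range(120)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(f"mean={mean:.4f} (pi*(1+mu)={pi * (1 + mu):.4f}) var={var:.4f}")
```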

10.
Zero-inflated count models are increasingly employed in many fields to handle “zero-inflation”. In modeling road traffic crashes, they have also been shown to be useful for obtaining a better model fit when zero crash counts are over-represented. However, the general specification of the zero-inflated model cannot account for the multilevel structure of crash data, which may be an important source of over-dispersion. This paper examines zero-inflated Poisson regression with site-specific random effects (REZIP), in comparison with the random-effects Poisson model and the standard zero-inflated Poisson model. A practical and flexible procedure, using Bayesian inference with a Markov chain Monte Carlo algorithm and cross-validation predictive density techniques, is applied for model calibration and suitability assessment. Using crash data from Singapore (1998–2005), the illustrative results demonstrate that the REZIP model may significantly improve the model fit and predictive performance of crash prediction models. This improvement can contribute to traffic safety management and engineering practices such as countermeasure design and the safety evaluation of traffic treatments.
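To see why zero inflation alone already adds dispersion, the zero-inflated Poisson can be written down directly. Below is a static sketch with hypothetical λ and π that omits the site-specific random effects of the REZIP model: the closed forms E[Y] = (1 - π)λ and Var(Y) = (1 - π)λ(1 + πλ) are checked by summation.

```python
import math


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    base = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    return (pi if y == 0 else 0.0) + (1 - pi) * base


lam, pi = 1.5, 0.4                       # hypothetical values for one site
probs = [zip_pmf(y, lam, pi) for y in range(100)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
# Closed forms: E[Y] = (1 - pi) * lam and Var(Y) = (1 - pi) * lam * (1 + pi * lam)
print(f"mean={mean:.4f} var={var:.4f}; closed form {(1 - pi) * lam:.4f}, "
      f"{(1 - pi) * lam * (1 + pi * lam):.4f}")
```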

11.
The purpose of this paper is to develop a new regression model for count data, namely the generalized Poisson-Lindley (GPL) linear model. The GPL linear model is constructed by applying the generalized linear model framework to the GPL distribution, and its parameters are estimated by maximum likelihood. We use the GPL linear model to fit two real data sets and compare it with the Poisson, negative binomial (NB) and Poisson-weighted exponential (P-WE) models for count data. The GPL linear model fits over-dispersed count data and attains the highest log-likelihood and the smallest AIC and BIC values. Consequently, the regression model based on the GPL distribution is a valuable alternative to the Poisson, NB and P-WE models.
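The comparison criteria used above are simple functions of the maximized log-likelihood ℓ, the number of parameters k and the sample size n: AIC = 2k - 2ℓ and BIC = k ln n - 2ℓ, with smaller values preferred. The log-likelihoods below are invented purely for illustration and do not come from the paper; they show that the two criteria can rank models differently, because BIC penalizes each extra parameter more heavily than AIC once n > e² ≈ 7.4.

```python
import math


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


# Hypothetical maximised log-likelihoods and parameter counts (n = 200)
fits = {"A": (-412.7, 3), "B": (-398.2, 4), "C": (-396.5, 5)}
n = 200
for name, (ll, k) in fits.items():
    print(f"model {name}: AIC={aic(ll, k):8.2f}  BIC={bic(ll, k, n):8.2f}")

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n))
print(f"best by AIC: {best_aic}, best by BIC: {best_bic}")
```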

12.
We investigate robust M-estimators of location and over-dispersion for independent and identically distributed samples from Poisson and negative binomial (NB) distributions. We focus on asymptotic and small-sample efficiencies, outlier-induced biases, and biases caused by model mis-specification; this is important information for assessing the practical utility of the estimation method. Our results demonstrate that reasonably efficient estimation of location and over-dispersion parameters for count data is possible with sample sizes as small as n=25. The sensitivity of these estimators, especially when the amount of over-dispersion is small. We also conclude that serious biases result when using robust Poisson M-estimation with NB data. The biases are less serious when using robust NB M-estimation with Poisson data.
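A generic Huber-type M-estimator of a Poisson mean illustrates the idea of robust estimation for counts; it is an assumed construction, not necessarily the estimator studied in the paper. Pearson residuals (y - μ)/√μ are passed through Huber's ψ with the common tuning constant c = 1.345, and the expected value of ψ under the Poisson model is subtracted so that the estimating equation is unbiased at the model. The root is found by bisection, and the sample is hypothetical.

```python
import math
import statistics


def huber_psi(r, c):
    return max(-c, min(c, r))


def expected_psi(mu, c):
    """E[psi((Y - mu) / sqrt(mu))] for Y ~ Poisson(mu); subtracting it makes the
    estimating equation unbiased (Fisher-consistent) at the Poisson model."""
    s = math.sqrt(mu)
    upper = int(mu + 20 * s + 20)            # truncate the negligible upper tail
    return sum(math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
               * huber_psi((y - mu) / s, c) for y in range(upper + 1))


def robust_poisson_mean(data, c=1.345, tol=1e-10):
    """Huber-type M-estimate of a Poisson mean from Pearson residuals, solved by bisection."""
    n = len(data)

    def estimating_eq(mu):
        s = math.sqrt(mu)
        return sum(huber_psi((y - mu) / s, c) for y in data) / n - expected_psi(mu, c)

    lo, hi = 1e-3, max(data) + 1.0
    if estimating_eq(lo) <= 0 or estimating_eq(hi) >= 0:
        raise ValueError("no sign change in the bracketing interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if estimating_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


counts = [2, 3, 3, 4, 2, 3, 4, 3, 2, 50]    # hypothetical sample with one gross outlier
estimate = robust_poisson_mean(counts)
print(f"sample mean = {statistics.mean(counts):.2f}, robust estimate = {estimate:.2f}")
```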

13.
The generalized Poisson (GP) regression model has been used to model count data that exhibit over-dispersion or under-dispersion. The zero-inflated GP (ZIGP) regression model can additionally handle count data characterized by many zeros. However, the parameters of ZIGP model cannot easily be used for inference on overall exposure effects. In order to address this problem, a marginalized ZIGP is proposed to directly model the population marginal mean count. The parameters of the marginalized zero-inflated GP model are estimated by the method of maximum likelihood. The regression model is illustrated by three real-life data sets.
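The practical difference between the two parametrizations lies in what the regression coefficient means. Below is a sketch with hypothetical coefficients for a binary exposure. In the marginalized model the overall mean ν = (1 - π)μ is modelled directly, log ν = β0 + β1x, so exp(β1) is the ratio of overall mean counts, and the mean μ of the generalized Poisson component is recovered as ν/(1 - π). In a standard ZIGP, where log μ = β0 + β1x, the same ratio also depends on the zero-inflation part. Only means are needed, since any zero-inflated count distribution has mean (1 - π)μ.

```python
import math


def expit(t):
    return 1 / (1 + math.exp(-t))


def marginalized_means(x, beta, gamma):
    """Marginalized ZI model: log nu = beta0 + beta1*x, logit pi = gamma0 + gamma1*x."""
    nu = math.exp(beta[0] + beta[1] * x)     # overall (marginal) mean count
    pi = expit(gamma[0] + gamma[1] * x)      # probability of a structural zero
    mu = nu / (1 - pi)                       # implied mean of the GP component
    return nu, pi, mu


def standard_zigp_ratio(beta, gamma):
    """Exposed/unexposed ratio of overall means when instead log mu = beta0 + beta1*x."""
    def overall_mean(x):
        return (1 - expit(gamma[0] + gamma[1] * x)) * math.exp(beta[0] + beta[1] * x)
    return overall_mean(1) / overall_mean(0)


beta, gamma = (0.8, 0.4), (-1.0, 0.7)        # hypothetical coefficients
nu0, pi0, mu0 = marginalized_means(0, beta, gamma)
nu1, pi1, mu1 = marginalized_means(1, beta, gamma)
print(f"marginalized ratio = {nu1 / nu0:.4f} (exp(beta1) = {math.exp(beta[1]):.4f})")
print(f"standard ZIGP ratio = {standard_zigp_ratio(beta, gamma):.4f}")
```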

14.
In this article, we propose two novel case-deletion diagnostic measures for identifying observations that are influential on the regression parameters in generalized linear models. The proposed diagnostic methods are capable of detecting influential observations under model misspecification, as long as the true underlying distributions have finite second moments. More specifically, it is demonstrated that the Poisson likelihood function can be properly adjusted to become asymptotically valid for practically all underlying discrete distributions, and the adjusted Poisson regression model that achieves this robustness property is presented. Simulation studies and an illustration demonstrate the efficacy of the two novel diagnostic procedures.
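The adjusted-likelihood construction is not reproduced here, but a closely related standard device shows why Poisson-based inference can remain valid under misspecification when second moments are finite: the sandwich (robust) variance. For an intercept-only Poisson model with log link, the model-based variance of the estimated log-mean is 1/(nμ̂), whereas the sandwich variance is Σ(yᵢ - μ̂)²/(nμ̂)²; their ratio estimates the dispersion. The counts are hypothetical.

```python
def intercept_only_poisson_variances(y):
    """Model-based and sandwich variances of beta_hat = log(mu_hat) for an
    intercept-only Poisson regression with log link."""
    n = len(y)
    mu_hat = sum(y) / n                          # Poisson MLE of the mean
    model_var = 1.0 / (n * mu_hat)               # inverse Fisher information, A = n * mu_hat
    b = sum((yi - mu_hat) ** 2 for yi in y)      # B: sum of squared per-observation scores
    robust_var = b / (n * mu_hat) ** 2           # sandwich A^-1 B A^-1
    return mu_hat, model_var, robust_var


counts = [0, 0, 1, 1, 2, 3, 5, 8]                # hypothetical over-dispersed counts
mu_hat, model_var, robust_var = intercept_only_poisson_variances(counts)
print(f"mu_hat={mu_hat} model var={model_var:.4f} sandwich var={robust_var:.4f} "
      f"ratio={robust_var / model_var:.2f}")
```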

15.
Modelling count data with overdispersion and spatial effects
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson (GP) distribution are addressed where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. On the other hand, extra spatial variability in the data is taken into account by adding correlated spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. in Stat Comput 12(4):353–367, (2002). In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. in J R Stat Soc B64(4):583–640, (2002) and using proper scoring rules, see for example Gneiting and Raftery in Technical Report no. 463, University of Washington, (2004). We observe a rather high degree of overdispersion in the data which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives no or only little improvement, spatial Poisson models with spatially correlated or uncorrelated random effects are to be preferred over all other models according to the considered criteria.
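DIC is computed from posterior draws as DIC = D̄ + p_D, where D̄ is the posterior mean deviance and p_D = D̄ - D(θ̄) is the effective number of parameters. Below is a minimal sketch for an iid Poisson model without spatial effects, assuming a conjugate Gamma(1, 1) prior so that posterior draws can be sampled directly (data and prior are hypothetical); for this one-parameter model p_D should come out close to 1.

```python
import math
import random


def poisson_deviance(y, lam):
    """D(lam) = -2 * log-likelihood of iid Poisson(lam) counts."""
    return -2 * sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)


def dic(y, draws):
    """Return (DIC, pD) with pD = mean deviance - deviance at the posterior mean."""
    d_bar = sum(poisson_deviance(y, lam) for lam in draws) / len(draws)
    d_hat = poisson_deviance(y, sum(draws) / len(draws))
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


y = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]               # hypothetical counts
rng = random.Random(42)
# Gamma(shape 1, rate 1) prior -> Gamma(1 + sum(y), rate 1 + n) posterior;
# random.gammavariate takes (shape, scale), so scale = 1 / rate
shape, rate = 1 + sum(y), 1 + len(y)
draws = [rng.gammavariate(shape, 1 / rate) for _ in range(4000)]
dic_value, p_d = dic(y, draws)
print(f"DIC={dic_value:.2f}, pD={p_d:.3f}")
```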

16.
This paper describes a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman & Turner's (1992) device for maximizing the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood is known explicitly, except for the computation of an integral over the sampling region. Approximation of this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximized using standard statistical software for generalized linear or additive models, provided the conditional intensity of the process takes an 'exponential family' form. Using this approach a wide variety of spatial point process models of Gibbs type can be fitted rapidly, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information.
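The device can be sketched for the simplest case, an inhomogeneous Poisson process, where the conditional intensity equals the intensity and the approximation yields the likelihood itself; the pseudolikelihood extension to Gibbs models is not covered. The sketch assumes the unit square as the window, a 20 × 20 grid of dummy points, counting weights (each grid cell's area divided by the number of quadrature points in it), and a log-linear intensity in the x-coordinate. With indicators z_j and pseudo-responses y_j = z_j/w_j, the approximate log-likelihood is Σ w_j (y_j log λ_j - λ_j), a weighted Poisson log-likelihood, maximized here by Newton's method instead of a GLM package. The point pattern is simulated.

```python
import math
import random


def berman_turner_fit(points, n_grid=20, iters=50):
    """Fit log lambda(x, y) = b0 + b1 * x for a Poisson point process on the unit
    square with the Berman-Turner quadrature device (weighted Poisson fit)."""
    cell = 1.0 / n_grid

    def cell_of(px, py):
        return min(int(px / cell), n_grid - 1), min(int(py / cell), n_grid - 1)

    dummy = [((i + 0.5) * cell, (j + 0.5) * cell) for i in range(n_grid) for j in range(n_grid)]
    quad = [(px, py, 1) for px, py in points] + [(px, py, 0) for px, py in dummy]

    # Counting weights: each cell's area is shared equally by the quadrature points in it
    counts = {}
    for px, py, _ in quad:
        counts[cell_of(px, py)] = counts.get(cell_of(px, py), 0) + 1
    w = [cell * cell / counts[cell_of(px, py)] for px, py, _ in quad]
    resp = [z / wj for (_, _, z), wj in zip(quad, w)]   # pseudo-responses z_j / w_j
    cov = [px for px, _, _ in quad]                    # covariate: x-coordinate

    b0, b1 = math.log(len(points)), 0.0
    for _ in range(iters):
        lam = [math.exp(b0 + b1 * c) for c in cov]
        # Newton step for sum_j w_j * (y_j * log lam_j - lam_j)
        g0 = sum(wj * (yj - lj) for wj, yj, lj in zip(w, resp, lam))
        g1 = sum(wj * (yj - lj) * c for wj, yj, lj, c in zip(w, resp, lam, cov))
        h00 = sum(wj * lj for wj, lj in zip(w, lam))
        h01 = sum(wj * lj * c for wj, lj, c in zip(w, lam, cov))
        h11 = sum(wj * lj * c * c for wj, lj, c in zip(w, lam, cov))
        det = h00 * h11 - h01 * h01
        d0, d1 = (h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    integral = sum(wj * math.exp(b0 + b1 * c) for wj, c in zip(w, cov))
    return b0, b1, integral


# Simulated pattern: uniform points kept with probability x (denser to the right)
rng = random.Random(7)
pts = [(px, py) for px, py in ((rng.random(), rng.random()) for _ in range(200))
       if rng.random() < px]
b0, b1, integral = berman_turner_fit(pts)
print(f"n={len(pts)} b0={b0:.3f} b1={b1:.3f} fitted integral={integral:.3f}")
```

At the optimum, the score equation for the intercept forces Σ w_j λ̂_j, the quadrature approximation of the integrated intensity, to equal the number of observed points.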

17.
Count data often display more zero outcomes than are expected under the Poisson regression model. The zero-inflated Poisson regression model has been suggested for zero-inflated data, whereas the zero-inflated negative binomial (ZINB) regression model has been fitted to zero-inflated data with additional overdispersion. For bivariate zero-inflated data, several regression models, such as the bivariate zero-inflated Poisson (BZIP) and bivariate zero-inflated negative binomial (BZINB), have been considered. This paper introduces several nested forms of the BZINB regression model that can be fitted to bivariate zero-inflated count data. A mean–variance approach is used to compare the BZIP model with our forms of the BZINB regression model. A similar approach was used by earlier researchers to define several negative binomial and zero-inflated negative binomial regression models based on the presence of linear and quadratic terms in the variance function. The nested BZINB regression models proposed in this study have several advantages: likelihood ratio tests can be performed to choose the best model; the models have flexible forms of the marginal mean–variance relationship; they can be fitted to bivariate zero-inflated count data with positive or negative correlation; and they allow additional overdispersion in the two dependent variables.
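The mean–variance approach rests on how the variance function grows with the mean: linearly for the NB1 form, μ + αμ, and quadratically for the NB2 form, μ + αμ². Zero inflation with probability π changes the NB2 variance to (1 - π)μ[1 + μ(π + α)]. A univariate sketch with hypothetical parameters checks this formula by summation; the bivariate nested forms of the paper are not reproduced.

```python
import math


def nb2_logpmf(y, mu, alpha):
    """NB2 log-pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb2_pmf(y, mu, alpha, pi):
    return (pi if y == 0 else 0.0) + (1 - pi) * math.exp(nb2_logpmf(y, mu, alpha))


def var_nb1(mu, alpha):
    return mu + alpha * mu               # linear in mu


def var_nb2(mu, alpha):
    return mu + alpha * mu ** 2          # quadratic in mu


def var_zinb2(mu, alpha, pi):
    return (1 - pi) * mu * (1 + mu * (pi + alpha))


mu, alpha, pi = 2.0, 0.5, 0.2            # hypothetical values
probs = [zinb2_pmf(y, mu, alpha, pi) for y in range(400)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(f"NB1 var={var_nb1(mu, alpha):.2f}  NB2 var={var_nb2(mu, alpha):.2f}")
print(f"ZINB2: mean={mean:.4f} var={var:.4f} closed form={var_zinb2(mu, alpha, pi):.4f}")
```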

18.
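Neural network models and auto insurance claim frequency prediction
Meng Shengwang (孟生旺), Statistical Research (统计研究), 2012, 29(3): 22–26
Auto insurance attracts wide public attention and holds a pivotal position in property insurance companies, so claim frequency prediction models for auto insurance have long been a focus of theoretical and applied research in non-life actuarial science. The most popular claim frequency prediction models at present are generalized linear models, including Poisson regression, negative binomial regression and Poisson-inverse Gaussian regression. Based on a set of real auto insurance loss data, this paper compares various generalized linear models for claim frequency with neural network models and regression tree models, and reaches some new conclusions: the neural network model fits better than the generalized linear models, and among the generalized linear models, Poisson regression fits better than negative binomial regression and Poisson-inverse Gaussian regression. The linear regression model fits worst, and the regression tree model fits slightly better than the linear regression model.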

19.
This paper is concerned with the selection of explanatory variables in generalized linear models (GLMs). The class of GLMs is quite large and contains, e.g., ordinary linear regression, binary logistic regression, the probit model, and Poisson regression with a linear or log-linear parameter structure. We show that, through an approximation of the log-likelihood and a certain data transformation, the variable selection problem in a GLM can be converted into variable selection in an ordinary (unweighted) linear regression model. As a consequence, no specific software for variable selection in GLMs is needed; instead, any suitable variable selection program for linear regression can be used. We also present a simulation study showing that the log-likelihood approximation is very good in many practical situations. Finally, we briefly mention possible extensions to regression models outside the class of GLMs.
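The abstract does not spell out its transformation, but the standard iteratively reweighted least squares (IRLS) algorithm shows the same kind of reduction: each step of a GLM fit is an unweighted least squares regression on transformed data. Below is a sketch for Poisson regression with log link, one covariate and hypothetical data. The working response is z = η + (y - μ)/μ with weights w = μ, and multiplying both z and the design rows by √w turns the weighted problem into an ordinary one. This illustrates the idea and is not the authors' procedure.

```python
import math


def poisson_irls(x, y, iters=50, tol=1e-12):
    """Poisson regression (log link, intercept + slope) by IRLS.

    Each iteration regresses sqrt(w) * z on sqrt(w) * [1, x] by ordinary
    (unweighted) least squares, where z is the working response and w = mu.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        sw = [math.sqrt(m) for m in mu]                           # sqrt working weights
        # Transformed design columns and response: an unweighted least squares problem
        c0, c1 = sw, [s * xi for s, xi in zip(sw, x)]
        t = [s * zi for s, zi in zip(sw, z)]
        s00 = sum(a * a for a in c0)
        s01 = sum(a * b for a, b in zip(c0, c1))
        s11 = sum(b * b for b in c1)
        r0 = sum(a * ti for a, ti in zip(c0, t))
        r1 = sum(b * ti for b, ti in zip(c1, t))
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * r0 - s01 * r1) / det        # solve the 2x2 normal equations
        new_b1 = (s00 * r1 - s01 * r0) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1


x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 4, 6, 9]                  # hypothetical counts
b0, b1 = poisson_irls(x, y)
print(f"b0={b0:.4f} b1={b1:.4f}")
```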

20.
Clustered (longitudinal) count data arise in many biostatistical applications in which repeated count responses are observed on a number of individuals; the repeated observations may also represent counts over time from a number of individuals. One important practical problem is to test homogeneity within clusters (individuals) and between clusters (individuals). Because the data within clusters are repeated responses, the counts may be correlated and/or over-dispersed. For over-dispersed count data with an unknown over-dispersion parameter, we derive two score tests by assuming a random intercept model within the framework of (i) the negative binomial mixed effects model and (ii) the double extended quasi-likelihood mixed effects model (Lee and Nelder, 2001). These two statistics are much simpler than a statistic derived by Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear model. The first statistic incorporates the over-dispersion more directly into the model and is therefore expected to perform well when the model assumptions are satisfied, while the second is expected to be robust. Simulations show that the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions have superior level properties. A data set is analyzed and a discussion is given.
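As a simpler point of comparison (not the clustered-data tests derived in the paper), the classical score statistic for Poisson overdispersion against a negative binomial alternative is T = Σ[(yᵢ - μ̂ᵢ)² - yᵢ] / √(2Σμ̂ᵢ²), approximately N(0, 1) under the Poisson. Below is a sketch for the intercept-only case with hypothetical counts: here μ̂ = 2.5, the numerator is 54 - 20 = 34 and the denominator is √100 = 10, so T = 3.4.

```python
import math


def overdispersion_score_test(y):
    """Score statistic for Poisson overdispersion against a negative binomial
    alternative, intercept-only model: T = sum[(y - mu)^2 - y] / sqrt(2 * sum mu^2).
    Returns (T, one-sided p-value from the N(0, 1) approximation)."""
    n = len(y)
    mu = sum(y) / n                                  # Poisson MLE under H0
    numerator = sum((yi - mu) ** 2 - yi for yi in y)
    t = numerator / math.sqrt(2 * n * mu ** 2)
    return t, 0.5 * math.erfc(t / math.sqrt(2))


t, p = overdispersion_score_test([0, 0, 1, 1, 2, 3, 5, 8])   # hypothetical counts
print(f"T={t:.3f}, one-sided p={p:.5f}")
```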
