Similar literature
Found 20 similar articles; search took 35 ms.
1.
This paper proposes a simple and flexible count data regression model which is able to incorporate overdispersion (the variance is greater than the mean) and which can be considered a competitor to the Poisson model. As is well known, this classical model imposes the restriction that the conditional mean of each count variable must equal the conditional variance. Nevertheless, for the common case of overdispersed counts the Poisson regression may not be appropriate, while the count regression model proposed here is potentially useful. We consider an application to model counts of medical care utilization by the elderly in the USA using a well-known data set from the National Medical Expenditure Survey (1987), where the dependent variable is the number of stays after hospital admission, and where 10 explanatory variables are analysed.
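
The equidispersion restriction described in this abstract can be checked informally with a sample dispersion index (sample variance divided by sample mean). The following is a minimal sketch only; the function name `dispersion_index` and the toy data are assumptions made for illustration and do not come from the paper.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance / sample mean for a list of counts.

    Values near 1 are consistent with Poisson equidispersion;
    values well above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical toy data: many zeros plus a few large counts.
stays = [0, 0, 0, 1, 0, 2, 0, 0, 7, 0]
print(dispersion_index(stays))
```

An index well above 1, as for this toy sample, is the situation in which a plain Poisson regression would understate the variability of the counts.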

2.
Point process models are a natural approach for modelling data that arise as point events. In the case of Poisson counts, these may be fitted easily as a weighted Poisson regression. Point processes lack the notion of sample size. This is problematic for model selection, because various classical criteria such as the Bayesian information criterion (BIC) are a function of the sample size, n, and are derived in an asymptotic framework where n tends to infinity. In this paper, we develop an asymptotic result for Poisson point process models in which the observed number of point events, m, plays the role that sample size does in the classical regression context. Following from this result, we derive a version of BIC for point process models, and when fitted via penalised likelihood, conditions for the LASSO penalty that ensure consistency in estimation and the oracle property. We discuss challenges extending these results to the wider class of Gibbs models, of which the Poisson point process model is a special case.
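
The abstract does not state the formula for the point-process version of BIC. The sketch below assumes the simplest reading, in which the usual penalty log n is replaced by log m, with m the observed number of point events; this substitution is an assumption based on the abstract's wording, and the function name `bic` is hypothetical.

```python
import math


def bic(log_lik, n_params, size):
    """Classical BIC form: -2 * log-likelihood + (number of parameters) * log(size).

    For ordinary regression, `size` is the sample size n. Under the assumption
    made here, a point process model would pass the number of observed
    point events m instead.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return -2.0 * log_lik + n_params * math.log(size)


# Hypothetical numbers: same fit, penalised with m = 250 observed events.
print(bic(log_lik=-512.3, n_params=4, size=250))
```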

3.
Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris, in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects, averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption; the population variance of the treatment effects is confounded with the (variance) function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution for the analysed data sets is poorly supported. We propose to augment the data sets with observations of the counts, made possibly outside the clinical setting, so that the conditional distribution of the counts could be established.

4.
In the regression analysis of time series of event counts, it is of interest to account for serial dependence that is likely to be present among such data as well as a nonlinear interaction between the expected event counts and predictors as a function of some underlying variables. We thus develop a Poisson autoregressive varying-coefficient model, which introduces autocorrelation through a latent process and allows regression coefficients to nonparametrically vary as a function of the underlying variables. The nonparametric functions for varying regression coefficients are estimated with data-driven basis selection, thereby avoiding overfitting and adapting to curvature variation. An efficient posterior sampling scheme is devised to analyse the proposed model. The proposed methodology is illustrated using simulated data and daily homicide data in Cali, Colombia.

5.
Count data often display more zero outcomes than are expected under the Poisson regression model. The zero-inflated Poisson regression model has been suggested to handle zero-inflated data, whereas the zero-inflated negative binomial (ZINB) regression model has been fitted for zero-inflated data with additional overdispersion. For bivariate and zero-inflated cases, several regression models such as the bivariate zero-inflated Poisson (BZIP) and bivariate zero-inflated negative binomial (BZINB) have been considered. This paper introduces several forms of nested BZINB regression model which can be fitted to bivariate and zero-inflated count data. The mean–variance approach is used for comparing the BZIP and our forms of BZINB regression model in this study. A similar approach was also used by past researchers for defining several negative binomial and zero-inflated negative binomial regression models based on the appearance of linear and quadratic terms of the variance function. The nested BZINB regression models proposed in this study have several advantages: the likelihood ratio tests can be performed for choosing the best model, the models have flexible forms of marginal mean–variance relationship, the models can be fitted to bivariate zero-inflated count data with positive or negative correlations, and the models allow additional overdispersion of the two dependent variables.

6.
Count data have emerged in many applied research areas. In recent years, there has been considerable interest in models for count data. In modelling such data, it is common to face a large frequency of zeroes. The data are regarded as zero-inflated when the frequency of observed zeroes is larger than what is expected from a theoretical distribution such as the Poisson distribution, the standard model for analysing count data. Data analysis using the simple Poisson model may lead to over-dispersion. Several classes of mixture models have been proposed for handling zero-inflated data, but they do not apply to cases in which inflated counts occur at some other point in addition to zero. For these cases, a doubly-inflated Poisson model has been suggested, which can only be used for cross-sectional data and cannot account for correlations between observations. However, correlated count data have wide application, especially in the health and medical fields. The present study aims to introduce a doubly-inflated Poisson model with random effects for correlated doubly-inflated data. The performance of the proposed method is then demonstrated via different simulation scenarios. Finally, the proposed model is applied to a dental study.
KEYWORDS: Count data, doubly-inflated, Poisson regression, zero-inflated, correlated data

7.
In recent years, there has been considerable interest in regression models based on zero-inflated distributions. These models are commonly encountered in many disciplines, such as medicine, public health, and environmental sciences, among others. The zero-inflated Poisson (ZIP) model has been typically considered for these types of problems. However, the ZIP model can fail if the non-zero counts are overdispersed in relation to the Poisson distribution, hence the zero-inflated negative binomial (ZINB) model may be more appropriate. In this paper, we present a Bayesian approach for fitting the ZINB regression model. This model considers that an observed zero may come from a point mass distribution at zero or from the negative binomial model. The likelihood function is utilized not only to compute some Bayesian model selection measures, but also to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. The approach can be easily implemented using standard Bayesian software, such as WinBUGS. The performance of the proposed method is evaluated with a simulation study. Further, a real data set is analyzed, where we show that ZINB regression models seem to fit the data better than the Poisson counterpart.
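
The two sources of zeros described in this abstract (a point mass at zero and the negative binomial component) can be written as a mixture probability mass function. The sketch below is illustrative only: the mean/size parameterisation of the negative binomial and the names `nb_pmf`, `zinb_pmf`, `pi`, `mu` and `r` are assumptions, not the paper's notation, and no Bayesian fitting is attempted.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu > 0 and size (dispersion) r > 0."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, pi, mu, r):
    """ZINB pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial component."""
    nb_part = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + nb_part if k == 0 else nb_part


# Hypothetical parameter values: the probability of a zero is raised above
# what the negative binomial alone would give.
print(nb_pmf(0, mu=2.0, r=1.5), zinb_pmf(0, pi=0.3, mu=2.0, r=1.5))
```

In a regression setting both `mu` and `pi` would typically depend on covariates through link functions; that step is omitted here.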

8.
Using a direct resampling process, a Bayesian approach is developed for the analysis of the shift-point problem. In many problems it is straightforward to isolate the marginal posterior distribution of the shift-point parameter and the conditional distribution of some of the parameters given the shift point and the other remaining parameters. When this is possible, a direct sampling approach is easily implemented whereby standard random number generators can be used to generate samples from the joint posterior distribution of all the parameters in the model. This technique is illustrated with examples involving one shift for Poisson processes and regression models.

9.
Zero-inflated count models are increasingly employed in many fields in cases of “zero-inflation”. In modeling road traffic crashes, they have also been shown to be useful in obtaining a better model fit when zero crash counts are over-represented. However, the general specification of the zero-inflated model cannot account for the multilevel data structure in crash data, which may be an important source of over-dispersion. This paper examines zero-inflated Poisson regression with site-specific random effects (REZIP) in comparison with the random effect Poisson model and the standard zero-inflated Poisson model. A practical and flexible procedure, using Bayesian inference with a Markov Chain Monte Carlo algorithm and cross-validation predictive density techniques, is applied for model calibration and suitability assessment. Using crash data in Singapore (1998–2005), the illustrative results demonstrate that the REZIP model may significantly improve the model fit and predictive performance of crash prediction models. This improvement can contribute to traffic safety management and engineering practices such as countermeasure design and safety evaluation of traffic treatments.

10.
This paper proposes and investigates a class of Markov Poisson regression models in which Poisson rate functions of covariates are conditional on unobserved states which follow a finite-state Markov chain. Features of the proposed model, estimation, inference, bootstrap confidence intervals, model selection and other implementation issues are discussed. Monte Carlo studies suggest that the proposed estimation method is accurate and reliable for single- and multiple-subject time series data; the choice of starting probabilities for the Markov process has little effect on the parameter estimates; and penalized likelihood criteria are reliable for determining the number of states. Part 2 provides applications of the proposed model.

11.
Zero inflated Poisson regression is a model commonly used to analyze data with excessive zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on the special features of the individual data. For example, there is a need for new models when dealing with truncated and inflated data. In this paper, we propose a new model that is sufficiently flexible to model inflation and truncation simultaneously, and which is a mixture of a multinomial logistic and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts. The truncated Poisson regression models the counts that are assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and our model is found to have the smallest mean absolute error and best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model has a better fit than the other competing models.

12.
This paper presents results from a simulation study motivated by a recent study of the relationships between ambient levels of air pollution and human health in the community of Prince George, British Columbia. The simulation study was designed to evaluate the performance of methods based on overdispersed Poisson regression models for the analysis of series of count data. Aspects addressed include estimation of the dispersion parameter, estimation of regression coefficients and their standard errors, and the performance of model selection tests. The effects of varying amounts of overdispersion and differing underlying variance structure on this performance were of particular interest. This study is related to work reported by Breslow (1990) although the context is quite different. Preliminary work led to the conclusion that estimation of the dispersion parameter should be based on Pearson's chi-square statistic rather than the Poisson deviance. Regression coefficients are well estimated, even in the presence of substantial overdispersion and when the model for the variance function is incorrectly specified. Despite its potentially greater variability, the empirical estimator of the covariance matrix is preferred because the model-based estimator is unreliable in general. When the model for the variance function is incorrect, model-based test statistics may perform poorly, in sharp contrast to empirical test statistics, which performed very well in this study.
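
The Pearson-based dispersion estimate mentioned in this abstract is commonly computed as the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below assumes the Poisson variance function V(mu) = mu and takes the fitted means as given; the function name `pearson_dispersion` and the toy numbers are hypothetical and are not taken from the study.

```python
def pearson_dispersion(y, mu, n_params):
    """Estimate the dispersion parameter as X^2 / (n - p).

    y        : observed counts
    mu       : fitted means from a Poisson regression (all > 0)
    n_params : number of estimated regression coefficients p
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    dof = len(y) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    # Pearson chi-square with the Poisson variance function V(mu) = mu.
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / dof


# Hypothetical observed counts and fitted means.
print(pearson_dispersion([0, 4, 2, 6], [2.0, 2.0, 4.0, 4.0], n_params=2))
```

An estimate noticeably above 1 indicates extra-Poisson variation; model-based standard errors are then often inflated by the square root of this estimate.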

13.
The problem of discriminating between the Poisson and binomial models is discussed in the context of a detailed statistical analysis of the number of appointments of the U.S. Supreme Court justices from 1789 to 2004. Various new and existing tests are examined. The analysis shows that both simple Poisson and simple binomial models are equally appropriate for describing the data. No firm statistical evidence in favour of an exponential Poisson regression model was found. Two attendant results were obtained by simulation: firstly, that the likelihood ratio test is the most powerful of those considered when testing for the Poisson versus binomial and, secondly, that the classical variance test with an upper-tail critical region is biased.

14.
Focusing on the model selection problems in the family of Poisson mixture models (including the Poisson mixture regression model with random effects and zero‐inflated Poisson regression model with random effects), the current paper derives two conditional Akaike information criteria. The criteria are the unbiased estimators of the conditional Akaike information based on the conditional log‐likelihood and the conditional Akaike information based on the joint log‐likelihood, respectively. The derivation is free from the specific parametric assumptions about the conditional mean of the true data‐generating model and applies to different types of estimation methods. Additionally, the derivation is not based on the asymptotic argument. Simulations show that the proposed criteria have promising estimation accuracy. In addition, it is found that the criterion based on the conditional log‐likelihood demonstrates good model selection performance under different scenarios. Two sets of real data are used to illustrate the proposed method.

15.
In modeling count data collected from manufacturing processes, economic series, disease outbreaks and ecological surveys, there is usually a relatively large or small number of zeros compared to positive counts. Such low or high frequencies of zero counts often require the use of underdispersed or overdispersed probability models for the underlying data generating mechanism. The commonly used models such as generalized or zero-inflated Poisson distributions are parametric and can usually account for only the overdispersion, but such distributions are often found to be inadequate in modeling underdispersion because of the need for awkward parameter or support restrictions. This article introduces a flexible class of semiparametric zero-altered models which account for both underdispersion and overdispersion and includes other familiar models such as those mentioned above as special cases. Consistency and asymptotic normality of the estimator of the dispersion parameter are derived under general conditions. Numerical support for the performance of the proposed method of inference is presented for the case of common discrete distributions.

16.
This paper provides a practical simulation-based Bayesian analysis of parameter-driven models for time series Poisson data with the AR(1) latent process. The posterior distribution is simulated by a Gibbs sampling algorithm. Full conditional posterior distributions of unknown variables in the model are given in convenient forms for the Gibbs sampling algorithm. The case with missing observations is also discussed. The methods are applied to real polio data from 1970 to 1983.
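
To make the model class in this abstract concrete, the sketch below simulates from one common form of a parameter-driven Poisson model with an AR(1) latent process. It illustrates the data-generating mechanism only and is not the paper's Gibbs sampler. The log link, the intercept-only mean, and the names `simulate_counts`, `poisson_draw`, `beta`, `phi` and `sigma` are all assumptions.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small rates used here; not intended for large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, beta, phi, sigma, seed=0):
    """Simulate y_t ~ Poisson(exp(beta + a_t)) with latent
    a_t = phi * a_{t-1} + e_t and e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    a = 0.0  # latent process started at zero (an arbitrary choice)
    counts = []
    for _ in range(n):
        a = phi * a + rng.gauss(0.0, sigma)
        counts.append(poisson_draw(math.exp(beta + a), rng))
    return counts


# Hypothetical parameter values for a short monthly series.
print(simulate_counts(n=12, beta=0.5, phi=0.6, sigma=0.3, seed=1))
```

The serial dependence enters only through the latent process: conditional on it, the counts are independent Poisson draws.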

17.
Matrix-analytic Models and their Analysis   Total citations: 2, self-citations: 0, citations by others: 2
We survey phase-type distributions and Markovian point processes, aspects of how to use such models in applied probability calculations and how to fit them to observed data. A phase-type distribution is defined as the time to absorption in a finite continuous time Markov process with one absorbing state. This class of distributions is dense and contains many standard examples like all combinations of exponentials in series/parallel. A Markovian point process is governed by a finite continuous time Markov process (typically ergodic), such that points are generated at a Poisson intensity depending on the underlying state and at transitions; a main special case is a Markov-modulated Poisson process. In both cases, the analytic formulas typically contain matrix-exponentials, and the matrix formalism carries over when the models are used in applied probability calculations as in problems in renewal theory, random walks and queueing. The statistical analysis is typically based upon the EM algorithm, viewing the whole sample path of the background Markov process as the latent variable.
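
The matrix-exponential formulas referred to in this abstract take the following standard form for a phase-type distribution. The notation is an assumption and is not taken from the survey: the initial distribution is assumed to place all of its mass on the transient states.

```latex
% Assumed notation: \boldsymbol{\alpha} is the initial distribution over the
% p transient states (row vector), T is the p x p sub-generator among them,
% \mathbf{1} is a column vector of ones, and \mathbf{t} = -T\mathbf{1}
% holds the exit rates into the absorbing state.
\begin{align}
F(x) &= 1 - \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{1}, \qquad x \ge 0, \\
f(x) &= \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{t}, \qquad \mathbf{t} = -T\mathbf{1}.
\end{align}
```

With a single transient state and T = (-\lambda), these reduce to the exponential distribution with rate \lambda, the simplest member of the class.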

18.
The mixed Poisson–inverse-Gaussian distribution has been used by Holla, Sankaran, Sichel, and others in univariate problems involving counts. We propose a Poisson–inverse-Gaussian regression model which can be used for regression analysis of counts. The model provides an attractive framework for incorporating random effects in Poisson regression models and in handling extra-Poisson variation. Maximum-likelihood and quasilikelihood-moment estimation is investigated and illustrated with an example involving motor-insurance claims.

19.
This article is about the statistical analysis of overdispersed paired count data for comparing two treatments. The data consist of the number of events obtained in a stratum during the fixed observation period. Three types of model are discussed: the Poisson, a mixed, and a semiparametric model. Overdispersion is represented in the last two models but not in the Poisson model. Of particular interest is whether there is any loss of efficiency in using the estimate of the treatment effect obtained under the other two models if the mixed model is true, and whether overdispersion leads to a larger variance of the estimate than that expected from the Poisson model. It is shown that all three models provide the same estimate of the treatment effect (i.e., there is no loss of efficiency) and that the variance of the estimate of the treatment effect obtained under the Poisson model is the same as that based on the mixed model. However, the semiparametric model yields a larger variance of the estimate than those obtained under the other two models.

20.
Modelling count data with overdispersion and spatial effects   Total citations: 1, self-citations: 1, citations by others: 0
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson (GP) distribution are addressed where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. On the other hand, extra spatial variability in the data is taken into account by adding correlated spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. in Stat Comput 12(4):353–367, (2002). In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. in J R Stat Soc B64(4):583–640, (2002) and using proper scoring rules, see for example Gneiting and Raftery in Technical Report no. 463, University of Washington, (2004). We observe a rather high degree of overdispersion in the data which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives no or only little improvement, spatial Poisson models with spatially correlated or uncorrelated random effects are to be preferred over all other models according to the considered criteria.
