Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Bivariate count data arise in several different disciplines (epidemiology, marketing and sports statistics, to name a few), and the bivariate Poisson distribution, being a generalization of the Poisson distribution, plays an important role in modelling such data. In the present paper we present a Bayesian estimation approach for the parameters of the bivariate Poisson model and provide the posterior distributions in closed form. It is shown that the joint posterior distributions are finite mixtures of conditionally independent gamma distributions whose full form can be easily deduced by a recursive updating scheme. Thus, the need to apply computationally demanding MCMC schemes for Bayesian inference in such models is removed, since direct sampling from the posterior becomes available, even in cases where the posterior distribution of functions of the parameters is not available in closed form. In addition, we define a class of prior distributions that possess an interesting conjugacy property which extends the typical notion of conjugacy, in the sense that both prior and posteriors belong to the same family of finite mixture models but with different numbers of components. Extensions to certain other models, including multivariate models and models with other marginal distributions, are discussed.
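
For orientation, the bivariate Poisson model referred to above is commonly built by trivariate reduction: X = Y1 + Y0 and Y = Y2 + Y0 with independent Poisson components, so that Cov(X, Y) equals the mean of the shared component. The sketch below only evaluates that joint probability mass function; it assumes this standard construction (the abstract does not spell it out), it does not implement the paper's Bayesian updating scheme, and the names bivariate_poisson_pmf, lam1, lam2 and lam0 are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Univariate Poisson probability P(K = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bivariate_poisson_pmf(x, y, lam1, lam2, lam0):
    """P(X = x, Y = y) for X = Y1 + Y0 and Y = Y2 + Y0, where Y1, Y2, Y0
    are independent Poisson variables with means lam1, lam2, lam0.
    This trivariate-reduction form is an assumption of the sketch."""
    total = 0.0
    # Sum over every value k that the shared component Y0 can take.
    for k in range(min(x, y) + 1):
        total += (
            poisson_pmf(x - k, lam1)
            * poisson_pmf(y - k, lam2)
            * poisson_pmf(k, lam0)
        )
    return total
```

Setting lam0 to zero makes the two counts independent, which gives a quick sanity check on the construction.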

2.
Count data are very often analyzed under the assumption of a Poisson model (Agresti, A., 1996. An Introduction to Categorical Data Analysis. Wiley, New York; Generalized Linear Models, second ed. Chapman & Hall, New York). However, the derived inference is generally erroneous if the underlying distribution is not Poisson (Biometrika 70, 269–274). A parametric robust regression approach is proposed for the analysis of count data. More specifically, it will be demonstrated that the Poisson regression model can be properly adjusted to become asymptotically valid for inference about regression parameters, even if the Poisson assumption fails. With large samples the novel robust methodology provides legitimate likelihood functions for regression parameters, so long as the true underlying distributions have finite second moments. Adjustments that robustify the Poisson regression will be given under the log link and the identity link functions, respectively. Simulation studies will be used to demonstrate the efficacy of the robust Poisson regression model.

3.
Count data have emerged in many applied research areas. In recent years, there has been considerable interest in models for count data. In modelling such data, it is common to face a large frequency of zeroes. The data are regarded as zero-inflated when the frequency of observed zeroes is larger than what is expected from a theoretical distribution such as the Poisson distribution, the standard model for analysing count data. Data analysis using the simple Poisson model may lead to over-dispersion. Several classes of mixture models have been proposed for handling zero-inflated data, but they do not apply to cases where inflated counts occur at some other point in addition to zero. In these cases, a doubly-inflated Poisson model has been suggested, which can only be used for cross-sectional data and cannot account for correlations between observations. However, correlated count data have many applications, especially in the health and medical fields. The present study aims to introduce a doubly-inflated Poisson model with random effects for correlated doubly-inflated data. The performance of the proposed method is then demonstrated in several simulation scenarios. Finally, the proposed model is applied to a dental study.
KEYWORDS: Count data, doubly-inflated, Poisson regression, zero-inflated, correlated data

4.
In many financial applications, Poisson mixture regression models are commonly used to analyze heterogeneous count data. When fitting these models, the observed counts are supposed to come from two or more subpopulations and parameter estimation is typically performed by means of maximum likelihood via the Expectation–Maximization algorithm. In this study, we discuss briefly the procedure for fitting Poisson mixture regression models by means of maximum likelihood, the model selection and goodness-of-fit tests. These models are applied to a real data set for credit-scoring purposes. We aim to reveal the impact of demographic and financial variables in creating different groups of clients and to predict the group to which each client belongs, as well as his expected number of defaulted payments. The model's conclusions are very interesting, revealing that the population consists of three groups, contrasting with the traditional good versus bad categorization approach of the credit-scoring systems.
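
The Expectation–Maximization fitting mentioned above can be sketched for the simplest case, a finite Poisson mixture without covariates. This is a deliberate simplification and not the regression model of the paper: each subpopulation has a single mean rather than a mean driven by client variables, and the function name em_poisson_mixture, the starting values and the fixed iteration count are all illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Univariate Poisson probability P(K = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def em_poisson_mixture(counts, lams, weights, n_iter=200):
    """EM for a finite Poisson mixture with no covariates (a simplification).

    counts  : observed counts
    lams    : starting component means
    weights : starting mixing proportions
    Returns (weights, lams, trace), where trace holds the log-likelihood
    evaluated at the start of each iteration.
    """
    lams = list(lams)
    weights = list(weights)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each count belongs to each group.
        resp = []
        loglik = 0.0
        for y in counts:
            joint = [w * poisson_pmf(y, lam) for w, lam in zip(weights, lams)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        trace.append(loglik)
        # M-step: weighted proportions and weighted means.
        for g in range(len(lams)):
            mass = sum(r[g] for r in resp)
            weights[g] = mass / len(counts)
            lams[g] = sum(r[g] * y for r, y in zip(resp, counts)) / mass
    return weights, lams, trace
```

Assigning each observation to the group with the largest posterior probability from the E-step gives the kind of group prediction the abstract describes; in the regression setting the M-step would instead fit a weighted Poisson regression per group.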

5.
Frailty models are often used to model heterogeneity in survival analysis. The distribution of the frailty is generally assumed to be continuous. In some circumstances, it is appropriate to consider discrete frailty distributions. Having zero frailty can be interpreted as being immune, and population heterogeneity may be analysed using discrete frailty models. In this paper, survival functions are derived for the frailty models based on the discrete compound Poisson process. Maximum likelihood estimation procedures for the parameters are studied. We examine the fit of the models to earthquake and traffic accident data sets from Turkey.

6.
In this paper we present Bayesian analysis of finite mixtures of multivariate Poisson distributions with an unknown number of components. The multivariate Poisson distribution can be regarded as the discrete counterpart of the multivariate normal distribution, which is suitable for modelling multivariate count data. Mixtures of multivariate Poisson distributions allow for overdispersion and for negative correlations between variables. To perform Bayesian analysis of these models we adopt a reversible jump Markov chain Monte Carlo (MCMC) algorithm with birth and death moves for updating the number of components. We present results obtained from applying our modelling approach to simulated and real data. Furthermore, we apply our approach to a problem in multivariate disease mapping, namely joint modelling of diseases with correlated counts.

7.
The generalized Poisson (GP) regression is an increasingly popular approach for modeling overdispersed as well as underdispersed count data. Several parameterizations have been proposed for the GP regression, and the two well-known models, the GP-1 and the GP-2, have been applied. The recently proposed GP-P regression has the advantage of nesting the GP-1 and the GP-2 parametrically, besides allowing statistical tests of the GP-1 and the GP-2 against a more general alternative. In several cases, count data have more zero outcomes than are expected under the Poisson. This zero-inflation phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed to handle it. However, if the data continue to suggest additional overdispersion, the zero-inflated negative binomial (ZINB-1 and ZINB-2) and the zero-inflated generalized Poisson (ZIGP-1 and ZIGP-2) regression models have been considered as alternatives. This article proposes a functional form of the ZIGP which mixes a distribution degenerate at zero with a GP-P distribution. The suggested model has the advantage of nesting the ZIP and the two well-known ZIGP (ZIGP-1 and ZIGP-2) regression models, besides allowing statistical tests of the ZIGP-1 and the ZIGP-2 against a more general alternative. The ZIP and the functional form of the ZIGP regression models are fitted, compared and tested on two sets of count data: the Malaysian insurance claim data and the German healthcare data.

8.
A robust estimator is developed for Poisson mixture models with a known number of components. The proposed estimator minimizes the L2 distance between a sample of data and the model. When the component distributions are completely known, the estimators for the mixing proportions are in closed form. When the parameters for the component Poisson distributions are unknown, numerical methods are needed to calculate the estimators. Compared to the minimum Hellinger distance estimator, the minimum L2 estimator can be less robust to extreme outliers, and often more robust to moderate outliers.
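
To illustrate why known components lead to a closed form, consider two Poisson components with known means and a single unknown mixing proportion p. Minimizing the sum over k of (f_n(k) - p f1(k) - (1 - p) f2(k))^2 is a one-parameter least-squares problem, whose solution is p = sum (f_n - f2)(f1 - f2) / sum (f1 - f2)^2. The sketch below is a two-component instance worked out from that least-squares argument, not necessarily the paper's exact expression; the truncation point kmax, the clipping of p to [0, 1] and all function names are assumptions.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Univariate Poisson probability P(K = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def empirical_pmf(sample):
    """Relative frequency of each observed count."""
    n = len(sample)
    return {k: c / n for k, c in Counter(sample).items()}


def l2_mixing_proportion(emp, lam1, lam2, kmax):
    """Minimum-L2 estimate of p in p*Poisson(lam1) + (1 - p)*Poisson(lam2).

    emp  : dict mapping count -> empirical relative frequency
    kmax : the L2 sum is truncated at kmax (an assumption of this sketch;
           choose it so that the Poisson tails beyond it are negligible)
    """
    num = 0.0
    den = 0.0
    for k in range(kmax + 1):
        f1 = poisson_pmf(k, lam1)
        f2 = poisson_pmf(k, lam2)
        diff = f1 - f2
        num += (emp.get(k, 0.0) - f2) * diff
        den += diff * diff
    # Clip to the unit interval so the result is a valid proportion.
    return min(1.0, max(0.0, num / den))
```

A typical call would be l2_mixing_proportion(empirical_pmf(sample), lam1, lam2, kmax) with kmax well above the largest observed count.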

9.
In the present paper we examine finite mixtures of multivariate Poisson distributions as an alternative class of models for multivariate count data. The proposed models allow for both overdispersion in the marginal distributions and negative correlation, while they are computationally tractable using standard ideas from finite mixture modelling. An EM type algorithm for maximum likelihood (ML) estimation of the parameters is developed. The identifiability of this class of mixtures is proved. Properties of ML estimators are derived. A real data application concerning model based clustering for multivariate count data related to different types of crime is presented to illustrate the practical potential of the proposed class of models.

10.
A class of bivariate continuous-discrete distributions is proposed to fit Poisson dynamic models in a single unified framework via bivariate mixture transition distributions (BMTDs). Potential advantages of this class over the current models include its ability to capture stretches, bursts and nonlinear patterns characteristic of Internet network traffic, high-frequency financial data and many others. It models the inter-arrival times and the number of arrivals (marks) in a single unified model which benefits from the dependence structure of the data. The continuous marginal distributions of this class include as special cases the exponential, gamma, Weibull and Rayleigh distributions (for the inter-arrival times), whereas the discrete marginal distributions are geometric and negative binomial. The conditional distributions are Poisson and Erlang. Maximum-likelihood estimation is discussed and parameter estimates are obtained using an expectation–maximization algorithm, while the standard errors are estimated using the missing information principle. It is shown via real data examples that the proposed BMTD models appear to capture data features better than other competing models.

11.
A multivariate generalized Poisson regression model based on the multivariate generalized Poisson distribution is defined and studied. The regression model can be used to describe count data with any type of dispersion. The model allows for both positive and negative correlation between any pair of the response variables. The parameters of the regression model are estimated by the maximum likelihood method. Some test statistics are discussed, and two numerical data sets are used to illustrate the applications of the multivariate count data regression model.

12.
We review Bayesian analysis of hierarchical non-standard Poisson regression models with an emphasis on microlevel heterogeneity and macrolevel autocorrelation. For the former case, we confirm that negative binomial regression usually accounts for microlevel heterogeneity (overdispersion) satisfactorily; for the latter case, we apply the simple first-order Markov transition model to conveniently capture the macrolevel autocorrelation which often arises from temporal and/or spatial count data, rather than attaching complex random effects directly to the regression parameters. Specifically, we extend the hierarchical (multilevel) Poisson model into negative binomial models with macrolevel autocorrelation using a restricted gamma mixture with unit mean and a Markov transition covariate created from preceding residuals. We prove a mild sufficient condition for posterior propriety under a flat prior for the fixed effects of interest. Our methodology is implemented by analyzing the Baltic Sea peracarids diurnal activity data published in the marine biology and ecology literature.

13.
Dependent multivariate count data occur in several research studies. These data can be modelled by a multivariate Poisson or negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells is much larger than in other cells, the copula-based multivariate Poisson (or negative binomial) distribution may not fit well and is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells and develop a doubly inflated multivariate Poisson distribution function using a multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. To illustrate the proposed methodologies, we present real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation of the unknown parameters of the models.

14.
Zero-inflated Poisson regression is a model commonly used to analyze data with excessive zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on the special features of the individual data. For example, there is a need for new models when dealing with truncated and inflated data. In this paper, we propose a new model that is sufficiently flexible to model inflation and truncation simultaneously, and which is a mixture of a multinomial logistic and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts. The truncated Poisson regression models the counts that are assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and our model is found to have the smallest mean absolute error and best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model has a better fit than the other competing models.

15.
Zero-inflated count data are frequently encountered in public health and epidemiology research. A two-part model is often used to model the excess zeros; it is a mixture of two components: a point mass at zero and a count distribution, such as a Poisson distribution. When the rate of events per unit exposure is of interest, an offset is commonly used to account for the varying extent of exposure; it is essentially a predictor whose regression coefficient is fixed at one. Such an assumption about the exposure effect is, however, quite restrictive for many practical problems. Further, for zero-inflated models, the offset is often only included in the count component of the model. However, the probability of an excess zero could also be affected by the amount of 'exposure'. We therefore propose incorporating the varying exposure as a covariate rather than an offset term in both the probability of excess zeros and the conditional counts components of the zero-inflated model. A real example is used to illustrate the usage of the proposed methods, and simulation studies are conducted to assess the performance of the proposed methods for a broad variety of situations.
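
The distinction between an offset and an exposure covariate can be made concrete with a small sketch. It assumes a zero-inflated Poisson model with a log link for the count mean and a logit link for the excess-zero probability, and it assumes the exposure enters on the log scale; the abstract fixes none of these details, and the function and parameter names are hypothetical.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson: a point mass at zero with probability pi_zero,
    mixed with a Poisson(lam) count distribution."""
    base = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * base
    return (1.0 - pi_zero) * base


def count_mean(exposure, linear_predictor, exposure_coef=1.0):
    """Poisson mean under a log link (assumed).
    exposure_coef = 1.0 reproduces the usual offset, log(exposure) with its
    coefficient fixed at one; any other value treats log(exposure) as an
    ordinary covariate with a freely estimated coefficient."""
    return math.exp(linear_predictor + exposure_coef * math.log(exposure))


def zero_prob(exposure, linear_predictor, exposure_coef=0.0):
    """Excess-zero probability under a logit link (assumed).
    exposure_coef = 0.0 ignores exposure in this component; a nonzero value
    lets exposure affect the chance of an excess zero as well."""
    eta = linear_predictor + exposure_coef * math.log(exposure)
    return 1.0 / (1.0 + math.exp(-eta))
```

With the offset, doubling the exposure always doubles the expected count; with a free coefficient of 0.5, for example, it multiplies the expected count by the square root of two instead.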

16.
The zero-inflated Poisson regression model is a special case of finite mixture models that is useful for count data containing many zeros. Typically, maximum likelihood (ML) estimation is used for fitting such models. However, it is well known that the ML estimator is highly sensitive to the presence of outliers and can become unstable when mixture components are poorly separated. In this paper, we propose an alternative robust estimation approach, robust expectation-solution (RES) estimation. We compare the RES approach with an existing robust approach, minimum Hellinger distance (MHD) estimation. Simulation results indicate that both methods improve on ML when outliers are present and/or when the mixture components are poorly separated. However, the RES approach is more efficient in all the scenarios we considered. In addition, the RES method is shown to yield consistent and asymptotically normal estimators and, in contrast to MHD, can be applied quite generally.

17.
Inflated data and over-dispersion are two common problems when modeling count data with traditional Poisson regression models. In this study, we propose a latent class inflated Poisson (LCIP) regression model to address the unobserved heterogeneity that leads to inflation and over-dispersion. The performance of the model estimation is evaluated through simulation studies. We illustrate the usefulness of introducing a latent class variable by analyzing the Behavioral Risk Factor Surveillance System (BRFSS) data, which contain several excessive values and are characterized by over-dispersion. The proposed model displays a better fit than the standard Poisson regression and zero-inflated Poisson regression models for the inflated counts.
KEYWORDS: Inflated data, latent class, heterogeneity, Poisson regression, over-dispersion

18.
In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR)/non ignorable missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm's efficiency, a stochastic step based on data augmentation is incorporated into the E-step, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The considered model is a general framework that captures important characteristics of count data such as zero inflation/deflation, heterogeneity and missingness, providing more insight into the features of the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.

19.
Medical and public health research often involves the analysis of repeated or longitudinal count data that exhibit excess zeros, such as the number of yearly doctor visits by a group of individuals over a number of years. Zero-inflated Poisson (ZIP) regression models can be used to account for excess zeros in count data. We propose an extension of the ZIP model that is appropriate for longitudinal data. Our extension includes a correlation structure based on a non-stationary, observation-driven time series model. We discuss estimation of the model parameters and the inefficiency of the estimators when the correlation structure is mis-specified. The model's application to the analysis of health care utilization data is also discussed.

20.
Frailty models can be fit as mixed-effects Poisson models after transforming time-to-event data to the Poisson model framework. We assess, through simulations, the robustness of Poisson likelihood estimation for Cox proportional hazards models with log-normal frailties under misspecified frailty distribution. The log-gamma and Laplace distributions were used as true distributions for frailties on a natural log scale. Factors such as the magnitude of heterogeneity, censoring rate, number and sizes of groups were explored. In the simulations, the Poisson modeling approach that assumes log-normally distributed frailties provided accurate estimates of within- and between-group fixed effects even under a misspecified frailty distribution. Non-robust estimation of variance components was observed in the situations of substantial heterogeneity, large event rates, or high data dimensions.
