Similar literature
20 similar documents found
1.
Inflated data are prevalent in many situations, and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we studied the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data and the performance of the IC was compared. The effects of sample size and of the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample size.
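As an illustration of the three criteria compared in this abstract, the following is a minimal sketch using their standard textbook definitions. The function name and arguments are hypothetical and are not taken from the paper; `log_lik` is the maximized log-likelihood, `k` the number of estimated parameters and `n` the sample size.

```python
import math


def information_criteria(log_lik, k, n):
    """Return AIC, BIC and CAIC under their usual definitions (assumed here)."""
    aic = 2 * k - 2 * log_lik                   # penalty of 2 per parameter
    bic = k * math.log(n) - 2 * log_lik         # penalty of ln(n) per parameter
    caic = k * (math.log(n) + 1) - 2 * log_lik  # penalty of ln(n) + 1 per parameter
    return {"AIC": aic, "BIC": bic, "CAIC": caic}


# Hypothetical fit: log-likelihood -100 with 3 parameters on 50 observations.
criteria = information_criteria(log_lik=-100.0, k=3, n=50)
```

Because the BIC and CAIC penalties grow with the sample size while the AIC penalty does not, the BIC and CAIC favour smaller models, which is consistent with the under-parameterization reported above.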

2.
Dependent multivariate count data occur in several research studies. These data can be modelled by a multivariate Poisson or negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells is much larger than in other cells, the copula-based multivariate Poisson (or negative binomial) distribution may not fit well and is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher than those of the other cells and develop a doubly inflated multivariate Poisson distribution function using a multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. To illustrate the proposed methodologies, we present real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation of the unknown parameters of the models.

3.
The analysis of word frequency count data can be very useful in authorship attribution problems. Zero-truncated generalized inverse Gaussian–Poisson mixture models are very helpful in the analysis of these kinds of data because their model-mixing density estimates can be used as estimates of the density of the word frequencies of the vocabulary. It is found that this model provides excellent fits for the word frequency counts of very long texts, where the truncated inverse Gaussian–Poisson special case fails because it does not allow for the large degree of over-dispersion in the data. The role played by the three parameters of this truncated GIG-Poisson model is also explored. Our second goal is to compare the fit of the truncated GIG-Poisson mixture model with the fit of the model that results from switching the order of the mixing and truncation stages. A heuristic interpretation of the mixing distribution estimates obtained under this alternative GIG-truncated Poisson mixture model is also provided.

4.
Negative binomial (NB) regression is the most common full‐likelihood method for analysing count data exhibiting overdispersion with respect to the Poisson distribution. Most practitioners are content to fit one of two NB variants; however, other important variants exist. It is demonstrated here that the VGAM R package can fit them all under a common statistical framework founded upon a generalised linear and additive model approach. Additionally, other modifications such as zero‐altered (hurdle), zero‐truncated and zero‐inflated NB distributions are naturally handled. Rootograms are also available for graphically checking the goodness of fit. Two data sets and some recently added features of the VGAM package are used here for illustration.

5.
We consider partial likelihood analysis of a truncated Poisson regression model for time series of counts. We focus our attention on the asymptotic theory for the maximum partial likelihood estimator of a vector of regression parameters. Simulations and data analysis complete the presentation.

6.
Hall (2000) has described zero‐inflated Poisson and binomial regression models that include random effects to account for excess zeros and additional sources of heterogeneity in the data. The authors of the present paper propose a general score test for the null hypothesis that variance components associated with these random effects are zero. For a zero‐inflated Poisson model with random intercept, the new test reduces to an alternative to the overdispersion test of Ridout, Demétrio & Hinde (2001). The authors also examine their general test in the special case of the zero‐inflated binomial model with random intercept and propose an overdispersion test in that context which is based on a beta‐binomial alternative.

7.
A generalized self-consistency approach to maximum likelihood estimation (MLE) and model building was developed in Tsodikov [2003. Semiparametric models: a generalized self-consistency approach. J. Roy. Statist. Soc. Ser. B Statist. Methodology 65(3), 759–774] and applied to a survival analysis problem. We extend the framework to obtain second-order results such as the information matrix and properties of the variance. The multinomial model motivates the paper and is used throughout as an example. Computational challenges with the multinomial likelihood motivated Baker [1994. The Multinomial–Poisson transformation. The Statist. 43, 495–504] to develop the Multinomial–Poisson (MP) transformation for a large variety of regression models with a multinomial likelihood kernel. Multinomial regression is transformed into a Poisson regression at the cost of augmenting model parameters and restricting the problem to discrete covariates. Imposing normalization restrictions by means of Lagrange multipliers [Lang, J., 1996. On the comparison of multinomial and Poisson log-linear models. J. Roy. Statist. Soc. Ser. B Statist. Methodology 58, 253–266] justifies the approach. Using the self-consistency framework, we develop an alternative solution to multinomial model fitting that does not require augmenting parameters while allowing for a Poisson likelihood and arbitrary covariate structures. Normalization restrictions are imposed by averaging over artificial “missing data” (fake mixture). The lack of a probabilistic interpretation at the “complete-data” level makes the use of the generalized self-consistency machinery essential.

8.
The paper provides a novel application of the probabilistic reduction (PR) approach to the analysis of multi-categorical outcomes. The PR approach, which systematically takes account of heterogeneity and functional form concerns, can improve the specification of binary regression models. However, its utility for systematically enriching the specification of and inference from models of multi-categorical outcomes has not been examined, while multinomial logistic regression models are commonly used for inference and, increasingly, prediction. Following a theoretical derivation of the PR-based multinomial logistic model (MLM), we compare functional specification and marginal effects from a traditional specification and a PR-based specification in a model of post-stroke hospital discharge disposition and find that the traditional MLM is misspecified. Results suggest that the impact on the reliability of substantive inferences from a misspecified model may be significant, even when model fit statistics do not suggest a strong lack of fit compared with a properly specified model using the PR approach. We identify situations under which a PR-based MLM specification can be advantageous to the applied researcher.

9.
The mixed Poisson–inverse-Gaussian distribution has been used by Holla, Sankaran, Sichel, and others in univariate problems involving counts. We propose a Poisson–inverse-Gaussian regression model which can be used for regression analysis of counts. The model provides an attractive framework for incorporating random effects in Poisson regression models and for handling extra-Poisson variation. Maximum-likelihood and quasilikelihood-moment estimation are investigated and illustrated with an example involving motor-insurance claims.

10.
The purpose of this paper is to develop a new linear regression model for count data, namely the generalized-Poisson Lindley (GPL) linear model. The GPL linear model is constructed by applying the generalized linear model framework to the GPL distribution. The model parameters are estimated by maximum likelihood. We use the GPL linear model to fit two real data sets and compare it with the Poisson, negative binomial (NB) and Poisson-weighted exponential (P-WE) models for count data. It is found that the GPL linear model can fit over-dispersed count data, and that it shows the highest log-likelihood and the smallest AIC and BIC values. As a consequence, the linear regression model from the GPL distribution is a valuable alternative to the Poisson, NB, and P-WE models.

11.
Count data have emerged in many applied research areas. In recent years, there has been considerable interest in models for count data. In modelling such data, it is common to face a large frequency of zeroes. The data are regarded as zero-inflated when the frequency of observed zeroes is larger than what is expected from a theoretical distribution such as the Poisson distribution, the standard model for analysing count data. Analysing such data with the simple Poisson model may leave over-dispersion unaccounted for. Several classes of mixture models have been proposed for handling zero-inflated data, but they do not apply to cases where inflated counts occur at some other point in addition to zero. For these cases, a doubly-inflated Poisson model has been suggested, which can only be used for cross-sectional data and cannot account for correlations between observations. However, correlated count data are widely encountered, especially in the health and medical fields. The present study aims to introduce a doubly-inflated Poisson model with random effects for correlated doubly-inflated data. The performance of the proposed method is then demonstrated via different simulation scenarios. Finally, the proposed model is applied to a dental study.
KEYWORDS: Count data, doubly-inflated, Poisson regression, zero-inflated, correlated data
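To make the idea of inflation at zero and at one further count concrete, here is a minimal sketch of a cross-sectional doubly-inflated Poisson probability mass function written as a three-component mixture. This parameterization is a common textbook form and is an assumption; it is not taken from the paper, and it omits the random effects the abstract proposes. The names `dip_pmf`, `pi0`, `pik` and `k` are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Ordinary Poisson probability P(Y = y) with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def dip_pmf(y, lam, pi0, pik, k):
    """Doubly-inflated Poisson pmf (assumed mixture form).

    pi0 is the extra point mass at 0, pik the extra point mass at k > 0,
    and the remaining weight 1 - pi0 - pik follows a Poisson(lam) law.
    """
    if k <= 0 or pi0 < 0 or pik < 0 or pi0 + pik > 1:
        raise ValueError("need k > 0, pi0 >= 0, pik >= 0 and pi0 + pik <= 1")
    base = (1.0 - pi0 - pik) * poisson_pmf(y, lam)
    if y == 0:
        return pi0 + base
    if y == k:
        return pik + base
    return base


# Hypothetical parameters: mean 2, 20% extra zeros, 10% extra threes.
probabilities = [dip_pmf(y, lam=2.0, pi0=0.2, pik=0.1, k=3) for y in range(10)]
```

Setting `pik = 0` recovers the usual zero-inflated Poisson form, and setting both inflation weights to zero recovers the plain Poisson distribution.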

12.
In this paper, we investigate Bayesian generalized nonlinear mixed‐effects (NLME) regression models for zero‐inflated longitudinal count data. The methodology is motivated by and applied to colony forming unit (CFU) counts in extended bactericidal activity tuberculosis (TB) trials. Furthermore, for model comparisons, we present a generalized method for calculating the marginal likelihoods required to determine Bayes factors. A simulation study shows that the proposed zero‐inflated negative binomial regression model has good accuracy, precision, and credibility interval coverage. In contrast, conventional normal NLME regression models applied to log‐transformed count data, which handle zero counts as left censored values, may yield credibility intervals that undercover the true bactericidal activity of anti‐TB drugs. We therefore recommend that zero‐inflated NLME regression models be fitted to CFU counts on the original scale, as an alternative to conventional normal NLME regression models on the logarithmic scale.

13.
Categorical data frequently arise in applications in the Social Sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a ‘reference’ analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.

14.
In recent years, there has been considerable interest in regression models based on zero-inflated distributions. These models are commonly encountered in many disciplines, such as medicine, public health, and environmental sciences, among others. The zero-inflated Poisson (ZIP) model has typically been considered for these types of problems. However, the ZIP model can fail if the non-zero counts are overdispersed in relation to the Poisson distribution; hence the zero-inflated negative binomial (ZINB) model may be more appropriate. In this paper, we present a Bayesian approach for fitting the ZINB regression model. This model considers that an observed zero may come from a point mass distribution at zero or from the negative binomial model. The likelihood function is utilized not only to compute some Bayesian model selection measures, but also to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. The approach can be easily implemented using standard Bayesian software, such as WinBUGS. The performance of the proposed method is evaluated with a simulation study. Further, a real data set is analyzed, where we show that ZINB regression models seem to fit the data better than their Poisson counterpart.
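The two-source description of zeros in this abstract (a point mass at zero or the negative binomial model) can be sketched as a probability mass function. The sketch below assumes the common mean-dispersion parameterization of the negative binomial, with variance `mu + alpha * mu**2`; the abstract does not state which parameterization the paper uses, and the function and argument names are hypothetical.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and dispersion alpha > 0 (assumed form)."""
    r = 1.0 / alpha  # "size" parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB pmf: a zero arises from the point mass with probability pi,
    otherwise the count is drawn from the negative binomial component."""
    nb_part = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + nb_part if y == 0 else nb_part


# Hypothetical parameters: NB mean 2, dispersion 0.5, 30% structural zeros.
p_zero = zinb_pmf(0, mu=2.0, alpha=0.5, pi=0.3)
```

Summing the logarithm of `zinb_pmf` over the observations gives the log-likelihood that a Bayesian or maximum likelihood fit would work with; in a regression setting `mu` and `pi` would each depend on covariates through a link function.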

15.
In this article, we propose a parametric model for the distribution of time to first event when events are overdispersed and can be properly fitted by a negative binomial distribution. This is a very common situation in medical statistics, when the occurrence of events is summarized as a count for each patient and the simple Poisson model is not adequate to account for overdispersion of the data. In this situation, studying the time of occurrence of the first event can be of interest. From the negative binomial distribution of counts, we derive a new parametric model for time to first event and apply it to fit the distribution of time to first relapse in multiple sclerosis (MS). We develop the regression model with methods for covariate estimation. We show that, as the negative binomial model properly fits relapse count data, this new model closely matches the distribution of time to first relapse, as tested in two large datasets of MS patients. Finally, we compare its performance, when fitting time to first relapse in MS, with other models widely used in survival analysis (the semiparametric Cox model and the parametric exponential, Weibull, log-logistic and log-normal models).

16.
Count response data often exhibit departures from the assumptions of standard Poisson generalized linear models. In particular, cluster level correlation of the data and truncation at zero are two common characteristics of such data. This paper describes a random components truncated Poisson model that can be applied to clustered and zero‐truncated count data. Residual maximum likelihood method estimators for the parameters of this model are developed and their use is illustrated using a dataset of non‐zero counts of sheets with edge‐strain defects in iron sheets produced by the Mobarekeh Steel Complex, Iran. The paper also reports on a small‐scale simulation study that supports the estimation procedure.

17.
We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero‐inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.

18.
Count data with excess zeros are widely encountered in biomedicine, medicine, public health and social surveys. Zero-inflated Poisson (ZIP) regression models with mixed effects are useful tools for analyzing such data, in which covariates are usually incorporated in the model to explain inter-subject variation and a normal distribution is assumed for both random effects and random errors. However, in many practical applications, such assumptions may be violated, as the data often exhibit skewness and some covariates may be measured with error. In this paper, we deal with these issues simultaneously by developing a Bayesian joint hierarchical modeling approach. Specifically, by treating intercepts and slopes in the logistic and Poisson regressions as random, a flexible two-level ZIP regression model is proposed, where a covariate process with measurement errors is established and a skew-t-distribution is considered for both random errors and random effects. Under the Bayesian framework, model selection is carried out using the deviance information criterion (DIC), and a goodness-of-fit statistic is also developed for assessing the plausibility of the posited model. The main advantage of our method is that it is more robust and accurate in investigating heterogeneity at different levels, while accommodating skewness and measurement errors simultaneously. An application to the Shanghai Youth Fitness Survey is used as an illustrative example. Through this real example, it is shown that our approach is of interest and useful for applications.

19.
Count data often display more zero outcomes than are expected under the Poisson regression model. The zero-inflated Poisson regression model has been suggested to handle zero-inflated data, whereas the zero-inflated negative binomial (ZINB) regression model has been fitted to zero-inflated data with additional overdispersion. For bivariate and zero-inflated cases, several regression models such as the bivariate zero-inflated Poisson (BZIP) and bivariate zero-inflated negative binomial (BZINB) have been considered. This paper introduces several forms of nested BZINB regression models which can be fitted to bivariate and zero-inflated count data. The mean–variance approach is used for comparing the BZIP and our forms of BZINB regression model in this study. A similar approach was also used by past researchers for defining several negative binomial and zero-inflated negative binomial regression models based on the appearance of linear and quadratic terms in the variance function. The nested BZINB regression models proposed in this study have several advantages: likelihood ratio tests can be performed for choosing the best model, the models have flexible forms of marginal mean–variance relationship, the models can be fitted to bivariate zero-inflated count data with positive or negative correlations, and the models allow additional overdispersion of the two dependent variables.
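For readers unfamiliar with the "linear and quadratic terms in the variance function" mentioned in this abstract, the two best-known univariate negative binomial mean–variance relationships can be written as follows. These are the standard NB1 and NB2 forms, given here as background; the notation ($\mu$ for the mean, $\alpha$ for the dispersion parameter) is an assumption, and the paper's bivariate forms are not reproduced.

```latex
\begin{align*}
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu, \\
\text{NB1 (linear):} \quad & \operatorname{Var}(Y) = \mu + \alpha\,\mu, \\
\text{NB2 (quadratic):} \quad & \operatorname{Var}(Y) = \mu + \alpha\,\mu^{2}.
\end{align*}
```

Both negative binomial forms reduce to the Poisson variance when $\alpha = 0$, which is what makes likelihood ratio tests between such nested models possible.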

20.
In certain applications involving discrete data, it is sometimes found that X = 0 is observed with a frequency significantly higher than predicted by the assumed model. Zero inflated Poisson, binomial and negative binomial models have been employed in some clinical trials and in some regression analysis problems.

In this paper, we study the zero inflated modified power series distributions (IMPSD) which include among others the generalized Poisson and the generalized negative binomial distributions and hence the Poisson, binomial and negative binomial distributions. The structural properties along with the distribution of the sum of independent IMPSD variables are studied. The maximum likelihood estimation of the parameters of the model is examined and the variance-covariance matrix of the estimators is obtained. Finally, examples are presented for the generalized Poisson distribution to illustrate the results.
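As background to this abstract, a zero inflated modified power series distribution can be sketched as follows. The notation is an assumption based on the usual definition of a modified power series distribution, with pmf $a(x)\,[g(\theta)]^{x}/f(\theta)$, and an inflation weight $\varphi$; the paper's own symbols may differ.

```latex
\[
P(X = x) =
\begin{cases}
\varphi + (1-\varphi)\,\dfrac{a(0)}{f(\theta)}, & x = 0, \\[2ex]
(1-\varphi)\,\dfrac{a(x)\,[g(\theta)]^{x}}{f(\theta)}, & x = 1, 2, \dots
\end{cases}
\qquad 0 \le \varphi < 1 .
\]
```

Under this notation, the choice $a(x) = 1/x!$, $g(\theta) = \theta$ and $f(\theta) = e^{\theta}$ gives the zero-inflated Poisson distribution as a special case.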
