1.

Zero-inflated and overdispersed: what's one to do?

Suzanne E. Perumean-Chaney, Charity Morgan, David McDowall, Inmaculada Aban. Journal of Statistical Computation and Simulation, 2013, 83(9):1671-1683

Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are recommended for handling excessive zeros in count data. For various reasons, researchers may not address zero inflation. This paper helps educate researchers on (1) the importance of accounting for zero inflation and (2) the consequences of misspecifying the statistical model. Using simulations, we found that when the zero inflation in the data was ignored, estimation was poor and statistically significant findings were missed. When overdispersion within the zero-inflated data was ignored, poor estimation and inflated Type I errors resulted. Recommendations on when to use the ZINB and ZIP models are provided. In an illustration, a two-step model selection procedure (likelihood ratio test and the Vuong test) correctly identified the ZIP model only when the distributions had moderate means and sample sizes, and it did not correctly identify the ZINB model or the zero inflation in the ZIP and ZINB distributions.
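To make the effect of ignoring zero inflation concrete, the following minimal sketch (not the authors' simulation code) evaluates the ZIP probability mass function under its standard parameterization, where `pi` is the probability of a structural zero and `lam` is the Poisson mean. The function names and the values `pi = 0.3`, `lam = 2.0` are assumptions chosen only for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """ZIP probability: a point mass at zero mixed with a Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def zip_mean_var(pi, lam):
    """Mean and variance implied by the ZIP(pi, lam) distribution."""
    mean = (1.0 - pi) * lam
    var = mean * (1.0 + pi * lam)
    return mean, var

pi, lam = 0.3, 2.0  # hypothetical values, not taken from the paper
mean, var = zip_mean_var(pi, lam)
# A plain Poisson matched to the ZIP mean understates the share of zeros.
p0_zip = zip_pmf(0, pi, lam)
p0_poisson = poisson_pmf(0, mean)
print(f"mean={mean:.3f} var={var:.3f} P0_ZIP={p0_zip:.3f} P0_Poisson={p0_poisson:.3f}")
```

Because the ZIP variance exceeds its mean whenever `pi > 0`, a Poisson model fitted to such data is also overdispersed relative to its own assumptions.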

2.

A comparison of count data models with an application to daily cigarette consumption of young persons

Muhammed Fatih Tüzen, Semra Erbaş. Communications in Statistics - Theory and Methods, 2018, 47(23):5825-5844

The objective of this study is to provide a comparative assessment that helps researchers deal with the challenges of analyzing count data, and to examine the factors associated with daily cigarette consumption among young people in Turkey. We fitted Poisson (P), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH) and negative binomial hurdle (NBH) regressions to cigarette consumption count data from the 2014 Turkey Health Survey. Our results showed that the ZINB and NBH models should be preferred. We also found that gender, employment and tobacco use at home are more effective factors for smokers and nonsmokers in the 15–24 age group in Turkey.

3.

Score test for testing zero-inflated Poisson regression against zero-inflated generalized Poisson alternatives

Hossein Zamani. Journal of applied statistics, 2013, 40(9):2056-2068

Count data often have an excessive number of zero outcomes. This zero-inflated phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed for accommodating zero-inflated data. However, if the data continue to suggest additional overdispersion, zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models have been considered as alternatives. This study proposes the score test for testing the ZIP regression model against ZIGP alternatives and proves that it is equal to the score test for testing the ZIP regression model against ZINB alternatives. The advantage of using the score test over alternative tests such as the likelihood ratio and Wald tests is that the score test can be used to determine whether a more complex model is appropriate without fitting the more complex model. Applications of the proposed score test on several datasets are also illustrated.

4.

GEE-based zero-inflated generalized Poisson model for clustered over or under-dispersed count data

Fatemeh Sarvi, Hossein Mahjub. Journal of Statistical Computation and Simulation, 2019, 89(14):2711-2732

Zero-inflated regression models such as the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) or zero-inflated generalized Poisson (ZIGP) regression models can model count data with excess zeros. The ZINB model can handle over-dispersed count data with excess zeros, and the ZIGP model can handle over- or under-dispersed count data with excess zeros as well. Moreover, the count data may be correlated because of the data collection procedure or a special study design. The clustered sampling approach is one example in which the correlation among subjects can be defined. In such situations, a marginal model using the generalized estimating equation (GEE) approach can incorporate these correlations and yield relationships at the population level. In this study, the GEE-based zero-inflated generalized Poisson regression model was proposed to fit over- and under-dispersed clustered count data with excess zeros.

5.

Bayesian estimation and case influence diagnostics for the zero-inflated negative binomial regression model

Aldo M. Garay, Victor H. Lachos, Heleno Bolfarine. Journal of applied statistics, 2015, 42(6):1148-1165

In recent years, there has been considerable interest in regression models based on zero-inflated distributions. These models are commonly encountered in many disciplines, such as medicine, public health, and environmental sciences, among others. The zero-inflated Poisson (ZIP) model has been typically considered for these types of problems. However, the ZIP model can fail if the non-zero counts are overdispersed in relation to the Poisson distribution, hence the zero-inflated negative binomial (ZINB) model may be more appropriate. In this paper, we present a Bayesian approach for fitting the ZINB regression model. This model considers that an observed zero may come from a point mass distribution at zero or from the negative binomial model. The likelihood function is utilized not only to compute some Bayesian model selection measures, but also to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. The approach can be easily implemented using standard Bayesian software, such as WinBUGS. The performance of the proposed method is evaluated with a simulation study. Further, a real data set is analyzed, where we show that ZINB regression models seem to fit the data better than the Poisson counterpart.

6.

Testing overdispersion in the zero-inflated Poisson model

Zhao Yang, James W. Hardin, Cheryl L. Addy. Journal of statistical planning and inference, 2009

The zero-inflated negative binomial (ZINB) model is used to account for commonly occurring overdispersion detected in data that are initially analyzed under the zero-inflated Poisson (ZIP) model. Tests for overdispersion (Wald test, likelihood ratio test [LRT], and score test) based on ZINB model for use in ZIP regression models have been developed. Due to similarity to the ZINB model, we consider the zero-inflated generalized Poisson (ZIGP) model as an alternate model for overdispersed zero-inflated count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes score tests for overdispersion based on the ZIGP model and illustrates that the derived score statistics are exactly the same as the score statistics under the ZINB model. A simulation study indicates the proposed score statistics are preferred to other tests for higher empirical power. In practice, based on the approximate mean–variance relationship in the data, the ZINB or ZIGP model can be considered, and a formal score test based on asymptotic standard normal distribution can be employed for assessing overdispersion in the ZIP model. We provide an example to illustrate the procedures for data analysis.

7.

The VGAM package for negative binomial regression

Thomas W. Yee. Australian & New Zealand Journal of Statistics, 2020, 62(1):116-131

Negative binomial (NB) regression is the most common full‐likelihood method for analysing count data exhibiting overdispersion with respect to the Poisson distribution. Most practitioners are content to fit one of two NB variants; however, other important variants exist. It is demonstrated here that the VGAM R package can fit them all under a common statistical framework founded upon a generalised linear and additive model approach. Additionally, other modifications such as zero‐altered (hurdle), zero‐truncated and zero‐inflated NB distributions are naturally handled. Rootograms are also available for graphically checking the goodness of fit. Two data sets and some recently added features of the VGAM package are used here for illustration.

8.

Model selection criteria for dual-inflated data

Ting Hsiang Lin, Min-Hsiao Tsai. Journal of Statistical Computation and Simulation, 2016, 86(13):2663-2672

Inflated data are prevalent in many situations and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we studied the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional inflated models including Poisson regression and zero-inflated Poisson (ZIP) regression were fitted for dual-inflated data and the performance of IC was compared. The effect of sample sizes and the proportions of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample size.
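The three criteria compared in this abstract differ only in how heavily they penalize the number of parameters. The sketch below uses the textbook definitions (AIC = -2 log L + 2k, BIC = -2 log L + k ln n, CAIC = -2 log L + k(ln n + 1)); the function name, the log-likelihood values and the parameter counts are hypothetical and are not results from the paper.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, BIC and CAIC for a model with maximized log-likelihood
    loglik, k estimated parameters and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}

# Hypothetical fits: (model name, log-likelihood, number of parameters)
fits = [("POI", -520.0, 2), ("ZIP", -505.0, 4), ("ZKIP", -503.5, 6)]
n = 200  # hypothetical sample size
for name, loglik, k in fits:
    print(name, information_criteria(loglik, k, n))
```

For n of 8 or more, ln n exceeds 2, so BIC and CAIC charge more per extra parameter than AIC does. This is consistent with the abstract's observation that BIC and CAIC tend to under-parameterize while AIC tends to over-fit.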

9.

Untangle the structural and random zeros in statistical modelings

W. Tang, W.J. Wang, D.G. Chen. Journal of applied statistics, 2018, 45(9):1714-1733

Count data with structural zeros are common in public health applications. Considerable research has focused on zero-inflated models such as zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models for such zero-inflated count data when used as a response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor if observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias issue. This paper aims to fill this methodological gap by developing parametric methods to model zero-inflated count data when used as predictors based on the maximum likelihood approach. The response variable can be any type of data including continuous, binary, count or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of this new approach when sample size is small to moderate. A real data example is also used to demonstrate the application of this method.

10.

Zero-inflated models and estimation in zero-inflated Poisson distribution

Yogita S. Wagh. Communications in Statistics - Simulation and Computation, 2018, 47(8):2248-2265

In this paper, we briefly overview different zero-inflated probability distributions. We compare the performance of the estimates of Poisson, generalized Poisson, ZIP, ZIGP and ZINB models through mean square error (MSE), bias and standard error (SE) when the samples are generated from the ZIP distribution. We propose a new estimator, referred to as the probability estimator (PE), of the inflation parameter of the ZIP distribution based on the moment estimator (ME) of the mean parameter and compare its performance with the ME and the maximum likelihood estimator (MLE) through a simulation study. We use the PE along with the ME and MLE to fit the ZIP distribution to various zero-inflated datasets and observe that the results do not differ significantly. We recommend using the PE in place of the MLE since it is easy to calculate and the simulation study in this paper demonstrates that the PE performs as well as the MLE irrespective of the sample size.

11.

Zero-inflated sum of Conway-Maxwell-Poissons (ZISCMP) regression

Kimberly F. Sellers, Derek S. Young. Journal of Statistical Computation and Simulation, 2019, 89(9):1649-1673

While excess zeros are often thought to cause data over-dispersion (i.e. when the variance exceeds the mean), this implication is not absolute. One should instead consider a flexible class of distributions that can address data dispersion along with excess zeros. This work develops a zero-inflated sum-of-Conway-Maxwell-Poissons (ZISCMP) regression as a flexible analysis tool to model count data that express significant data dispersion and contain excess zeros. This class of models contains several special case zero-inflated regressions, including zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated binomial (ZIB), and the zero-inflated Conway-Maxwell-Poisson (ZICMP). Through simulated and real data examples, we demonstrate class flexibility and usefulness. We further utilize it to analyze shark species data from Australia's Great Barrier Reef to assess the environmental impact of human action on the number of various species of sharks.

12.

Inferences for the inflation parameter in the ZIP distributions: The method of moments

Mahmood Kharrati-Kopaei, Havva Faghih. Statistical Methodology, 2011, 8(4):377-388

The zero-inflated Poisson (ZIP) distribution is widely used for modeling a count data set when the frequency of zeros is higher than the one expected under the Poisson distribution. There are many methods for making inferences about the inflation parameter in ZIP models, e.g. methods for testing the Poisson distribution (the inflation parameter is zero) versus the ZIP distribution (the inflation parameter is positive). Most of these methods are based on maximum likelihood estimators, which do not have an explicit expression. However, the estimators obtained by the method of moments are powerful enough and are easy to obtain and implement. In this paper, we propose an approach based on the method of moments for making inferences about the inflation parameter in the ZIP distribution. Our method is also compared to some recent methods via a simulation study and it is illustrated by an example.
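As an illustration of why moment estimators are easy to obtain, the sketch below solves the two standard ZIP moment equations, mean = (1 - pi) * lam and variance = mean * (1 + pi * lam), for `pi` and `lam` in closed form. It is a generic moment-matching sketch and not necessarily the inferential procedure proposed in the paper; the function names, the seed and the simulation settings are assumptions.

```python
import math
import random

def zip_moment_estimates(counts):
    """Closed-form moment estimates (pi_hat, lam_hat) for a ZIP sample.

    Assumes the sample mean is positive. A negative pi_hat (sample
    variance below the sample mean) signals no evidence of zero inflation.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    lam_hat = mean + var / mean - 1.0
    pi_hat = (var - mean) / (var - mean + mean**2)
    return pi_hat, lam_hat

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_zip(pi, lam, rng):
    """Draw one ZIP(pi, lam) variate: structural zero with probability pi."""
    return 0 if rng.random() < pi else sample_poisson(lam, rng)

rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [sample_zip(0.3, 2.0, rng) for _ in range(20000)]
pi_hat, lam_hat = zip_moment_estimates(sample)
print(f"pi_hat={pi_hat:.3f} lam_hat={lam_hat:.3f}")
```

No iterative optimization is needed here, in contrast to maximum likelihood, which for the ZIP model has no explicit solution.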

13.

Score tests for heterogeneity and overdispersion in zero‐inflated Poisson and binomial regression models

Daniel B. Hall, Kenneth S. Berenhaut. Revue canadienne de statistique, 2002, 30(3):415-430

Hall (2000) has described zero‐inflated Poisson and binomial regression models that include random effects to account for excess zeros and additional sources of heterogeneity in the data. The authors of the present paper propose a general score test for the null hypothesis that variance components associated with these random effects are zero. For a zero‐inflated Poisson model with random intercept, the new test reduces to an alternative to the overdispersion test of Ridout, Demétrio & Hinde (2001). The authors also examine their general test in the special case of the zero‐inflated binomial model with random intercept and propose an overdispersion test in that context which is based on a beta‐binomial alternative.

14.

Some Theoretical Comparisons of Negative Binomial and Zero-Inflated Poisson Distributions

Changyong Feng, Hongyue Wang, Yu Han, Yinglin Xia, Naiji Lu, Xin M. Tu. Communications in Statistics - Theory and Methods, 2013, 42(15):3266-3277

In this article, we compare the zero-inflated Poisson (ZIP) and negative binomial (NB) distributions based on the three most important criteria: the probability of zero, the mean value, and the variance. Our results show that with the same mean value and variance, the ZIP distribution always has a larger probability of zeros; with the same mean value and probability of zeros, the NB distribution always has a larger variance; and with the same variance and probability of zeros, the ZIP distribution always has a larger mean value. We also study the properties of the Vuong test in model selection in three cases by simulations.
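The first of these results can be spot-checked numerically. The sketch below matches an NB distribution (variance = mean + mean²/r) and a ZIP distribution to the same mean and variance and compares their zero probabilities. It is a check at a few hypothetical (mean, variance) pairs, not a proof, and the function names are assumptions.

```python
import math

def nb_zero_prob(mean, var):
    """P(Y = 0) for a negative binomial with the given mean and variance,
    using var = mean + mean**2 / r (requires var > mean)."""
    r = mean**2 / (var - mean)
    return (r / (r + mean)) ** r

def zip_zero_prob(mean, var):
    """P(Y = 0) for a ZIP distribution with the given mean and variance
    (requires var > mean)."""
    lam = mean + var / mean - 1.0
    pi = (var - mean) / (var - mean + mean**2)
    return pi + (1.0 - pi) * math.exp(-lam)

# Hypothetical overdispersed (mean, variance) pairs
for mean, var in [(0.5, 1.0), (2.0, 4.0), (5.0, 20.0)]:
    p0_nb = nb_zero_prob(mean, var)
    p0_zip = zip_zero_prob(mean, var)
    print(f"mean={mean} var={var} P0_NB={p0_nb:.4f} P0_ZIP={p0_zip:.4f}")
```

At each pair the ZIP zero probability is the larger of the two, in line with the abstract's first claim.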

15.

Functional Form for the Zero-Inflated Generalized Poisson Regression Model

Hossein Zamani. Communications in Statistics - Theory and Methods, 2014, 43(3):515-529

The generalized Poisson (GP) regression is an increasingly popular approach for modeling overdispersed as well as underdispersed count data. Several parameterizations have been proposed for the GP regression, and the two well-known models, the GP-1 and the GP-2, have been applied. The GP-P regression, which has been recently proposed, has the advantage of nesting the GP-1 and the GP-2 parametrically, besides allowing statistical tests of the GP-1 and the GP-2 against a more general alternative. Count data often have more zero outcomes than are expected under the Poisson distribution. This zero-inflation phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed. However, if the data continue to suggest additional overdispersion, the zero-inflated negative binomial (ZINB-1 and ZINB-2) and the zero-inflated generalized Poisson (ZIGP-1 and ZIGP-2) regression models have been considered as alternatives. This article proposes a functional form of the ZIGP which mixes a distribution degenerate at zero with a GP-P distribution. The suggested model has the advantage of nesting the ZIP and the two well-known ZIGP (ZIGP-1 and ZIGP-2) regression models, besides allowing statistical tests of the ZIGP-1 and the ZIGP-2 against a more general alternative. The ZIP and the functional form of the ZIGP regression models are fitted, compared and tested on two sets of count data: the Malaysian insurance claim data and the German healthcare data.

16.

On the Correlation Structure of Gaussian Copula Models for Geostatistical Count Data

Zifei Han, Victor De Oliveira. Australian & New Zealand Journal of Statistics, 2016, 58(1):47-69

We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero‐inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.

17.

A generalized Bayesian nonlinear mixed‐effects regression model for zero‐inflated longitudinal count data in tuberculosis trials

Divan Aristo Burger, Robert Schall, Rianne Jacobs, Ding‐Geng Chen. Pharmaceutical statistics, 2019, 18(4):420-432

In this paper, we investigate Bayesian generalized nonlinear mixed‐effects (NLME) regression models for zero‐inflated longitudinal count data. The methodology is motivated by and applied to colony forming unit (CFU) counts in extended bactericidal activity tuberculosis (TB) trials. Furthermore, for model comparisons, we present a generalized method for calculating the marginal likelihoods required to determine Bayes factors. A simulation study shows that the proposed zero‐inflated negative binomial regression model has good accuracy, precision, and credibility interval coverage. In contrast, conventional normal NLME regression models applied to log‐transformed count data, which handle zero counts as left censored values, may yield credibility intervals that undercover the true bactericidal activity of anti‐TB drugs. We therefore recommend that zero‐inflated NLME regression models should be fitted to CFU count on the original scale, as an alternative to conventional normal NLME regression models on the logarithmic scale.

18.

Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros

Abbas Moghimbeigi, Kazem Mohammad, Brian Mcardle. Journal of applied statistics, 2008, 35(10):1193-1202

Count data with excess zeros often occur in areas such as public health, epidemiology, psychology, sociology, engineering, and agriculture. Zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression are useful for modeling such data, but because of hierarchical study design or the data collection procedure, zero-inflation and correlation may occur simultaneously. To overcome these challenges ZIP or ZINB may still be used. In this paper, multilevel ZINB regression is used to overcome these problems. The method of parameter estimation is an expectation-maximization algorithm in conjunction with the penalized likelihood and restricted maximum likelihood estimates for variance components. Alternative modeling strategies, namely the ZIP distribution, are also considered. An application of the proposed model is shown on decayed, missing, and filled teeth of children aged 12 years.

19.

Bayesian analysis of zero-inflated regression models

Journal of statistical planning and inference, 2006, 136(4):1360-1375

In modeling defect counts collected from an established manufacturing process, there are usually a relatively large number of zeros (non-defects). The commonly used models such as Poisson or Geometric distributions can underestimate the zero-defect probability and hence make it difficult to identify significant covariate effects to improve production quality. This article introduces a flexible class of zero-inflated models which includes other familiar models, such as the zero-inflated Poisson (ZIP) models, as special cases. A Bayesian estimation method is developed as an alternative to traditionally used maximum likelihood based methods to analyze such data. Simulation studies show that the proposed method has better finite sample performance than the classical method, with tighter interval estimates and better coverage probabilities. A real-life data set is analyzed to illustrate the practicability of the proposed method, which is easily implemented using WinBUGS.

20.

Bivariate zero-inflated negative binomial regression model with applications

Pouya Faroughi. Journal of Statistical Computation and Simulation, 2017, 87(3):457-477

Count data often display more zero outcomes than are expected in the Poisson regression model. The zero-inflated Poisson regression model has been suggested to handle zero-inflated data, whereas the zero-inflated negative binomial (ZINB) regression model has been fitted for zero-inflated data with additional overdispersion. For bivariate and zero-inflated cases, several regression models such as the bivariate zero-inflated Poisson (BZIP) and bivariate zero-inflated negative binomial (BZINB) have been considered. This paper introduces several forms of nested BZINB regression model which can be fitted to bivariate and zero-inflated count data. The mean–variance approach is used for comparing the BZIP and our forms of BZINB regression model in this study. A similar approach was also used by past researchers for defining several negative binomial and zero-inflated negative binomial regression models based on the appearance of linear and quadratic terms of the variance function. The nested BZINB regression models proposed in this study have several advantages: the likelihood ratio tests can be performed for choosing the best model, the models have flexible forms of marginal mean–variance relationship, the models can be fitted to bivariate zero-inflated count data with positive or negative correlations, and the models allow additional overdispersion of the two dependent variables.