期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Marginal zero-inflated regression models for count data

Jacob Martin Daniel B. Hall 《Journal of applied statistics》2017,44(10):1807-1826

Data sets with excess zeroes are frequently analyzed in many disciplines. A common framework used to analyze such data is the zero-inflated (ZI) regression model. It mixes a degenerate distribution with point mass at zero with a non-degenerate distribution. The estimates from ZI models quantify the effects of covariates on the means of latent random variables, which are often not the quantities of primary interest. Recently, marginal zero-inflated Poisson (MZIP; Long et al. [A marginalized zero-inflated Poisson regression model with overall exposure effects. Stat. Med. 33 (2014), pp. 5151–5165]) and negative binomial (MZINB; Preisser et al., 2016) models have been introduced that model the mean response directly. These models yield covariate effects that have simple interpretations that are, for many applications, more appealing than those available from ZI regression. This paper outlines a general framework for marginal zero-inflated models where the latent distribution is a member of the exponential dispersion family, focusing on common distributions for count data. In particular, our discussion includes the marginal zero-inflated binomial (MZIB) model, which has not been discussed previously. The details of maximum likelihood estimation via the EM algorithm are presented and the properties of the estimators as well as Wald and likelihood ratio-based inference are examined via simulation. Two examples presented illustrate the advantages of MZIP, MZINB, and MZIB models for practical data analysis. 相似文献

2.

Untangle the structural and random zeros in statistical modelings

W. Tang W.J. Wang D.G. Chen 《Journal of applied statistics》2018,45(9):1714-1733

Count data with structural zeros are common in public health applications. There are considerable researches focusing on zero-inflated models such as zero-inflated Poisson (ZIP) and zero-inflated Negative Binomial (ZINB) models for such zero-inflated count data when used as response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored and may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor if observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias issue. This paper is aimed to fill this methodological gap by developing parametric methods to model zero-inflated count data when used as predictors based on the maximum likelihood approach. The response variable can be any type of data including continuous, binary, count or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of this new approach when sample size is small to moderate. A real data example is also used to demonstrate the application of this method. 相似文献

3.

Bivariate zero-inflated negative binomial regression model with applications

Pouya Faroughi 《Journal of Statistical Computation and Simulation》2017,87(3):457-477

Count data often display excessive number of zero outcomes than are expected in the Poisson regression model. The zero-inflated Poisson regression model has been suggested to handle zero-inflated data, whereas the zero-inflated negative binomial (ZINB) regression model has been fitted for zero-inflated data with additional overdispersion. For bivariate and zero-inflated cases, several regression models such as the bivariate zero-inflated Poisson (BZIP) and bivariate zero-inflated negative binomial (BZINB) have been considered. This paper introduces several forms of nested BZINB regression model which can be fitted to bivariate and zero-inflated count data. The mean–variance approach is used for comparing the BZIP and our forms of BZINB regression model in this study. A similar approach was also used by past researchers for defining several negative binomial and zero-inflated negative binomial regression models based on the appearance of linear and quadratic terms of the variance function. The nested BZINB regression models proposed in this study have several advantages; the likelihood ratio tests can be performed for choosing the best model, the models have flexible forms of marginal mean–variance relationship, the models can be fitted to bivariate zero-inflated count data with positive or negative correlations, and the models allow additional overdispersion of the two dependent variables. 相似文献

4.

GEE-based zero-inflated generalized Poisson model for clustered over or under-dispersed count data

Fatemeh Sarvi Hossein Mahjub 《Journal of Statistical Computation and Simulation》2019,89(14):2711-2732

The zero-inflated regression models such as zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) or zero-inflated generalized Poisson (ZIGP) regression models can model the count data with excess zeros. The ZINB model can handle over-dispersed and the ZIGP model can handle the over or under-dispersed count data with excess zeros as well. Moreover, the count data may be correlated because of data collection procedure or special study design. The clustered sampling approach is one of the examples in which the correlation among subjects could be defined. In such situations, a marginal model using generalized estimating equation (GEE) approach can incorporate these correlations and lead up to the relationships at the population level. In this study, the GEE-based zero-inflated generalized Poisson regression model was proposed to fit over and under-dispersed clustered count data with excess zeros. 相似文献

5.

A Bayesian Approach for Zero-Inflated Count Regression Models by Using the Reversible Jump Markov Chain Monte Carlo Method and an Application

İlknur Özmen 《统计学通讯:理论与方法》2013,42(12):2109-2127

In this study, estimation of the parameters of the zero-inflated count regression models and computations of posterior model probabilities of the log-linear models defined for each zero-inflated count regression models are investigated from the Bayesian point of view. In addition, determinations of the most suitable log-linear and regression models are investigated. It is known that zero-inflated count regression models cover zero-inflated Poisson, zero-inflated negative binomial, and zero-inflated generalized Poisson regression models. The classical approach has some problematic points but the Bayesian approach does not have similar flaws. This work points out the reasons for using the Bayesian approach. It also lists advantages and disadvantages of the classical and Bayesian approaches. As an application, a zoological data set, including structural and sampling zeros, is used in the presence of extra zeros. In this work, it is observed that fitting a zero-inflated negative binomial regression model creates no problems at all, even though it is known that fitting a zero-inflated negative binomial regression model is the most problematic procedure in the classical approach. Additionally, it is found that the best fitting model is the log-linear model under the negative binomial regression model, which does not include three-way interactions of factors. 相似文献

6.

Bivariate zero-inflated generalized Poisson regression model with flexible covariance

Pouya Faroughi 《统计学通讯:理论与方法》2017,46(15):7769-7785

This paper introduces several forms of nested bivariate zero-inflated generalized Poisson (BZIGP) regression model which can be fitted to bivariate and zero-inflated count data. The main advantage of having several forms of BZIGP regression model is that they are nested and allow likelihood ratio test to be performed for choosing the best model. In addition, the BZIGP regression models have flexible forms of marginal mean–variance relationship, can be fitted to bivariate and zero-inflated count data with positive or negative correlations, and allow additional overdispersion of the two response variables. The BZIGP regression models are fitted to the Australian Health Survey data. 相似文献

7.

A doubly-inflated Poisson regression for correlated count data

Erfan Ghasemi Alireza Akbarzadeh Baghban Farid Zayeri Asma Pourhoseingholi Seyed Mohammadreza Safavi 《Journal of applied statistics》2021,48(6):1111

Count data have emerged in many applied research areas. In recent years, there has been a considerable interest in models for count data. In modelling such data, it is common to face a large frequency of zeroes. The data are regarded as zero-inflated when the frequency of observed zeroes is larger than what is expected from a theoretical distribution such as Poisson distribution, as a standard model for analysing count data. Data analysis, using the simple Poisson model, may lead to over-dispersion. Several classes of different mixture models were proposed for handling zero-inflated data. But they do not apply to cases when inflated counts happen at some other points, in addition to zero. In these cases, a doubly-inflated Poisson model has been suggested which only be used for cross-sectional data and cannot consider correlations between observations. However, correlated count data have a large application, especially in the health and medical fields. The present study aims to introduce a Doubly-Inflated Poisson models with random effect for correlated doubly-inflated data. Then, the best performance of the proposed method is shown via different simulation scenarios. Finally, the proposed model is applied to a dental study.KEYWORDS: Count data, doubly-inflated, Poisson regression, zero-inflated, correlated data 相似文献

8.

Marginalized zero-inflated generalized Poisson regression

Felix Famoye John S. Preisser 《Journal of applied statistics》2018,45(7):1247-1259

The generalized Poisson (GP) regression model has been used to model count data that exhibit over-dispersion or under-dispersion. The zero-inflated GP (ZIGP) regression model can additionally handle count data characterized by many zeros. However, the parameters of ZIGP model cannot easily be used for inference on overall exposure effects. In order to address this problem, a marginalized ZIGP is proposed to directly model the population marginal mean count. The parameters of the marginalized zero-inflated GP model are estimated by the method of maximum likelihood. The regression model is illustrated by three real-life data sets. 相似文献

9.

Score test for testing zero-inflated Poisson regression against zero-inflated generalized Poisson alternatives

Hossein Zamani 《Journal of applied statistics》2013,40(9):2056-2068

In several cases, count data often have excessive number of zero outcomes. This zero-inflated phenomenon is a specific cause of overdispersion, and zero-inflated Poisson regression model (ZIP) has been proposed for accommodating zero-inflated data. However, if the data continue to suggest additional overdispersion, zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models have been considered as alternatives. This study proposes the score test for testing ZIP regression model against ZIGP alternatives and proves that it is equal to the score test for testing ZIP regression model against ZINB alternatives. The advantage of using the score test over other alternative tests such as likelihood ratio and Wald is that the score test can be used to determine whether a more complex model is appropriate without fitting the more complex model. Applications of the proposed score test on several datasets are also illustrated. 相似文献

10.

Quasi-binomial zero-inflated regression model suitable for variables with bounded support

E. Gmez&#x;Dniz D. I. Gallardo H. W. Gmez 《Journal of applied statistics》2020,47(12):2208

In recent years, a variety of regression models, including zero-inflated and hurdle versions, have been proposed to explain the case of a dependent variable with respect to exogenous covariates. Apart from the classical Poisson, negative binomial and generalised Poisson distributions, many proposals have appeared in the statistical literature, perhaps in response to the new possibilities offered by advanced software that now enables researchers to implement numerous special functions in a relatively simple way. However, we believe that a significant research gap remains, since very little attention has been paid to the quasi-binomial distribution, which was first proposed over fifty years ago. We believe this distribution might constitute a valid alternative to existing regression models, in situations in which the variable has bounded support. Therefore, in this paper we present a zero-inflated regression model based on the quasi-binomial distribution, taking into account the moments and maximum likelihood estimators, and perform a score test to compare the zero-inflated quasi-binomial distribution with the zero-inflated binomial distribution, and the zero-inflated model with the homogeneous model (the model in which covariates are not considered). This analysis is illustrated with two data sets that are well known in the statistical literature and which contain a large number of zeros. 相似文献

11.

Modeling road traffic crashes with zero-inflation and site-specific random effects

Helai Huang Hong Chor Chin 《Statistical Methods and Applications》2010,19(3):445-462

Zero-inflated count models are increasingly employed in many fields in case of “zero-inflation”. In modeling road traffic crashes, it has also shown to be useful in obtaining a better model-fitting when zero crash counts are over-presented. However, the general specification of zero-inflated model can not account for the multilevel data structure in crash data, which may be an important source of over-dispersion. This paper examines zero-inflated Poisson regression with site-specific random effects (REZIP) with comparison to random effect Poisson model and standard zero-inflated poison model. A practical and flexible procedure, using Bayesian inference with Markov Chain Monte Carlo algorithm and cross-validation predictive density techniques, is applied for model calibration and suitability assessment. Using crash data in Singapore (1998–2005), the illustrative results demonstrate that the REZIP model may significantly improve the model-fitting and predictive performance of crash prediction models. This improvement can contribute to traffic safety management and engineering practices such as countermeasure design and safety evaluation of traffic treatments. 相似文献

12.

Regression analysis of zero-inflated time-series counts: application to air pollution related emergency room visit data

M. Tariqul Hasan Gary Sneddon Renjun Ma 《Journal of applied statistics》2012,39(3):467-476

Time-series count data with excessive zeros frequently occur in environmental, medical and biological studies. These data have been traditionally handled by conditional and marginal modeling approaches separately in the literature. The conditional modeling approaches are computationally much simpler, whereas marginal modeling approaches can link the overall mean with covariates directly. In this paper, we propose new models that can have conditional and marginal modeling interpretations for zero-inflated time-series counts using compound Poisson distributed random effects. We also develop a computationally efficient estimation method for our models using a quasi-likelihood approach. The proposed method is illustrated with an application to air pollution-related emergency room visits. We also evaluate the performance of our method through simulation studies. 相似文献

13.

Random effect exponentiated-exponential geometric model for clustered/longitudinal zero-inflated count data

Leili Tapak Omid Hamidi Payam Amini Geert Verbeke 《Journal of applied statistics》2020,47(12):2272

For count responses, there are situations in biomedical and sociological applications in which extra zeroes occur. Modeling correlated (e.g. repeated measures and clustered) zero-inflated count data includes special challenges because the correlation between measurements for a subject or a cluster needs to be taken into account. Moreover, zero-inflated count data are often faced with over/under dispersion problem. In this paper, we propose a random effect model for repeated measurements or clustered data with over/under dispersed response called random effect zero-inflated exponentiated-exponential geometric regression model. The proposed method was illustrated through real examples. The performance of the model and asymptotical properties of the estimations were investigated using simulation studies.KEYWORDS: Count model, under- and over-dispersion, zero-inflation, mixture model, zero-inflated poisson model 相似文献

14.

Testing overdispersion in the zero-inflated Poisson model

Zhao Yang James W. Hardin Cheryl L. Addy 《Journal of statistical planning and inference》2009

The zero-inflated negative binomial (ZINB) model is used to account for commonly occurring overdispersion detected in data that are initially analyzed under the zero-inflated Poisson (ZIP) model. Tests for overdispersion (Wald test, likelihood ratio test [LRT], and score test) based on ZINB model for use in ZIP regression models have been developed. Due to similarity to the ZINB model, we consider the zero-inflated generalized Poisson (ZIGP) model as an alternate model for overdispersed zero-inflated count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes score tests for overdispersion based on the ZIGP model and illustrates that the derived score statistics are exactly the same as the score statistics under the ZINB model. A simulation study indicates the proposed score statistics are preferred to other tests for higher empirical power. In practice, based on the approximate mean–variance relationship in the data, the ZINB or ZIGP model can be considered, and a formal score test based on asymptotic standard normal distribution can be employed for assessing overdispersion in the ZIP model. We provide an example to illustrate the procedures for data analysis. 相似文献

15.

Functional Form for the Zero-Inflated Generalized Poisson Regression Model

Hossein Zamani 《统计学通讯:理论与方法》2014,43(3):515-529

The generalized Poisson (GP) regression is an increasingly popular approach for modeling overdispersed as well as underdispersed count data. Several parameterizations have been performed for the GP regression, and the two well known models, the GP-1 and the GP-2, have been applied. The GP-P regression, which has been recently proposed, has the advantage of nesting the GP-1 and the GP-2 parametrically, besides allowing the statistical tests of the GP-1 and the GP-2 against a more general alternative. In several cases, count data often have excessive number of zero outcomes than are expected in the Poisson. This zero-inflation phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed. However, if the data continue to suggest additional overdispersion, the zero-inflated negative binomial (ZINB-1 and ZINB-2) and the zero-inflated generalized Poisson (ZIGP-1 and ZIGP-2) regression models have been considered as alternatives. This article proposes a functional form of the ZIGP which mixes a distribution degenerate at zero with a GP-P distribution. The suggested model has the advantage of nesting the ZIP and the two well known ZIGP (ZIGP-1 and ZIGP-2) regression models, besides allowing the statistical tests of the ZIGP-1 and the ZIGP-2 against a more general alternative. The ZIP and the functional form of the ZIGP regression models are fitted, compared and tested on two sets of count data; the Malaysian insurance claim data and the German healthcare data. 相似文献

16.

Distribution-free inference of zero-inflated binomial data for longitudinal studies

H. He W. Wang R. Gallop P. Crits-Christoph Y. Xia 《Journal of applied statistics》2015,42(10):2203-2219

Count responses with structural zeros are very common in medical and psychosocial research, especially in alcohol and HIV research, and the zero-inflated Poisson (ZIP) and zero-inflated negative binomial models are widely used for modeling such outcomes. However, as alcohol drinking outcomes such as days of drinkings are counts within a given period, their distributions are bounded above by an upper limit (total days in the period) and thus inherently follow a binomial or zero-inflated binomial (ZIB) distribution, rather than a Poisson or ZIP distribution, in the presence of structural zeros. In this paper, we develop a new semiparametric approach for modeling ZIB-like count responses for cross-sectional as well as longitudinal data. We illustrate this approach with both simulated and real study data. 相似文献

17.

Zero-inflated models for adjusting varying exposures: a cautionary note on the pitfalls of using offset

Cindy Feng 《Journal of applied statistics》2022,49(1):1

Zero-inflated count data are frequently encountered in public health and epidemiology research. Two-parts model is often used to model the excessive zeros, which are a mixture of two components: a point mass at zero and a count distribution, such as a Poisson distribution. When the rate of events per unit exposure is of interest, offset is commonly used to account for the varying extent of exposure, which is essentially a predictor whose regression coefficient is fixed at one. Such an assumption of exposure effect is, however, quite restrictive for many practical problems. Further, for zero-inflated models, offset is often only included in the count component of the model. However, the probability of excessive zero component could also be affected by the amount of ‘exposure’. We, therefore, proposed incorporating the varying exposure as a covariate rather than an offset term in both the probability of excessive zeros and conditional counts components of the zero-inflated model. A real example is used to illustrate the usage of the proposed methods, and simulation studies are conducted to assess the performance of the proposed methods for a broad variety of situations. 相似文献

18.

零膨胀计数数据的联合建模及变量选择

胡亚南田茂再《统计研究》2019,36(1):104-114

零膨胀计数数据破坏了泊松分布的方差-均值关系，可由取值服从泊松分布的数据和取值为零(退化分布)的数据各占一定比例所构成的混合分布所解释。本文基于自适应弹性网技术，研究了零膨胀计数数据的联合建模及变量选择问题.对于零膨胀泊松分布，引入潜变量，构造出零膨胀泊松模型的完全似然, 其中由零膨胀部分和泊松部分两项组成.考虑到协变量可能存在共线性和稀疏性，通过对似然函数加自适应弹性网惩罚得到目标函数,然后利用EM算法得到回归系数的稀疏估计量，并用贝叶斯信息准则BIC来确定最优调节参数.本文也给出了估计量的大样本性质的理论证明和模拟研究，最后把所提出的方法应用到实际问题中。相似文献

19.

Zero-inflated Poisson and negative binomial integer-valued GARCH models

Fukang Zhu 《Journal of statistical planning and inference》2012,142(4):826-839

Zero inflation means that the proportion of 0's of a model is greater than the proportion of 0's of the corresponding Poisson model, which is a common phenomenon in count data. To model the zero-inflated characteristic of time series of counts, we propose zero-inflated Poisson and negative binomial INGARCH models, which are useful and flexible generalizations of the Poisson and negative binomial INGARCH models, respectively. The stationarity conditions and the autocorrelation function are given. Based on the EM algorithm, the estimating procedure is simple and easy to be implemented. A simulation study shows that the estimation method is accurate and reliable as long as the sample size is reasonably large. A real data example leads to superior performance of the proposed models compared with other competitive models in the literature. 相似文献

20.

A Comparative Study of Observation- and Parameter-driven Zero-inflated Poisson Models for Longitudinal Count Data

M. Tariqul Hasan Gary Sneddon 《统计学通讯:模拟与计算》2016,45(10):3643-3659

Longitudinal count data with excessive zeros frequently occur in social, biological, medical, and health research. To model such data, zero-inflated Poisson (ZIP) models are commonly used, after separating zero and positive responses. As longitudinal count responses are likely to be serially correlated, such separation may destroy the underlying serial correlation structure. To overcome this problem recently observation- and parameter-driven modelling approaches have been proposed. In the observation-driven model, the response at a specific time point is modelled through the responses at previous time points after incorporating serial correlation. One limitation of the observation-driven model is that it fails to accommodate the presence of any possible over-dispersion, which frequently occurs in the count responses. This limitation is overcome in a parameter-driven model, where the serial correlation is captured through the latent process using random effects. We compare the results obtained by the two models. A quasi-likelihood approach has been developed to estimate the model parameters. The methodology is illustrated with analysis of two real life datasets. To examine model performance the models are also compared through a simulation study. 相似文献