Similar literature
20 similar articles found
1.
We extend the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models are formulated. The first model assumes a shared random effects term between the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxes the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two different but correlated random effects variables. Under the conditional independence and missing-at-random assumptions, direct optimization of the marginal likelihood and an EM algorithm are proposed to fit the models. The proposed models are fitted to dental caries counts of children under the age of six in the city of Detroit.

2.
In this study, estimation of the parameters of zero-inflated count regression models and computation of posterior model probabilities of the log-linear models defined for each zero-inflated count regression model are investigated from the Bayesian point of view. In addition, determination of the most suitable log-linear and regression models is investigated. Zero-inflated count regression models cover zero-inflated Poisson, zero-inflated negative binomial, and zero-inflated generalized Poisson regression models. The classical approach has some problematic points, but the Bayesian approach does not have similar flaws. This work points out the reasons for using the Bayesian approach. It also lists advantages and disadvantages of the classical and Bayesian approaches. As an application, a zoological data set, including structural and sampling zeros, is used in the presence of extra zeros. In this work, it is observed that fitting a zero-inflated negative binomial regression model creates no problems at all, even though fitting such a model is known to be the most problematic procedure in the classical approach. Additionally, it is found that the best fitting model is the log-linear model under the negative binomial regression model, which does not include three-way interactions of factors.

3.
Generalized endpoint-inflated binomial regression was recently proposed to model count data with large frequencies of both zeros and right-endpoints. Maximum likelihood estimation (MLE) was developed for this model and simulations suggest that the resulting estimates behave well. However, large-sample properties of the MLE have not yet been rigorously established. Such results are however essential for ensuring reliable statistical inference and decision-making. This paper addresses this issue. Identifiability of the generalized endpoint-inflated binomial regression model is first proved. Then, consistency and asymptotic normality of the MLE are established. A simulation study is conducted to assess finite-sample behaviour of the estimator.

4.
In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions have been considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. Then we extend this model so that correlated counts may follow different distributions. To accommodate correlation among counts, we have considered correlated random effects for each individual in the mean structure, thus inducing dependency among observations common to an individual. The method is applied to real data to investigate variation in food resource use in a species of marsupial in a locality of the Brazilian Cerrado biome.

5.
The logistic regression model has become a standard tool to investigate the relationship between a binary outcome and a set of potential predictors. When analyzing binary data, it often arises that the observed proportion of zeros is greater than expected under the postulated logistic model. Zero-inflated binomial (ZIB) models have been developed to fit binary data that contain too many zeros. Maximum likelihood estimators in these models have been proposed and their asymptotic properties established. Several aspects of ZIB models still deserve attention however, such as the estimation of odds-ratios and event probabilities. In this article, we propose estimators of these quantities and we investigate their properties both theoretically and via simulations. Based on these results, we provide recommendations about the range of conditions (minimum sample size, maximum proportion of zeros in excess) under which a reliable statistical inference on the odds-ratios and event probabilities can be obtained in a ZIB regression model. A real-data example illustrates the proposed estimators.

6.
For the modeling of bounded counts, the binomial distribution is a common choice. In applications, however, one often observes an excessive number of zeros and extra-binomial variation, which cannot be explained by a binomial distribution. We propose statistics to evaluate the number of zeros and the dispersion with respect to a binomial model, which are based on the sample binomial index of dispersion and the sample binomial zero index. We apply these statistics to autocorrelated counts generated by a binomial autoregressive process of order one, which also includes the special case of independent and identically distributed (i.i.d.) bounded counts. The limiting null distributions of the proposed test statistics are derived. A Monte-Carlo study evaluates their size and power under various alternatives. Finally, we present two real-data applications as well as the derivation of effective sample sizes to illustrate the proposed methodology.
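As a rough illustration of the kind of diagnostics entry 6 describes, the Python sketch below computes a sample binomial index of dispersion, n·s²/(x̄(n − x̄)), whose population counterpart equals 1 under a binomial model with upper bound n. The second function is only an assumed, simplified zero check (observed zero fraction divided by the zero probability of a binomial with the same mean); it may differ from the zero index used by the authors. The function names are hypothetical.

```python
def binomial_dispersion_index(counts, n):
    """Sample binomial index of dispersion: n * s^2 / (mean * (n - mean)).

    Values well above 1 suggest extra-binomial variation.
    """
    size = len(counts)
    mean = sum(counts) / size
    if mean <= 0 or mean >= n:
        raise ValueError("index is undefined when the mean is 0 or n")
    # Variance with divisor `size` (an assumption; a divisor of size - 1 also works).
    var = sum((x - mean) ** 2 for x in counts) / size
    return n * var / (mean * (n - mean))


def zero_ratio(counts, n):
    """Observed zero fraction over the zero probability of Bin(n, mean / n).

    Illustrative only; not necessarily the zero index of the paper.
    """
    size = len(counts)
    mean = sum(counts) / size
    if mean >= n:
        raise ValueError("ratio is undefined when the mean equals n")
    observed = sum(1 for x in counts if x == 0) / size
    expected = (1 - mean / n) ** n
    return observed / expected
```

Both quantities are close to 1 for data that are well described by a binomial model; formal testing would require the limiting null distributions derived in the paper, which this sketch does not reproduce.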

7.
The negative binomial (NB)-mixed regression is in many situations more appropriate for analysing correlated and over-dispersed count data. In this paper, a score test for assessing extra zeros against the NB-mixed regression in correlated count data with excess zeros is developed. The sampling distribution and power of the score test statistic are evaluated using a simulation study. The results show that under a wide range of conditions, the score statistic performs satisfactorily. Finally, the use of the score test is illustrated on DMFT index data of children aged 12 years.

8.
Zero-inflated data are more frequent when the data represent counts. However, there are practical situations in which continuous data contain an excess of zeros. In these cases, the zero-inflated Poisson, binomial or negative binomial models are not suitable. To fill this gap, we propose the zero-spiked gamma-Weibull (ZSGW) model by mixing a distribution which is degenerate at zero with the gamma-Weibull distribution, which has positive support. The model attempts to estimate simultaneously the effects of explanatory variables on the response variable and on the zero spike. We consider a frequentist analysis and a non-parametric bootstrap for estimating the parameters of the ZSGW regression model. We derive the appropriate matrices for assessing local influence on the model parameters. We illustrate the performance of the proposed regression model by means of a real data set (copaiba oil resin production) from a study carried out at the Department of Forest Science of the Luiz de Queiroz School of Agriculture, University of São Paulo. Based on the ZSGW regression model, we determine the explanatory variables that can influence the excess of zeros of the resin oil production and identify influential observations. We also show empirically that the proposed regression model can be superior to the zero-adjusted inverse Gaussian regression model to fit zero-inflated positive continuous data.

9.
We propose a bivariate hurdle negative binomial (BHNB) regression model with right censoring to model correlated bivariate count data with excess zeros and a few extreme observations. The parameters of the BHNB regression model are obtained using maximum likelihood with conjugate gradient optimization. The proposed model is applied to actual survey data where the bivariate outcome is number of days missed from primary activities and number of days spent in bed due to illness during the 4-week period preceding the inquiry date. We compared the right censored BHNB model to the right censored bivariate negative binomial (BNB) model. A simulation study is conducted to discuss some properties of the BHNB model. Our proposed model demonstrated superior performance in goodness-of-fit of estimated frequencies.
KEYWORDS: Zero inflation, over-dispersion, parameter estimation, model selection, right censoring

10.
Zero-inflated regression models such as zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) or zero-inflated generalized Poisson (ZIGP) regression models can model count data with excess zeros. The ZINB model can handle over-dispersed count data, and the ZIGP model can handle over- or under-dispersed count data, with excess zeros as well. Moreover, the count data may be correlated because of the data collection procedure or a special study design. Clustered sampling is one example in which the correlation among subjects can be defined. In such situations, a marginal model using the generalized estimating equation (GEE) approach can incorporate these correlations and yield relationships at the population level. In this study, the GEE-based zero-inflated generalized Poisson regression model was proposed to fit over- and under-dispersed clustered count data with excess zeros.

11.
While excess zeros are often thought to cause data over-dispersion (i.e. when the variance exceeds the mean), this implication is not absolute. One should instead consider a flexible class of distributions that can address data dispersion along with excess zeros. This work develops a zero-inflated sum-of-Conway-Maxwell-Poissons (ZISCMP) regression as a flexible analysis tool to model count data that express significant data dispersion and contain excess zeros. This class of models contains several special case zero-inflated regressions, including zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated binomial (ZIB), and the zero-inflated Conway-Maxwell-Poisson (ZICMP). Through simulated and real data examples, we demonstrate class flexibility and usefulness. We further utilize it to analyze shark species data from Australia's Great Barrier Reef to assess the environmental impact of human action on the number of various species of sharks.
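The point in entry 11 that excess zeros need not imply over-dispersion can be checked with the standard moments of a zero-inflated mixture. The Python sketch below is not taken from the paper; it assumes a generic model that returns 0 with probability `pi` and otherwise draws from a base count distribution with known mean and variance. The function name and the parameter values are hypothetical.

```python
def zero_inflated_moments(pi, base_mean, base_var):
    """Mean and variance of Y = 0 with prob. pi, else a draw from the base law.

    mean = (1 - pi) * m
    var  = (1 - pi) * (v + pi * m**2)
    """
    mean = (1 - pi) * base_mean
    var = (1 - pi) * (base_var + pi * base_mean ** 2)
    return mean, var


# Zero-inflated Poisson (base mean = base variance = 2): variance exceeds mean.
zip_mean, zip_var = zero_inflated_moments(0.3, 2.0, 2.0)

# Zero-inflated binomial with n = 5, p = 0.5 (base mean 2.5, base variance 1.25):
# despite the extra zeros, the variance stays below the mean.
zib_mean, zib_var = zero_inflated_moments(0.1, 2.5, 1.25)
```

With a Poisson base the variance-to-mean ratio is 1 + pi * lambda, so zero inflation always produces over-dispersion there; with an under-dispersed base such as the binomial, it may not.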

12.
Count responses with structural zeros are very common in medical and psychosocial research, especially in alcohol and HIV research, and the zero-inflated Poisson (ZIP) and zero-inflated negative binomial models are widely used for modeling such outcomes. However, as alcohol drinking outcomes such as days of drinking are counts within a given period, their distributions are bounded above by an upper limit (total days in the period) and thus inherently follow a binomial or zero-inflated binomial (ZIB) distribution, rather than a Poisson or ZIP distribution, in the presence of structural zeros. In this paper, we develop a new semiparametric approach for modeling ZIB-like count responses for cross-sectional as well as longitudinal data. We illustrate this approach with both simulated and real study data.

13.
In cancer studies that use transgenic or knockout mice, skin tumour counts are recorded over time to measure tumorigenicity. In these studies cancer biologists are interested in the effect of endogenous and/or exogenous factors on papilloma onset, multiplicity and regression. In this paper an analysis of data from a study conducted by the National Institute of Environmental Health Sciences on the effect of genetic factors on skin tumorigenesis is presented. Papilloma multiplicity and regression are modelled by using Bernoulli, Poisson and binomial latent variables, each of which can depend on covariates and previous outcomes. An EM algorithm is proposed for parameter estimation, and generalized estimating equations adjust for extra dependence between outcomes within individual animals. A Cox proportional hazards model is used to describe covariate effects on the onset of tumours.

14.
The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.

15.
The zero-inflated binomial (ZIB) regression model was proposed to account for excess zeros in binomial regression. Since then, the model has been applied in various fields, such as ecology and epidemiology. In these applications, maximum-likelihood estimation (MLE) is used to derive parameter estimates. However, theoretical properties of the MLE in ZIB regression have not yet been rigorously established. The current paper fills this gap and thus provides a rigorous basis for applying the model. Consistency and asymptotic normality of the MLE in ZIB regression are proved. A consistent estimator of the asymptotic variance–covariance matrix of the MLE is also provided. Finite-sample behavior of the estimator is assessed via simulations. Finally, an analysis of a data set in the field of health economics illustrates the paper.
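To make the objective behind the MLE in entry 15 concrete, here is a minimal Python sketch of a ZIB log-likelihood. It assumes logit links for both the success probability and the zero-inflation probability, which is a common choice but is not stated in the abstract; the function and argument names (`zib_loglik`, `beta`, `gamma`, `X`, `Z`) are hypothetical.

```python
import math


def sigmoid(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def zib_loglik(beta, gamma, X, Z, y, m):
    """Log-likelihood of a zero-inflated binomial regression (assumed logit links).

    For observation i with m[i] trials and y[i] successes:
      P(Y = y) = w * 1{y = 0} + (1 - w) * Binomial(y; m, p),
    where p = sigmoid(X[i] . beta) and w = sigmoid(Z[i] . gamma).
    """
    total = 0.0
    for xi, zi, yi, mi in zip(X, Z, y, m):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = sigmoid(sum(g * v for g, v in zip(gamma, zi)))
        binom = math.comb(mi, yi) * p ** yi * (1 - p) ** (mi - yi)
        # The structural-zero component contributes only when y is 0.
        lik = (1 - w) * binom + (w if yi == 0 else 0.0)
        total += math.log(lik)
    return total
```

An estimate would be obtained by maximizing this function over `beta` and `gamma` with a numerical optimizer; the asymptotic results described in the abstract concern that maximizer, not this particular code.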

16.
Overdispersion or extra variation is a common phenomenon that occurs when binomial (multinomial) data exhibit larger variances than those permitted by the binomial (multinomial) model. This arises when the data are clustered or when the assumption of independence is violated. Goodness-of-fit (GOF) tests available in the overdispersion literature have focused on testing for the presence of overdispersion in the data and hence they are not applicable for choosing between the several competing overdispersion models. In this paper, we consider a GOF test proposed by Neerchal and Morel [1998. Large cluster results for two parametric multinomial extra variation models. J. Amer. Statist. Assoc. 93(443), 1078–1087], and study its distributional properties and performance characteristics. This statistic is a direct analogue of the usual Pearson chi-squared statistic, but is also applicable when the clusters are not necessarily of the same size. As this test statistic is for testing model adequacy against the alternative that the model is not adequate, it is applicable in testing two competing overdispersion models.

17.
Biological control of pests is an important branch of entomology, providing environmentally friendly forms of crop protection. Bioassays are used to find the optimal conditions for the production of parasites and strategies for application in the field. In some of these assays, proportions are measured and, often, these data have an inflated number of zeros. In this work, six models will be applied to data sets obtained from biological control assays for Diatraea saccharalis, a common pest in sugar cane production. A natural choice for modelling proportion data is the binomial model. The second model will be an overdispersed version of the binomial model, estimated by a quasi-likelihood method. This model was initially built to model overdispersion generated by individual variability in the probability of success. When interest is only in the positive proportion data, a model can be based on the truncated binomial distribution and on its overdispersed version. The last two models include the zero proportions and are based on a finite mixture model with the binomial distribution or its overdispersed version for the positive data. Here, we will present the models, discuss their estimation and compare the results.

18.
Multivariate zero-inflated Poisson (ZIP) distributions are important tools for modelling and analysing correlated count data with extra zeros. Unfortunately, existing multivariate ZIP distributions consider only the overall zero-inflation while the component zero-inflation is not well addressed. This paper proposes a flexible multivariate ZIP distribution, called the multivariate component ZIP distribution, in which both the overall and component zero-inflations are taken into account. Likelihood-based inference procedures including the calculation of maximum likelihood estimates of parameters in the model without and with covariates are provided. Simulation studies indicate that the performance of the proposed methods on the multivariate component ZIP model is satisfactory. The Australia health care utilisation data set is analysed to demonstrate that the new distribution is more appropriate than the existing multivariate ZIP distributions.

19.
Excess zeros are encountered in many empirical count data applications. We provide a new explanation of extra zeros, related to the underlying stochastic process that generates events. The process has two rates: a lower rate until the first event and a higher one thereafter. We derive the corresponding distribution of the number of events during a fixed period and extend it to account for observed and unobserved heterogeneity. An application to the socioeconomic determinants of the individual number of doctor visits in Germany illustrates the usefulness of the new approach.

20.
In recent years, zero-inflated count data models, such as zero-inflated Poisson (ZIP) models, have been widely used, as count data with extra zeros are very common in many practical problems. In order to model correlated count data that are either clustered or repeated and to assess the effects of continuous covariates or of time scales in a flexible way, a class of semiparametric mixed-effects models for zero-inflated count data is considered. In this article, we propose a fully Bayesian inference for such models based on a data augmentation scheme that reflects both random effects of covariates and the mixture of the zero-inflated distribution. A computationally efficient MCMC method which combines the Gibbs sampler and the M-H algorithm is implemented to obtain the estimates of the model parameters. Finally, a simulation study and a real example are used to illustrate the proposed methodologies.
