Similar literature
 Found 20 similar articles (search time: 453 ms)
1.
While excess zeros are often thought to cause data over-dispersion (i.e. when the variance exceeds the mean), this implication is not absolute. One should instead consider a flexible class of distributions that can address data dispersion along with excess zeros. This work develops a zero-inflated sum-of-Conway-Maxwell-Poissons (ZISCMP) regression as a flexible analysis tool to model count data that exhibit significant data dispersion and contain excess zeros. This class of models contains several special case zero-inflated regressions, including zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated binomial (ZIB), and the zero-inflated Conway-Maxwell-Poisson (ZICMP). Through simulated and real data examples, we demonstrate the flexibility and usefulness of the class. We further use it to analyze shark species data from Australia's Great Barrier Reef to assess the environmental impact of human action on the number of various species of sharks.

2.
In modeling count data collected from manufacturing processes, economic series, disease outbreaks and ecological surveys, there are usually a relatively large or small number of zeros compared to positive counts. Such low or high frequencies of zero counts often require the use of underdispersed or overdispersed probability models for the underlying data generating mechanism. The commonly used models such as generalized or zero-inflated Poisson distributions are parametric and can usually account only for overdispersion; such distributions are often found to be inadequate in modeling underdispersion because of the need for awkward parameter or support restrictions. This article introduces a flexible class of semiparametric zero-altered models which account for both underdispersion and overdispersion and include other familiar models such as those mentioned above as special cases. Consistency and asymptotic normality of the estimator of the dispersion parameter are derived under general conditions. Numerical support for the performance of the proposed method of inference is presented for the case of common discrete distributions.

3.
We address the issue of performing testing inference in the class of zero-inflated power series models. These models provide a straightforward way of modelling count data and have been widely used in practical situations. The likelihood ratio, Wald and score statistics provide the basis for testing the parameter of inflation of zeros in this class of models. In this paper, in addition to the well-known test statistics, we also consider the recently proposed gradient statistic. We conduct Monte Carlo simulation experiments to evaluate the finite-sample performance of these tests for testing the parameter of inflation of zeros. The numerical results show that the new gradient test we propose is more reliable in finite samples than the usual likelihood ratio, Wald and score tests. An empirical application to real data is considered for illustrative purposes.

4.
Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are recommended for handling excessive zeros in count data. For various reasons, researchers may not address zero inflation. This paper helps educate researchers on (1) the importance of accounting for zero inflation and (2) the consequences of misspecifying the statistical model. Using simulations, we found that when the zero inflation in the data was ignored, estimation was poor and statistically significant findings were missed. When overdispersion within the zero-inflated data was ignored, poor estimation and inflated Type I errors resulted. Recommendations on when to use the ZINB and ZIP models are provided. In an illustration using a two-step model selection procedure (likelihood ratio test and the Vuong test), the procedure correctly identified the ZIP model only when the distributions had moderate means and sample sizes, and it did not correctly identify the ZINB model or the zero inflation in the ZIP and ZINB distributions.

5.
In many applications, clustered count data contain excess zeros, and the zero-inflated generalized Poisson mixed (ZIGPM) regression model may be suitable. However, dispersion in ZIGPM is often treated as a fixed unknown parameter, and this assumption may not be appropriate in some situations. In this article, a score test for homogeneity of the dispersion parameter in the ZIGPM regression model is developed and the corresponding test statistic is obtained. The sampling distribution and power of the score test statistic are investigated through Monte Carlo simulation. Finally, results from a biological example illustrate the usefulness of the diagnostic statistic.

6.
Count data analysis techniques have been developed in biological and medical research areas. In particular, zero-inflated versions of parametric count distributions have been used to model excessive zeros that are often present in these assays. The most common count distributions for analyzing such data are Poisson and negative binomial. However, a Poisson distribution can only handle equidispersed data and a negative binomial distribution can only cope with overdispersion. In contrast, a Conway–Maxwell–Poisson (CMP) distribution [4] can handle a wide range of dispersion. We show, with an illustrative data set on next-generation sequencing of maize hybrids, that both underdispersion and overdispersion can be present in genomic data. Furthermore, the maize data set consists of clustered observations and, therefore, we develop inference procedures for a zero-inflated CMP regression that incorporates a cluster-specific random effect term. Unlike in Gaussian models, the underlying likelihood is computationally challenging. We use a numerical approximation via Gaussian quadrature to circumvent this issue. A test for checking zero-inflation has also been developed in our setting. Finite sample properties of our estimators and test have been investigated by extensive simulations. Finally, the statistical methodology has been applied to analyze the maize data mentioned before.
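
The dispersion range attributed to the CMP distribution in this abstract can be checked numerically. The sketch below assumes the standard parameterization, in which the probability of y is proportional to λ^y / (y!)^ν, and truncates the normalizing series at a fixed number of terms. The function names and the truncation point are illustrative choices, not taken from the paper.

```python
import math


def cmp_pmf(lam, nu, max_terms=200):
    """Return CMP probabilities for y = 0 .. max_terms - 1.

    Assumes the usual parameterization: weight(y) = lam**y / (y!)**nu.
    The normalizing series is truncated at max_terms (an assumption that
    is adequate for moderate lam, not a general guarantee).
    """
    # Work in log space so lam**y and y! cannot overflow.
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_terms)]
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def dispersion_index(pmf):
    """Variance-to-mean ratio of a pmf supported on 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return var / mean


if __name__ == "__main__":
    # nu < 1 gives overdispersion, nu = 1 is Poisson, nu > 1 gives underdispersion.
    for nu in (0.5, 1.0, 2.0):
        print(nu, round(dispersion_index(cmp_pmf(3.0, nu)), 3))
```

With ν = 1 the weights reduce to the Poisson case, so the variance-to-mean ratio is 1; values of ν below or above 1 move the ratio above or below 1 respectively.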

7.
Count data often have an excessive number of zero outcomes. This zero-inflated phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed for accommodating zero-inflated data. However, if the data continue to suggest additional overdispersion, zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models have been considered as alternatives. This study proposes a score test for testing the ZIP regression model against ZIGP alternatives and proves that it is equal to the score test for testing the ZIP regression model against ZINB alternatives. The advantage of the score test over alternatives such as the likelihood ratio and Wald tests is that it can be used to determine whether a more complex model is appropriate without fitting the more complex model. Applications of the proposed score test to several datasets are also illustrated.

8.
In this paper we introduce a wide class of integer-valued stochastic processes that simultaneously accommodates relevant characteristics observed in count data, namely zero inflation, overdispersion and conditional heteroscedasticity. This class includes, in particular, the compound Poisson, the zero-inflated Poisson and the zero-inflated negative binomial INGARCH models recently proposed in the literature. The main probabilistic analysis of this class of processes is developed here. Specifically, first- and second-order stationarity conditions are derived, the autocorrelation function is deduced and strict stationarity is established in a large subclass. For a particular model, we also analyse the existence of higher-order moments and deduce the explicit form of the first four cumulants, as well as the skewness and kurtosis.

9.
Zero-inflated data are more frequent when the data represent counts. However, there are practical situations in which continuous data contain an excess of zeros. In these cases, the zero-inflated Poisson, binomial or negative binomial models are not suitable. To fill this gap, we propose the zero-spiked gamma-Weibull (ZSGW) model by mixing a distribution which is degenerate at zero with the gamma-Weibull distribution, which has positive support. The model attempts to estimate simultaneously the effects of explanatory variables on the response variable and on the zero spike. We consider a frequentist analysis and a non-parametric bootstrap for estimating the parameters of the ZSGW regression model. We derive the appropriate matrices for assessing local influence on the model parameters. We illustrate the performance of the proposed regression model by means of a real data set (copaiba oil resin production) from a study carried out at the Department of Forest Science of the Luiz de Queiroz School of Agriculture, University of São Paulo. Based on the ZSGW regression model, we determine the explanatory variables that can influence the excess of zeros of the resin oil production and identify influential observations. We also show empirically that the proposed regression model can be superior to the zero-adjusted inverse Gaussian regression model for fitting zero-inflated positive continuous data.

10.
Zero-inflated regression models such as the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models can model count data with excess zeros. The ZINB model can handle over-dispersed count data, and the ZIGP model can handle both over- and under-dispersed count data with excess zeros. Moreover, the count data may be correlated because of the data collection procedure or a special study design. Clustered sampling is one example in which the correlation among subjects can be defined. In such situations, a marginal model using the generalized estimating equation (GEE) approach can incorporate these correlations and describe relationships at the population level. In this study, a GEE-based zero-inflated generalized Poisson regression model is proposed to fit over- and under-dispersed clustered count data with excess zeros.

11.
In this paper, a zero-inflated power series regression model for longitudinal count data with excess zeros is presented. We demonstrate how to calculate the likelihood for such data when it is assumed that the increment in the cumulative total follows a discrete distribution with a location parameter that depends on a linear function of explanatory variables. Simulation studies indicate that this method can provide improvements in obtaining standard errors of the estimates. We also calculate the dispersion index for this model. The influence of a small perturbation of the dispersion index of the zero-inflated model on likelihood displacement is also studied. The zero-inflated negative binomial regression model is illustrated on data regarding joint damage in psoriatic arthritis.

12.
Count data series with extra zeros relative to a Poisson distribution are common in many biomedical applications. A score test is presented to assess whether the zero inflation is significant enough to warrant analysis by the more complex zero-inflated Poisson autoregression model. The score test is implemented as a computer program on the S-Plus platform. For illustration, the test procedure is applied to a workplace injury series where many zero counts are observed due to the heterogeneity in injury risk and the dynamic population involved.

13.
Zero-inflated Poisson regression is a model commonly used to analyze data with excessive zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on the special features of the individual data. For example, there is a need for new models when dealing with truncated and inflated data. In this paper, we propose a new model that is sufficiently flexible to model inflation and truncation simultaneously. It is a mixture of a multinomial logistic regression and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts and the truncated Poisson regression models the counts that are assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and our model is found to have the smallest mean absolute error and best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model has a better fit than the other competing models.

14.
The zero-inflated Poisson (ZIP) distribution is widely used for modeling a count data set when the frequency of zeros is higher than the one expected under the Poisson distribution. There are many methods for making inferences about the inflation parameter in ZIP models, e.g. methods for testing the Poisson distribution (the inflation parameter is zero) against the ZIP distribution (the inflation parameter is positive). Most of these methods are based on maximum likelihood estimators, which do not have an explicit expression. However, the estimators obtained by the method of moments are powerful enough and are easy to obtain and implement. In this paper, we propose an approach based on the method of moments for making inferences about the inflation parameter in the ZIP distribution. Our method is also compared to some recent methods via a simulation study and is illustrated by an example.
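
To make the moment-based idea concrete: a ZIP variable with inflation parameter π and Poisson rate λ has mean (1 − π)λ and variance (1 − π)λ(1 + πλ), and these two equations can be solved explicitly. The sketch below is a generic textbook method-of-moments illustration under those formulas; it is not necessarily the estimator proposed in the paper, and the function names are hypothetical.

```python
import statistics


def zip_moments_to_params(mean, var):
    """Solve the ZIP moment equations for (pi, lam).

    mean = (1 - pi) * lam and var = mean * (1 + pi * lam), hence
    lam = mean + var / mean - 1 and pi = 1 - mean / lam.
    A negative pi signals var < mean, i.e. no evidence of zero inflation.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    lam = mean + var / mean - 1.0
    if lam <= 0:
        raise ValueError("moment equations have no valid solution")
    pi = 1.0 - mean / lam
    return pi, lam


def zip_moment_estimates(sample):
    """Plug the sample mean and sample variance into the moment equations."""
    return zip_moments_to_params(statistics.fmean(sample), statistics.variance(sample))


if __name__ == "__main__":
    # Exact moments of a ZIP with pi = 0.3, lam = 2: mean 1.4, variance 2.24.
    print(zip_moments_to_params(1.4, 2.24))
```

Unlike the maximum likelihood estimators, these expressions are explicit, which is the convenience the abstract points to.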

15.
Count data with structural zeros are common in public health applications. Considerable research has focused on zero-inflated models such as zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models for such zero-inflated count data when used as a response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor if it is observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias issue. This paper aims to fill this methodological gap by developing parametric methods, based on the maximum likelihood approach, to model zero-inflated count data when used as predictors. The response variable can be any type of data, including continuous, binary, count or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of this new approach when the sample size is small to moderate. A real data example is also used to demonstrate the application of this method.

16.
In statistical modelling, it is often of interest to evaluate non-negative quantities that capture heterogeneity in the population, such as variances, mixing proportions and dispersion parameters. In instances of covariate-dependent heterogeneity, the implied homogeneity hypotheses are nonstandard and existing inferential techniques are not applicable. In this paper, we develop a quasi-score test statistic to evaluate homogeneity against heterogeneity that varies with a covariate profile through a regression model. We establish the limiting null distribution of the proposed test as a functional of mixtures of chi-square processes. The methodology does not require the full distribution of the data to be entirely specified. Instead, a general estimating function is assumed for a finite-dimensional component of the model that is of interest, while other characteristics of the population are left completely unspecified. We apply the methodology to evaluate the excess zero proportion in zero-inflated models for count data. Our numerical simulations show that the proposed test can greatly improve efficiency over tests of homogeneity that neglect covariate information under the alternative hypothesis. An empirical application to dental caries indices demonstrates the importance and practical utility of the methodology in detecting excess zeros in the data.

17.
Homogeneity of dispersion parameters and zero-inflation parameters is a standard assumption in zero-inflated generalized Poisson regression (ZIGPR) models. However, this assumption may not be appropriate in some situations. This work develops a score test for varying dispersion and/or zero-inflation parameters in ZIGPR models, and the corresponding test statistics are obtained. Two numerical examples are given to illustrate our methodology, and the properties of the score test statistics are investigated through Monte Carlo simulations.

18.
The classical Shewhart c-chart and p-chart, which are constructed based on the Poisson and binomial distributions, are inappropriate for monitoring zero-inflated counts. They tend to underestimate the dispersion of zero-inflated counts and subsequently lead to a higher false alarm rate in detecting out-of-control signals. Another drawback of these charts is that their 3-sigma control limits, evaluated based on the asymptotic normality assumption of the attribute counts, have a systematic negative bias in their coverage probability. We recommend that zero-inflated models, which account for the excess number of zeros, should first be fitted to the zero-inflated Poisson and binomial counts. The Poisson parameter λ estimated from a zero-inflated Poisson model is then used to construct a one-sided c-chart with its upper control limit constructed based on the Jeffreys prior interval that provides good coverage probability for λ. Similarly, the binomial parameter p estimated from a zero-inflated binomial model is used to construct a one-sided np-chart with its upper control limit constructed based on the Jeffreys prior interval or Blyth–Still interval of the binomial proportion p. A simple two-of-two control rule is also recommended to further improve the performance of these two proposed charts.
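
For reference, the classical 3-sigma c-chart limits criticized in this abstract are c̄ ± 3√c̄, which rely on the Poisson property that the variance equals the mean. The sketch below computes those classical limits and a variance-to-mean ratio that shows how far a sample departs from that property. It does not implement the Jeffreys-interval limits the authors recommend, and the function names are illustrative.

```python
import math
import statistics


def c_chart_limits(counts):
    """Classical Shewhart c-chart limits: (LCL, center line, UCL).

    Uses c_bar +/- 3 * sqrt(c_bar), with the lower limit floored at zero.
    """
    c_bar = statistics.fmean(counts)
    half_width = 3.0 * math.sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width


def variance_to_mean_ratio(counts):
    """Population variance divided by the mean; about 1 for Poisson data."""
    return statistics.pvariance(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    # Made-up counts with many zeros, purely for illustration.
    counts = [0, 0, 0, 0, 8, 8, 8, 8]
    print(c_chart_limits(counts))
    print(variance_to_mean_ratio(counts))
```

A ratio well above 1, as in the made-up sample here, means the Poisson-based limits understate the spread of the counts, which is the mechanism behind the inflated false alarm rate described above.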

19.
Motivated by an example in marine science, we use Fisher’s method to combine independent likelihood ratio tests (LRTs) and asymptotically independent score tests to assess the equivalence of two zero-inflated Beta populations (mixture distributions with three parameters). For each test, test statistics for the three individual parameters are combined into a single statistic to address the overall difference between the two populations. We also develop nonparametric and semiparametric permutation-based tests for simultaneously comparing two or three features of unknown populations. Simulations show that the likelihood-based tests perform well for large sample sizes and that the statistics based on combining LRT statistics outperform the ones based on combining score test statistics. The permutation-based tests have better overall performance in terms of both power and type I error rate. Our methods are easy to implement and computationally efficient, and can be extended to more than two populations and to other multiple-parameter families. The permutation tests are entirely generic and can be useful in various applications dealing with zero (or other) inflation.
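
Fisher's method, as used in this abstract, combines k independent p-values through the statistic −2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a minimal standard-library illustration of that combination step only. It assumes the p-values are independent, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, k):
    """Survival function of a chi-square variable with 2 * k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x / 2) * sum_{j=0}^{k-1} (x / 2)**j / j!.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Return (statistic, combined p-value) for independent p-values."""
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, len(p_values))


if __name__ == "__main__":
    # Three made-up p-values, e.g. one per parameter of a three-parameter family.
    print(fisher_combine([0.04, 0.20, 0.11]))
```

With a single p-value the combined p-value equals the input, which is a convenient sanity check on the implementation.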

20.
In recent years, a variety of regression models, including zero-inflated and hurdle versions, have been proposed to explain a dependent variable in terms of exogenous covariates. Apart from the classical Poisson, negative binomial and generalised Poisson distributions, many proposals have appeared in the statistical literature, perhaps in response to the new possibilities offered by advanced software that now enables researchers to implement numerous special functions in a relatively simple way. However, we believe that a significant research gap remains, since very little attention has been paid to the quasi-binomial distribution, which was first proposed over fifty years ago. We believe this distribution might constitute a valid alternative to existing regression models in situations in which the variable has bounded support. Therefore, in this paper we present a zero-inflated regression model based on the quasi-binomial distribution, taking into account the moments and maximum likelihood estimators, and perform a score test to compare the zero-inflated quasi-binomial distribution with the zero-inflated binomial distribution, and the zero-inflated model with the homogeneous model (the model in which covariates are not considered). This analysis is illustrated with two data sets that are well known in the statistical literature and which contain a large number of zeros.

