Similar Articles
1.
The negative binomial (NB) model and the generalized Poisson (GP) model are common alternatives to Poisson models when overdispersion is present in the data. Having accounted for initial overdispersion, we may need to investigate further whether there is evidence of zero-inflation in the data. Two score statistics for testing zero-inflation are derived from the GP model. Unlike Wald-type test statistics, these statistics do not require fitting the more complex zero-inflated overdispersed models to evaluate zero-inflation. A simulation study shows that the developed score statistics reasonably follow a χ² distribution and maintain the nominal level. Extensive simulation results also indicate that power behaves differently when a continuous variable, rather than a binary variable, is included in the zero-inflation (ZI) part of the model. These differences form the basis for the suggestions provided for real data analysis. Two practical examples are presented in this article. Results from these examples, along with practical experience, lead us to suggest performing the developed score test before fitting a zero-inflated NB model to the data.
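The property stressed above, that a score test needs only the fit under the null, is easiest to see in the simplest case. Below is a minimal sketch of the classical score test for zero-inflation in a Poisson model, assuming an intercept-only mean and a constant zero-inflation probability. It is a simpler Poisson-based analogue, not the GP-based statistics derived in the article, and the function name `zip_score_test` is hypothetical.

```python
import math

def zip_score_test(counts):
    """Score test of H0: Poisson vs H1: zero-inflated Poisson (intercept-only mean).

    Only the Poisson fit under H0 is needed. Returns (statistic, p_value),
    with the statistic referred to a chi-square distribution with 1 df.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; the test is undefined")
    p0 = math.exp(-mean)                      # fitted P(Y = 0) under the Poisson null
    n_zero = sum(1 for y in counts if y == 0)
    # Score for the zero-inflation probability at 0: sum_i (1{y_i = 0} - p0) / p0
    score = n_zero / p0 - n
    # Efficient information: n(1 - p0)/p0 minus the part absorbed by estimating the mean
    info = n * (1 - p0) / p0 - n * mean
    stat = score ** 2 / info
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    return stat, p_value
```

For example, `zip_score_test([0] * 50 + [3] * 50)` gives a large statistic and a tiny p-value. Because the score is squared, a large value can also signal zero-deflation, so check the sign of `n_zero / p0 - n` when the direction matters.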

2.
Count data consist of discrete non-negative integer values. The Poisson regression model is one of the most popular models for count data. This model assumes that the response variable has a Poisson distribution. The purpose of this article is to assess this distributional assumption using several goodness-of-fit tests. The tests are compared with respect to their type I error rates and power across different samples, parameter values, and sample sizes. The simulation study suggests that the most powerful tests are generally the Dean–Lawless and Cameron–Trivedi score tests.
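For concreteness, here is a minimal sketch of the unadjusted Dean-Lawless score statistic for overdispersion, assuming the fitted Poisson means are already available (for example from an intercept-only fit, where every fitted mean equals the sample mean). The leverage-adjusted variant and the Cameron-Trivedi regression-based test are not shown, and `dean_lawless_test` is a hypothetical name.

```python
import math

def dean_lawless_test(counts, fitted_means):
    """Unadjusted Dean-Lawless statistic: sum[(y - mu)^2 - y] / sqrt(2 * sum mu^2).

    Approximately standard normal under the Poisson null; large positive
    values point to overdispersion. Returns (statistic, one-sided p-value).
    """
    num = sum((y - mu) ** 2 - y for y, mu in zip(counts, fitted_means))
    den = math.sqrt(2 * sum(mu ** 2 for mu in fitted_means))
    t = num / den
    p_one_sided = 0.5 * math.erfc(t / math.sqrt(2))  # P(Z > t)
    return t, p_one_sided
```

For an intercept-only model, call `dean_lawless_test(y, [sum(y) / len(y)] * len(y))`. A strongly negative statistic instead suggests underdispersion.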

3.
Understanding how wood develops has become an important problem in plant science. However, studying wood formation requires acquiring count data that are difficult to interpret. Here, the annual wood formation dynamics of a conifer tree species were modeled using generalized linear and additive models (GLM and GAM); GAM for location, scale, and shape (GAMLSS); and a discrete semiparametric kernel regression for count data. Model performance was evaluated using bootstrap methods. The GLM was useful for describing the general pattern of wood formation but showed lack of fit, whereas GAM, GAMLSS, and kernel regression were more sensitive to short-term variations.

4.
One approach to robustness in linear models is to assume a mixture of standard and outlier observations, with a different error variance for each class. For generalised linear models (GLMs) the mixture approach is more difficult, because the error variance of many distributions has a fixed relationship to the mean. The model is extended to GLMs by redefining the classes: the standard class is a standard GLM, and the outlier class is an overdispersed GLM obtained by including a random effect term in the linear predictor. The advantages of this method are that it can be extended to any model with a linear predictor and that outlier observations can be easily identified. In simulations the model is compared with an M-estimator and found to have improved bias and coverage. The method is demonstrated on three examples.

5.
We describe a mixed-effect hurdle model for zero-inflated longitudinal count data in which a baseline variable is included in the model specification. The association between the count data process and the endogenous baseline variable is modeled through a latent structure, assumed to be dependent across equations. We show how model parameters can be estimated in a finite mixture context, allowing for overdispersion, multivariate association, and endogeneity of the baseline variable. The model's behavior is investigated through a large-scale simulation experiment. An empirical example on health care utilization data is provided.

6.
Both Poisson and negative binomial regression can provide quasi-likelihood estimates of the coefficients in exponential-mean models that are consistent in the presence of distributional misspecification. It has generally been recommended, however, that inference be carried out using asymptotically robust estimators of the parameter covariance matrix. As with linear models, such robust inference tends to over-reject null hypotheses in small samples. Alternative methods for estimating the variances of the coefficient estimators are considered. No single approach removes all test bias, but the results suggest that using the jackknife with Poisson regression tends to give the least biased inference.
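The delete-one jackknife can be sketched as follows, assuming a Poisson regression with an intercept and a single covariate fitted by Newton-Raphson. `fit_poisson` and `jackknife_se_slope` are hypothetical helpers, and the other variance estimators compared in the article are not reproduced.

```python
import math

def fit_poisson(x, y, iters=100, tol=1e-10):
    """Poisson regression log(mu) = a + b*x fitted by Newton-Raphson."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector of the Poisson log-likelihood
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (equal to the Fisher information for the log link)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        da = (h11 * u0 - h01 * u1) / det
        db = (h00 * u1 - h01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def jackknife_se_slope(x, y):
    """Delete-one jackknife standard error of the slope b."""
    n = len(x)
    slopes = [fit_poisson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[1] for i in range(n)]
    mean_b = sum(slopes) / n
    return math.sqrt((n - 1) / n * sum((b - mean_b) ** 2 for b in slopes))
```

The jackknife standard error can then replace the model-based or sandwich standard error in a Wald statistic for b.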

7.
In this article, we investigate the efficiency of score tests for testing a censored Poisson regression model against censored negative binomial regression alternatives. A simulation study shows that score tests based on the normal approximation underestimate the nominal significance level. To remedy this problem, bootstrap methods are proposed. We find that the bootstrap methods keep the significance level close to the nominal one and are uniformly more powerful than the normal approximation for testing the hypothesis.

8.
In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed to obtain maximum likelihood estimates of the model parameters and mixing proportions. Specifically, the missing data are assumed to be missing not at random (MNAR), also called non-ignorable missing (NINR), and the missingness mechanism is modeled through probit regression. To improve the algorithm's efficiency, a stochastic step based on data augmentation is incorporated into the E-step, while the M-step is solved by conditional maximization. A variant of the Bayesian information criterion (BIC) is also proposed for comparing models with different numbers of components in the presence of missing values. The proposed model is a general framework that captures important characteristics of count data, such as zero inflation/deflation, heterogeneity, and missingness, giving more insight into the data and allowing dispersion to be investigated more fully and correctly. Because the stochastic step only involves simulating samples from standard distributions, the computational burden is reduced. Once the missing responses and latent variables are imputed in place of the conditional expectation, the approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of the methodology.
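The building block of this model is the hurdle Poisson distribution. A minimal sketch of its probability mass function and of a finite mixture of such components follows, assuming no covariates and no missing data. The function names are hypothetical, and the stochastic EM algorithm itself is not shown.

```python
import math

def hurdle_poisson_pmf(y, pi0, lam):
    """Hurdle Poisson: P(0) = pi0; positive counts follow a zero-truncated Poisson(lam)."""
    if y == 0:
        return pi0
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return (1 - pi0) * math.exp(log_pois) / (-math.expm1(-lam))  # divide by 1 - exp(-lam)

def mixture_hurdle_pmf(y, weights, params):
    """Finite mixture of hurdle Poisson components; params is a list of (pi0, lam)."""
    return sum(w * hurdle_poisson_pmf(y, pi0, lam)
               for w, (pi0, lam) in zip(weights, params))
```

Setting `pi0` below or above `exp(-lam)` gives zero deflation or zero inflation relative to a plain Poisson, which is how the hurdle form captures both.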

9.
We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero-inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.

10.
Consider data (x_1, y_1), ..., (x_n, y_n), where each x_i may be vector valued and the distribution of y_i given x_i is a mixture of linear regressions. This generalizes mixture models that do not include covariates in the mixture formulation. The mixture of linear regressions has appeared in the computer science literature under the name Hierarchical Mixtures of Experts model. This model has been considered from both frequentist and Bayesian viewpoints; we focus on the Bayesian formulation. Previously, the mixture of linear regressions model has been estimated through straightforward Gibbs sampling with latent variables. This paper contributes to the field in three major areas. First, we provide a theoretical underpinning for the Bayesian implementation by demonstrating consistency of the posterior distribution. This is done by extending results in Barron, Schervish and Wasserman (Annals of Statistics 27: 536–561, 1999) on bracketing entropy to the regression setting. Second, we demonstrate through examples that straightforward Gibbs sampling may fail to explore the posterior distribution effectively, and we provide alternative algorithms that are more accurate. Third, we demonstrate the usefulness of the mixture of linear regressions framework in Bayesian robust regression. The methods described in the paper are applied to two examples.
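To make the mixture-of-regressions formulation concrete, the sketch below evaluates the conditional log-likelihood and the component membership probabilities that a latent-variable Gibbs sampler draws labels from. It assumes mixing weights that do not depend on x (a hierarchical mixture of experts would make them functions of x through gating networks). The samplers and priors studied in the paper are not reproduced, and all names are hypothetical.

```python
import math

def component_logs(x, y, weights, betas, sigmas):
    """log(w_k) + log N(y; x'beta_k, sigma_k^2) for each component k."""
    out = []
    for w, beta, s in zip(weights, betas, sigmas):
        mean = sum(xj * bj for xj, bj in zip(x, beta))
        out.append(math.log(w) - 0.5 * math.log(2 * math.pi * s * s)
                   - (y - mean) ** 2 / (2 * s * s))
    return out

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mix_linreg_loglik(xs, ys, weights, betas, sigmas):
    """Conditional log-likelihood: sum_i log sum_k w_k N(y_i; x_i'beta_k, sigma_k^2)."""
    return sum(logsumexp(component_logs(x, y, weights, betas, sigmas))
               for x, y in zip(xs, ys))

def responsibilities(x, y, weights, betas, sigmas):
    """P(component k | x, y): the distribution a Gibbs step samples the latent label from."""
    logs = component_logs(x, y, weights, betas, sigmas)
    total = logsumexp(logs)
    return [math.exp(v - total) for v in logs]
```

Each row of `xs` should include a leading 1 if the regressions have intercepts.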

11.
In this article we consider the problem of fitting a five-parameter generalization of the lambda distribution to data given in the form of a grouped frequency table. The parameters are estimated by six different procedures: percentiles, moments, probability-weighted moments, minimum Cramér-von Mises, maximum likelihood, and pseudo least squares. These methods are evaluated and compared in a Monte Carlo study in which the parent populations were generalized lambda distribution (GLD) approximations of normal, beta, and gamma random variables, for nine combinations of sample size and number of classes. Of the estimators analyzed, we conclude that, although the method of pseudo least squares suffers from a number of limitations, it appears to be the best candidate procedure for estimating the parameters of a GLD from grouped data.
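The abstract does not state the parameterization, so the sketch below assumes the five-parameter lambda distribution (FPLD) quantile function `Q(u) = l1 + (l2/2)[(1 - l3)(u^l4 - 1)/l4 - (1 + l3)((1 - u)^l5 - 1)/l5]`, with `l4` and `l5` nonzero, and shows how a sample is binned into the kind of grouped frequency table the estimators work from. The function names and class boundaries are hypothetical.

```python
import bisect

def fpld_quantile(u, l1, l2, l3, l4, l5):
    """Quantile function of a five-parameter lambda distribution (assumed FPLD form).

    Requires l2 > 0, |l3| <= 1 and nonzero l4, l5.
    """
    left = (1 - l3) * (u ** l4 - 1) / l4
    right = (1 + l3) * ((1 - u) ** l5 - 1) / l5
    return l1 + 0.5 * l2 * (left - right)

def grouped_frequencies(sample, cut_points):
    """Class counts for (-inf, c1], (c1, c2], ..., (c_m, inf); cut_points must be sorted."""
    counts = [0] * (len(cut_points) + 1)
    for v in sample:
        counts[bisect.bisect_left(cut_points, v)] += 1
    return counts
```

Drawing `u` uniformly, applying `fpld_quantile` (inverse-transform sampling), and grouping the result with `grouped_frequencies` mimics the Monte Carlo design described above; each estimator would then be fitted to the resulting class counts.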

12.
Recently, various studies have used the Poisson Pseudo-Maximum Likelihood (PML) estimator to estimate gravity specifications of trade flows and, more generally, non-count data models. Some papers also report results based on the Negative Binomial Quasi-Generalised Pseudo-Maximum Likelihood (NB QGPML) estimator, which encompasses the Poisson assumption as a special case. This note shows that the NB QGPML estimators used so far are unappealing when applied to a continuous dependent variable whose unit of measurement is arbitrary, because the estimates artificially depend on that choice. A new NB QGPML estimator is introduced to overcome this shortcoming.
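The unit-dependence problem can be reproduced with a minimal sketch. The code below solves the weighted estimating equations `sum_i w_i (y_i - mu_i)(1, x_i) = 0` with `mu_i = exp(a + b x_i)` by Fisher scoring, where `w_i = 1 / (1 + alpha * mu_i)`: `alpha = 0` gives the Poisson PML estimator, and `alpha > 0` gives an NB2-type estimator. Holding `alpha` fixed is a simplifying assumption (in practice it is estimated), `fit_log_link` is a hypothetical name, and this is not the new estimator the note introduces.

```python
import math

def fit_log_link(x, y, alpha, iters=500, tol=1e-12):
    """Solve sum_i w_i (y_i - mu_i) (1, x_i) = 0, mu_i = exp(a + b x_i), w_i = 1/(1 + alpha mu_i).

    alpha = 0: Poisson PML. alpha > 0: NB2-type quasi-likelihood with dispersion held fixed.
    """
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        w = [1.0 / (1.0 + alpha * m) for m in mu]
        u0 = sum(wi * (yi - mi) for wi, yi, mi in zip(w, y, mu))
        u1 = sum(wi * xi * (yi - mi) for wi, xi, yi, mi in zip(w, x, y, mu))
        # Expected information (Fisher scoring): sum_i w_i mu_i (1, x_i)(1, x_i)'
        i00 = sum(wi * mi for wi, mi in zip(w, mu))
        i01 = sum(wi * mi * xi for wi, mi, xi in zip(w, mu, x))
        i11 = sum(wi * mi * xi * xi for wi, mi, xi in zip(w, mu, x))
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

Rescaling y by 1000 (for example, switching from tonnes to kilograms) leaves the Poisson slope unchanged and only shifts the intercept by log(1000), whereas the NB-type slope moves, because the weights `1 / (1 + alpha * mu_i)` change with the units.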

13.
Analysis of the human sex ratio by using overdispersion models
For the study of the human sex ratio, one of the most important data sets was collected in Saxony in the 19th century by Geissler. The data contain the sizes of families, with the sex of all children, at the time of registration of the birth of a child. These data are reanalysed to determine how the probability of each sex changes with family size. Three models for overdispersion are fitted: the beta–binomial model of Skellam, the 'multiplicative' binomial model of Altham and the double-binomial model of Efron. For each distribution, both the probability and the dispersion parameters are allowed to vary simultaneously with family size according to two separate regression equations. A finite mixture model is also fitted. The models are fitted using non-linear Poisson regression and compared using direct likelihood methods based on the Akaike information criterion. The multiplicative and beta–binomial models provide similar fits, substantially better than that of the double-binomial model. All models show that both the probability that the child is a boy and the dispersion are greater in larger families. There is also some indication that a point probability mass is needed for families whose children are all of one sex.
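Two of the overdispersion models fitted here have closed-form probability mass functions, sketched below for a fixed family size n. The beta-binomial follows from a Beta(a, b) distribution on the success probability; the multiplicative binomial is written in the form usually attributed to Altham, with its normalizing constant computed by direct summation. The regression equations linking the parameters to family size and the double-binomial model are not shown, and the function names are hypothetical.

```python
import math

def log_beta(s, t):
    """log of the Beta function B(s, t)."""
    return math.lgamma(s) + math.lgamma(t) - math.lgamma(s + t)

def beta_binomial_pmf(y, n, a, b):
    """Beta-binomial: Binomial(n, p) with p ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))

def multiplicative_binomial_pmf(y, n, p, theta):
    """Multiplicative binomial: proportional to C(n, y) p^y (1 - p)^(n - y) theta^(y (n - y))."""
    def kernel(k):
        return math.comb(n, k) * p ** k * (1 - p) ** (n - k) * theta ** (k * (n - k))
    return kernel(y) / sum(kernel(k) for k in range(n + 1))
```

Setting theta = 1 recovers the ordinary binomial, and the beta-binomial with a = b = 1 is uniform on 0, ..., n.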

14.
Modelling Correlated Zero-inflated Count Data
This paper extends the two-component approach to modelling count data with extra zeros, considered by Mullahy (1986), Heilbron (1994) and Welsh et al. (1996), to take account of possible serial dependence between repeated observations. Generalized estimating equations (Liang & Zeger, 1986) are constructed for each component of the model by incorporating correlation matrices into each of the maximum likelihood estimating equations. The proposed method is demonstrated on weekly counts of Noisy Friarbirds (Philemon corniculatus), which were recorded by observers for the Canberra Garden Bird Survey (Hermes, 1981).

15.
In count data models, overdispersion of the dependent variable can be incorporated by adding a heterogeneity term to the mean parameter of the Poisson distribution. We use a nonparametric estimator of the heterogeneity density based on a squared Kth-order polynomial expansion, which we generalize to panel data. A numerical illustration using an insurance dataset is discussed. Even though some statistical analyses showed no clear differences between these new models and the standard Poisson model with gamma random effects, we show that the choice of the random effects distribution has a significant influence on the interpretation of our results.
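One way to make the squared polynomial expansion concrete is to multiply a baseline gamma density by the square of a polynomial, h(u) proportional to g(u) P(u)^2, so the mixed Poisson probabilities stay in closed form because every term reduces to a gamma moment. The sketch assumes a gamma(shape, rate) baseline and a single cross-section (no panel structure). It illustrates the idea under those assumptions and is not necessarily the exact expansion used by the authors; the function name is hypothetical.

```python
import math

def poly_gamma_mixed_poisson_pmf(y, lam, shape, rate, coefs):
    """P(Y = y) when Y | U ~ Poisson(lam * U) and U has density
    gamma(shape, rate)(u) * P(u)^2 / E[P(U)^2], with P(u) = sum_k coefs[k] * u**k.
    """
    # Coefficients d_m of P(u)^2 = sum_m d_m u^m
    d = [0.0] * (2 * len(coefs) - 1)
    for i, ci in enumerate(coefs):
        for j, cj in enumerate(coefs):
            d[i + j] += ci * cj
    # Normaliser E[P(U)^2] = sum_m d_m E[U^m], with E[U^m] = Gamma(shape + m) / (Gamma(shape) rate^m)
    norm = sum(dm * math.exp(math.lgamma(shape + m) - math.lgamma(shape) - m * math.log(rate))
               for m, dm in enumerate(d))
    total = 0.0
    for m, dm in enumerate(d):
        # lam^y / y! * integral of exp(-lam u) u^(y + m) g(u) du, computed on the log scale
        log_term = (y * math.log(lam) - math.lgamma(y + 1) + shape * math.log(rate)
                    + math.lgamma(shape + y + m) - math.lgamma(shape)
                    - (shape + y + m) * math.log(rate + lam))
        total += dm * math.exp(log_term)
    return total / norm
```

With `coefs = [1.0]` the density is just the gamma baseline and the function returns the negative binomial probabilities of the standard Poisson-gamma model, so the expansion nests the benchmark it is compared with.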

16.
Zero-inflated Poisson mixed regression models are popular approaches to analyzing clustered count data with excess zeros. Before applying these models, it is essential to examine whether an adjustment for zero outcomes is necessary. The existing literature, however, has focused only on score tests of the suitability of zero-inflated models for correlated count data. In view of the observed bias and non-optimal size of score tests, alternative tests deserve further investigation. This article explores the use of null Wald and likelihood ratio tests for zero-inflation in correlated count data. Our simulation study shows that both the null Wald and likelihood ratio tests outperform the score test of Xiang, Lee, Yau and McLachlan (2006, Statistics in Medicine 25: 1660–1671) in terms of statistical power, despite the computational convenience of the score test. A bootstrap null Wald statistic is also proposed, which improves both the size and the power of the test.
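For independent (not clustered) data, the likelihood ratio test for zero-inflation can be sketched with an intercept-only ZIP model fitted by EM. The sketch assumes no random effects, a major simplification of the correlated setting studied in the article, and refers the statistic to a 50:50 mixture of chi-square(0) and chi-square(1), because the null value of the zero-inflation probability lies on the boundary. The null Wald and bootstrap variants are not shown, and `zip_lrt` is a hypothetical name.

```python
import math

def zip_lrt(counts, iters=500):
    """LRT of Poisson vs zero-inflated Poisson for i.i.d. counts (intercept-only).

    Returns (LR statistic, p-value from the 50:50 chi-square(0)/chi-square(1) mixture).
    """
    n, total = len(counts), sum(counts)
    if total == 0:
        raise ValueError("all counts are zero; the test is undefined")
    n0 = sum(1 for y in counts if y == 0)
    const = sum(math.lgamma(y + 1) for y in counts)
    lam0 = total / n
    ll_pois = -n * lam0 + total * math.log(lam0) - const
    pi, lam = 0.5, lam0
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: closed-form updates for the mixing weight and the Poisson mean
        pi = z * n0 / n
        lam = total / (n - z * n0)
    ll_zip = (n0 * math.log(pi + (1 - pi) * math.exp(-lam))
              + (n - n0) * math.log(1 - pi) - (n - n0) * lam
              + total * math.log(lam) - const)
    lr = max(0.0, 2 * (ll_zip - ll_pois))
    p_value = 1.0 if lr == 0.0 else 0.5 * math.erfc(math.sqrt(lr / 2))
    return lr, p_value
```

Because EM keeps the mixing weight in [0, 1], zero-deflated data drive it to 0 and the statistic to 0, matching the one-sided alternative.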

17.
The zero-inflated negative binomial (ZINB) model is used to account for commonly occurring overdispersion detected in data that are initially analyzed under the zero-inflated Poisson (ZIP) model. Tests for overdispersion (Wald test, likelihood ratio test [LRT], and score test) based on the ZINB model have been developed for use with ZIP regression models. Because of its similarity to the ZINB model, we consider the zero-inflated generalized Poisson (ZIGP) model as an alternative model for overdispersed zero-inflated count data. The score test has an advantage over the LRT and the Wald test in that it only requires the parameter of interest to be estimated under the null hypothesis. This paper proposes score tests for overdispersion based on the ZIGP model and shows that the derived score statistics are exactly the same as the score statistics under the ZINB model. A simulation study indicates that the proposed score statistics are preferable to other tests because of their higher empirical power. In practice, depending on the approximate mean–variance relationship in the data, the ZINB or ZIGP model can be considered, and a formal score test based on the asymptotic standard normal distribution can be employed to assess overdispersion in the ZIP model. We provide an example to illustrate the procedures for data analysis.

18.
Goodness-of-fit tests based on the Cramér-von Mises statistics are given for the Poisson distribution. Power comparisons show that these statistics, particularly A², give good overall tests of fit. The statistic A² will be particularly useful for detecting distributions whose variance is close to the mean but which are not Poisson.
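The Cramér-von Mises family for a discrete null can be computed from the cumulative observed-minus-expected frequencies Z_j, the cell probabilities p_j, and the cumulative probabilities H_j as W² = (1/n) Σ_j Z_j² p_j and A² = (1/n) Σ_j Z_j² p_j / [H_j(1 - H_j)]. A minimal sketch with the Poisson mean estimated by the sample mean follows. The tail truncation rule is an assumption, critical values (which depend on the estimated mean) are not included, and the function name is hypothetical.

```python
import math
from collections import Counter

def poisson_cvm_stats(counts):
    """Return (W2, A2) for H0: counts ~ Poisson(lambda), lambda estimated by the mean."""
    n = len(counts)
    lam = sum(counts) / n
    p = math.exp(-lam)                 # p_j, starting at P(Y = 0)
    if p == 0.0:
        raise ValueError("mean too large for this simple recursion")
    observed = Counter(counts)
    max_obs = max(counts)
    w2 = a2 = 0.0
    z = 0.0                            # Z_j: cumulative (observed - expected) frequencies
    h = 0.0                            # H_j: cumulative null probability P(Y <= j)
    j = 0
    while True:
        h += p
        z += observed.get(j, 0) - n * p
        if j >= max_obs and 1.0 - h < 1e-12:
            break                      # remaining tail terms are negligible
        w2 += z * z * p / n
        if 0.0 < h < 1.0:
            a2 += z * z * p / (h * (1.0 - h)) / n
        j += 1
        p *= lam / j                   # Poisson recursion: p_j = p_{j-1} * lam / j
    return w2, a2
```

Heavily overdispersed data, such as half zeros and half tens, yield far larger values of both statistics than a sample whose frequencies track a Poisson shape.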

19.
We present a simple framework for studying empirical-distribution-function goodness-of-fit tests for discrete models. A key tool is a weak-convergence result for an estimated discrete empirical process, regarded as a random element in a suitable sequence space. Special emphasis is given to testing for the Poisson and geometric distributions. Simulations show that parametric bootstrap versions of the tests maintain the nominal significance level very closely, even for small samples where reliance on asymptotic critical values is doubtful.
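The parametric bootstrap idea does not depend on the particular empirical-distribution-function statistic. The sketch below uses a Kolmogorov-type distance sup_j |F_n(j) - F(j; λ̂)| as an assumed example statistic, re-estimates λ in every bootstrap sample, and reports the Monte Carlo p-value (1 + #{D* ≥ D}) / (B + 1). Poisson variates come from Knuth's multiplication method, which is adequate only for small means; all names are hypothetical.

```python
import math
import random

def poisson_cdf_values(lam, upto):
    """[P(Y <= j) for j = 0..upto] under Poisson(lam)."""
    p, acc, cdf = math.exp(-lam), 0.0, []
    for j in range(upto + 1):
        acc += p
        cdf.append(acc)
        p *= lam / (j + 1)
    return cdf

def ks_stat(counts):
    """max_j |F_n(j) - F(j; lam_hat)| over j = 0..max(counts), lam_hat = sample mean."""
    n = len(counts)
    lam = sum(counts) / n
    top = max(counts)
    cdf = poisson_cdf_values(lam, top)
    freq = [0] * (top + 1)
    for y in counts:
        freq[y] += 1
    emp, d = 0, 0.0
    for j in range(top + 1):
        emp += freq[j]
        d = max(d, abs(emp / n - cdf[j]))
    return d

def sample_poisson(lam, rng):
    """Knuth's multiplication method for one Poisson(lam) variate."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def bootstrap_pvalue(counts, n_boot=200, seed=1):
    """Parametric bootstrap p-value of ks_stat under a fitted Poisson model."""
    rng = random.Random(seed)
    n = len(counts)
    lam_hat = sum(counts) / n
    d_obs = ks_stat(counts)
    exceed = 0
    for _ in range(n_boot):
        sim = [sample_poisson(lam_hat, rng) for _ in range(n)]
        if ks_stat(sim) >= d_obs:      # lam is re-estimated inside ks_stat
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Re-estimating λ inside each replicate is what makes the bootstrap calibrate a test with an estimated parameter; reusing λ̂ for every replicate would not.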

20.
Mansson and Shukur (2011, Economic Modelling 28: 1475–1481) investigated the performance of the Poisson ridge regression (PRR) estimator in terms of the mean squared error (MSE) criterion. Similarly, Mansson (2012, Economic Modelling 29: 178–184) investigated the performance of the negative binomial ridge regression (NBRR) estimator according to the MSE criterion. However, the predictive performance of the PRR and NBRR estimators has not been analyzed. We therefore define the PRR and NBRR predictors and evaluate their predictive performance according to the prediction mean squared error under the target function. Monte Carlo simulations and a real-life numerical example are used to investigate the performance of the defined predictors.
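The Poisson ridge estimator is usually written as a shrinkage of the maximum likelihood estimate, (X'ŴX + kI)⁻¹X'ŴX β̂_ML. With a single regressor and no intercept this reduces to a scalar shrinkage factor, which keeps the sketch below short. The single-regressor model, the positive regressor values assumed by the Newton iteration, and the simple empirical prediction error are assumptions for illustration, not the authors' setup or their target-function PMSE; the function names are hypothetical.

```python
import math

def poisson_ml_no_intercept(x, y, iters=100, tol=1e-12):
    """ML estimate of beta in log(mu_i) = beta * x_i; assumes all x_i > 0."""
    beta = 0.0
    for _ in range(iters):
        mu = [math.exp(beta * xi) for xi in x]
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        info = sum(xi * xi * mi for xi, mi in zip(x, mu))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def poisson_ridge(x, y, k):
    """Ridge shrinkage (X'WX + k)^(-1) X'WX beta_ML for a single coefficient."""
    beta_ml = poisson_ml_no_intercept(x, y)
    xwx = sum(xi * xi * math.exp(beta_ml * xi) for xi in x)  # X'WX with W = diag(mu_hat)
    return xwx / (xwx + k) * beta_ml

def predict(x0, beta):
    """Predicted mean exp(x0 * beta)."""
    return math.exp(beta * x0)

def prediction_mse(x_new, y_new, beta):
    """Empirical mean squared prediction error on new observations."""
    return sum((yi - predict(xi, beta)) ** 2 for xi, yi in zip(x_new, y_new)) / len(x_new)
```

With k = 0 the ridge estimate equals the ML estimate; increasing k shrinks it toward zero, and `prediction_mse` on held-out data shows whether that shrinkage helps prediction.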
