Similar literature
 20 similar articles found
1.
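胡亚南 (Hu Yanan), 田茂再 (Tian Maozai). 《统计研究》 (Statistical Research), 2019, 36(1): 104-114
Zero-inflated count data violate the variance-mean relationship of the Poisson distribution. They can be explained by a mixture distribution in which a proportion of the observations follow a Poisson distribution and the remainder are identically zero (a degenerate distribution). Based on the adaptive elastic net, this paper studies joint modelling and variable selection for zero-inflated count data. For the zero-inflated Poisson distribution, a latent variable is introduced to construct the complete likelihood of the zero-inflated Poisson model, which consists of two terms: a zero-inflation part and a Poisson part. Because the covariates may be collinear and sparse, an adaptive elastic net penalty is added to the likelihood to obtain the objective function. The EM algorithm is then used to obtain sparse estimators of the regression coefficients, and the Bayesian information criterion (BIC) is used to select the optimal tuning parameters. The paper also gives theoretical proofs of the large-sample properties of the estimators and a simulation study, and finally applies the proposed method to a real problem.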
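
The latent-variable EM idea in this abstract can be illustrated with a minimal Python sketch for an intercept-only ZIP model. It has no covariates and no adaptive elastic net penalty, so it is a simplification and not the authors' procedure. The function names `sample_zip` and `fit_zip_em`, the starting values and the iteration count are assumptions made for this example.

```python
import math
import random


def sample_zip(n, pi, lam, rng):
    """Draw n values from ZIP(pi, lam): zero with probability pi, else Poisson(lam)."""
    out = []
    for _ in range(n):
        if rng.random() < pi:
            out.append(0)  # structural zero from the degenerate component
        else:
            # Knuth's multiplication method for a Poisson draw (adequate for small lam)
            limit, k, prod = math.exp(-lam), 0, rng.random()
            while prod > limit:
                k += 1
                prod *= rng.random()
            out.append(k)
    return out


def fit_zip_em(y, n_iter=200):
    """EM estimates (pi, lam) for an intercept-only, unpenalized ZIP model."""
    n = len(y)
    n_zero = sum(1 for v in y if v == 0)
    total = sum(y)
    pi, lam = 0.5, max(total / n, 1e-8)  # crude starting values (assumption)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: closed-form maximizers of the expected complete-data likelihood
        expected_structural = n_zero * z
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam
```

With data simulated by `sample_zip(5000, 0.3, 2.0, random.Random(0))`, the estimates from `fit_zip_em` should land close to the generating values. A regression version would replace the two closed-form updates with penalized weighted fits for each part of the model.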

2.
Generalized Poisson (GP) regression is an increasingly popular approach for modeling overdispersed as well as underdispersed count data. Several parameterizations have been proposed for GP regression, and the two best-known models, the GP-1 and the GP-2, have been applied. The recently proposed GP-P regression has the advantage of nesting the GP-1 and the GP-2 parametrically, and it allows statistical tests of the GP-1 and the GP-2 against a more general alternative. Count data often contain more zero outcomes than expected under the Poisson distribution. This zero-inflation phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed to handle it. However, if the data continue to suggest additional overdispersion, the zero-inflated negative binomial (ZINB-1 and ZINB-2) and the zero-inflated generalized Poisson (ZIGP-1 and ZIGP-2) regression models have been considered as alternatives. This article proposes a functional form of the ZIGP which mixes a distribution degenerate at zero with a GP-P distribution. The suggested model has the advantage of nesting the ZIP and the two best-known ZIGP (ZIGP-1 and ZIGP-2) regression models, and it allows statistical tests of the ZIGP-1 and the ZIGP-2 against a more general alternative. The ZIP and the functional form of the ZIGP regression models are fitted, compared and tested on two sets of count data: the Malaysian insurance claim data and the German healthcare data.

3.
In this article, we give a brief overview of the different functional forms of the generalized Poisson distribution (GPD) and the various methods of parameter estimation found in the literature. We compare moment estimation (ME) and maximum likelihood estimation (MLE) of the parameters of the GPD through a simulation study in terms of bias, MSE and covariance. To simulate random numbers from the GPD, we develop a Matlab function gpoissrnd(). The simulation study leads to the important conclusion that ME performs as well as or better than MLE when the sample size is small.

Further, we fit the GPD to various datasets from the literature using both estimation methods and observe that the results do not differ significantly even when the sample size is large. Overall, we conclude that for the GPD, using ME in place of MLE leads to almost the same results. The computational simplicity of ME compared with MLE also supports the use of ME for the GPD by practitioners.
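
As a rough illustration of why moment estimation is computationally simple here, the Python sketch below uses Consul's parameterization of the GPD, P(X = x) = theta * (theta + lam * x)^(x - 1) * exp(-theta - lam * x) / x!, with mean theta / (1 - lam) and variance theta / (1 - lam)^3. The abstract does not say which functional form the authors use, so this parameterization, the restriction to 0 <= lam < 1, and the function names `gp_pmf` and `gp_moment_estimates` are assumptions.

```python
import math


def gp_pmf(x, theta, lam):
    """Consul's generalized Poisson pmf, restricted here to theta > 0, 0 <= lam < 1."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + lam * x)
             - theta - lam * x - math.lgamma(x + 1))
    return math.exp(log_p)


def gp_moment_estimates(mean, var):
    """Moment estimates (theta_hat, lam_hat) from a sample mean and variance.

    Solves mean = theta / (1 - lam) and var = theta / (1 - lam)**3.
    Requires mean > 0 and var > 0.
    """
    ratio = math.sqrt(mean / var)  # equals 1 - lam at the true moments
    return mean * ratio, 1.0 - ratio
```

Both estimates are closed-form functions of the first two sample moments, whereas the MLE requires solving a nonlinear likelihood equation numerically.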


4.
Longitudinal count data with excessive zeros frequently occur in social, biological, medical, and health research. To model such data, zero-inflated Poisson (ZIP) models are commonly used, after separating zero and positive responses. As longitudinal count responses are likely to be serially correlated, such separation may destroy the underlying serial correlation structure. To overcome this problem, observation-driven and parameter-driven modelling approaches have recently been proposed. In the observation-driven model, the response at a specific time point is modelled through the responses at previous time points after incorporating serial correlation. One limitation of the observation-driven model is that it fails to accommodate possible over-dispersion, which frequently occurs in count responses. This limitation is overcome in a parameter-driven model, where the serial correlation is captured through a latent process using random effects. We compare the results obtained by the two models. A quasi-likelihood approach has been developed to estimate the model parameters. The methodology is illustrated with an analysis of two real-life datasets. To examine model performance, the models are also compared through a simulation study.

5.
In this article, we compare the zero-inflated Poisson (ZIP) and negative binomial (NB) distributions on three key criteria: the probability of zero, the mean, and the variance. Our results show that with the same mean and variance, the ZIP distribution always has a larger probability of zero; with the same mean and probability of zero, the NB distribution always has a larger variance; and with the same variance and probability of zero, the ZIP distribution always has a larger mean. We also study the properties of the Vuong test for model selection in these three cases by simulation.
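
The first of these comparisons can be checked numerically. The Python sketch below assumes the standard ZIP(pi, lam) moments, mean (1 - pi) * lam and variance mean * (1 + pi * lam), and the NB2 form of the negative binomial with variance mean + mean^2 / r. The parameter values and the function names `zip_summary` and `nb_p0_matching` are chosen only for illustration.

```python
import math


def zip_summary(pi, lam):
    """Mean, variance and P(Y = 0) of a ZIP(pi, lam) distribution."""
    mean = (1.0 - pi) * lam
    var = mean * (1.0 + pi * lam)
    p0 = pi + (1.0 - pi) * math.exp(-lam)
    return mean, var, p0


def nb_p0_matching(mean, var):
    """P(Y = 0) for the NB2 distribution with the given mean and variance (var > mean)."""
    r = mean ** 2 / (var - mean)  # solves var = mean + mean**2 / r
    return (r / (r + mean)) ** r


mean, var, zip_p0 = zip_summary(0.3, 2.0)
nb_p0 = nb_p0_matching(mean, var)
print(f"mean={mean:.3f} var={var:.3f} ZIP P0={zip_p0:.4f} NB P0={nb_p0:.4f}")
```

For these values the ZIP zero probability exceeds the matched NB zero probability, which agrees with the abstract's first statement. A single parameter setting is a spot check, not a proof.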

6.
For frequency counts, the situation of extra zeros often arises in biomedical applications. This is demonstrated with count data from a dental epidemiological study in Belo Horizonte (the Belo Horizonte caries prevention study) which evaluated various programmes for reducing caries. Extra zeros, however, violate the variance–mean relationship of the Poisson error structure. This extra-Poisson variation can easily be explained by a special mixture model, the zero-inflated Poisson (ZIP) model. On the basis of the ZIP model, a graphical device is presented which not only summarizes the mixing distribution but also provides visual information about the overall mean. This device can be exploited to evaluate and compare various groups. Ways are discussed to include covariates and to develop an extension of the conventional Poisson regression. Finally, a method to evaluate intervention effects on the basis of the ZIP regression model is described and applied to the data of the Belo Horizonte caries prevention study.

7.
The zero-inflated negative binomial (ZINB) model is used to account for the overdispersion commonly detected in data that are initially analyzed under the zero-inflated Poisson (ZIP) model. Tests for overdispersion (the Wald test, the likelihood ratio test [LRT], and the score test) based on the ZINB model have been developed for use in ZIP regression models. Because of its similarity to the ZINB model, we consider the zero-inflated generalized Poisson (ZIGP) model as an alternative model for overdispersed zero-inflated count data. The score test has an advantage over the LRT and the Wald test in that it only requires the parameter of interest to be estimated under the null hypothesis. This paper proposes score tests for overdispersion based on the ZIGP model and shows that the derived score statistics are exactly the same as the score statistics under the ZINB model. A simulation study indicates that the proposed score statistics are preferable to the other tests because of their higher empirical power. In practice, based on the approximate mean–variance relationship in the data, the ZINB or ZIGP model can be considered, and a formal score test based on the asymptotic standard normal distribution can be used to assess overdispersion in the ZIP model. We provide an example to illustrate the procedures for data analysis.

8.
When the manufacturing process is well monitored, finding no defects is a frequent event in sampling inspection. The appropriate probability distribution for the number of defects is then a zero-inflated Poisson (ZIP) distribution. In this article, the determination of single sampling plans (SSPs) by attributes using unity values is considered when the number of defects follows a ZIP distribution. The operating characteristic (OC) function of the sampling plan is derived. Plan parameters are obtained for some sets of values of (p1, α, p2, β). Numerical illustrations are given to describe the determination of an SSP under the ZIP distribution and to study its performance in comparison with the Poisson SSP.
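
To make the OC function concrete, here is a minimal Python sketch under stated assumptions: a plan (n, c) accepts the lot when the number of defects found in a sample of size n is at most c, and the defect count follows a ZIP distribution with mixing proportion phi and Poisson mean n * p. The abstract does not give the paper's exact formula, so this form, the function name `oc_zip` and the parameter names are illustrative only.

```python
import math


def oc_zip(p, n, c, phi):
    """Probability of acceptance for plan (n, c) when defects ~ ZIP(phi, n * p)."""
    lam = n * p
    # Poisson probability of observing at most c defects
    poisson_cdf = sum(math.exp(-lam) * lam ** d / math.factorial(d)
                      for d in range(c + 1))
    # Structural zeros always lead to acceptance, since 0 <= c
    return phi + (1.0 - phi) * poisson_cdf
```

Setting `phi = 0` recovers the ordinary Poisson OC curve, so the two plans can be compared by evaluating `oc_zip` over a grid of `p` values with and without zero inflation.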

9.
The zero-inflated Poisson regression model is commonly used when analyzing economic data that come in the form of non-negative integers since it accounts for excess zeros and overdispersion of the dependent variable. However, a problem often encountered when analyzing economic data that has not been addressed for this model is multicollinearity. This paper proposes ridge regression (RR) estimators and some methods for estimating the ridge parameter k for a non-negative model. A simulation study has been conducted to compare the performance of the estimators. Both mean squared error and mean absolute error are considered as the performance criteria. The simulation study shows that some estimators are better than the commonly used maximum-likelihood estimator and some other RR estimators. Based on the simulation study and an empirical application, some useful estimators are recommended for practitioners.

10.
Count data often have an excessive number of zero outcomes. This zero-inflation phenomenon is a specific cause of overdispersion, and the zero-inflated Poisson (ZIP) regression model has been proposed for accommodating zero-inflated data. However, if the data continue to suggest additional overdispersion, zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models have been considered as alternatives. This study proposes a score test for testing the ZIP regression model against ZIGP alternatives and proves that it is equal to the score test for testing the ZIP regression model against ZINB alternatives. The advantage of the score test over alternatives such as the likelihood ratio and Wald tests is that it can be used to determine whether a more complex model is appropriate without fitting the more complex model. Applications of the proposed score test to several datasets are also illustrated.

11.
12.
Multivariate zero-inflated Poisson (ZIP) distributions are important tools for modelling and analysing correlated count data with extra zeros. Unfortunately, existing multivariate ZIP distributions consider only the overall zero-inflation while the component zero-inflation is not well addressed. This paper proposes a flexible multivariate ZIP distribution, called the multivariate component ZIP distribution, in which both the overall and component zero-inflations are taken into account. Likelihood-based inference procedures including the calculation of maximum likelihood estimates of parameters in the model without and with covariates are provided. Simulation studies indicate that the performance of the proposed methods on the multivariate component ZIP model is satisfactory. The Australia health care utilisation data set is analysed to demonstrate that the new distribution is more appropriate than the existing multivariate ZIP distributions.

13.
The zero-inflated Poisson (ZIP) distribution is widely used for modeling count data when the frequency of zeros is higher than expected under the Poisson distribution. There are many methods for making inferences about the inflation parameter in ZIP models, e.g. methods for testing the Poisson distribution (the inflation parameter is zero) against the ZIP distribution (the inflation parameter is positive). Most of these methods are based on maximum likelihood estimators, which do not have an explicit expression. However, estimators obtained by the method of moments are sufficiently powerful and are easy to obtain and implement. In this paper, we propose an approach based on the method of moments for making inferences about the inflation parameter in the ZIP distribution. Our method is compared with some recent methods via a simulation study and is illustrated by an example.
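
The explicit form of moment estimators for the ZIP distribution can be sketched in Python as follows. The sketch uses the textbook moment equations, mean = (1 - pi) * lam and variance = mean * (1 + pi * lam), and is not necessarily the inference procedure proposed in the paper. The function name `zip_moment_estimates` and the convention of returning pi = 0 when the data show no overdispersion are assumptions.

```python
def zip_moment_estimates(mean, var):
    """Moment estimates (pi_hat, lam_hat) for ZIP from a sample mean and variance."""
    if mean <= 0:
        raise ValueError("the sample mean must be positive")
    if var <= mean:
        # No overdispersion: fall back to a plain Poisson fit (assumed convention)
        return 0.0, mean
    lam_hat = mean + var / mean - 1.0
    pi_hat = (var - mean) / (var - mean + mean ** 2)
    return pi_hat, lam_hat


# Moments of ZIP(pi=0.3, lam=2.0) are mean 1.4 and variance 2.24
pi_hat, lam_hat = zip_moment_estimates(1.4, 2.24)
```

Both estimates are explicit functions of the first two sample moments, in contrast to the maximum likelihood estimators, which must be computed iteratively.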

14.
In this paper, we introduce the mixed Liu estimator (MLE) for the vector of parameters in linear measurement error models by combining the sample and the prior information. The MLE is a generalization of the mixed estimator (ME) and the Liu estimator (LE). In particular, the asymptotic normality of the estimators is discussed, and the performance of the MLE is compared with that of the LE and the ME based on the mean squared error matrix (MSEM). Finally, a Monte Carlo simulation and a numerical example are presented.

15.
In this paper, we consider the maximum likelihood estimator (MLE) of the scale parameter of the generalized exponential (GE) distribution based on a random censoring model. We assume that the censoring distribution also follows a GE distribution. Since the MLE has no explicit form, we propose a simple method of deriving an explicit estimator by approximating the likelihood function. Monte Carlo simulation is conducted to compare the performance of the estimators. The results show that the MLE and the approximate MLE are almost identical in terms of bias and variance.

16.
Many methodologies for zero-inflated data have been developed in statistics. However, there is little literature on the topic in data mining, even though zero-inflated data are easily found in real applications. In fact, there is no decision tree method that is suitable for zero-inflated responses. To analyze a continuous target variable with decision trees, F-statistics (CHAID) or variance reduction (CART) criteria are used to find the best split. But these methods are only appropriate for a continuous target variable. If the target variable consists of rare events or zero-inflated count data, these criteria do not give good results. In this paper, we propose a decision tree for zero-inflated count data that uses the maximized zero-inflated Poisson likelihood as the split criterion. In addition, we compare the performance of the split criteria on well-known data sets. When the analyst is interested in lower-value groups (e.g. no-defect areas, customers who do not claim), the suggested ZIP tree is more efficient.

17.
The scaled (two-parameter) Type I generalized logistic distribution (GLD) is considered with known shape parameter. The ML method does not yield an explicit estimator for the scale parameter even in complete samples. In this article, we therefore construct a new linear estimator for the scale parameter, based on complete and doubly Type-II censored samples, by making linear approximations to the intractable terms of the likelihood equation using the least-squares (LS) method, a new approach to linearization. We call this the linear approximate maximum likelihood estimator (LAMLE). We also construct the LAMLE based on the Taylor series method of linear approximation and find that this estimator is slightly more biased than that based on the LS method. A Monte Carlo simulation is used to investigate the performance of the LAMLE and shows that it is almost as efficient as the MLE, though more biased. We also compare the unbiased LAMLE with the BLUE based on the exact variances of the estimators and, interestingly, find that the new unbiased LAMLE is just as efficient as the BLUE in both complete and Type-II censored samples. Since the MLE is known to be asymptotically unbiased, in large samples we compare the unbiased LAMLE with the MLE and find that it is almost as efficient as the MLE. We also discuss interval estimation of the scale parameter from complete and Type-II censored samples. Finally, we present some numerical examples to illustrate the construction of the new estimators developed here.

18.
This paper deals with the estimation of the parameters of a truncated gamma distribution over (0, τ), where τ is assumed to be a real number. We obtain a necessary and sufficient condition for the existence of the maximum likelihood estimator (MLE). The probability that the MLE does not exist is observed to be positive. A simulation study indicates that the modified maximum likelihood estimator and the mixed estimator, which exist with probability one, are to be preferred over the MLE. The bias, the mean square error, and the probability of nearness form the basis of our simulation study.

19.
Count data with excess zeros often occur in areas such as public health, epidemiology, psychology, sociology, engineering, and agriculture. Zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression are useful for modeling such data, but because of hierarchical study design or the data collection procedure, zero-inflation and correlation may occur simultaneously. To overcome these challenges ZIP or ZINB may still be used. In this paper, multilevel ZINB regression is used to overcome these problems. The parameters are estimated with an expectation-maximization algorithm in conjunction with penalized likelihood and restricted maximum likelihood estimates for the variance components. An alternative modeling strategy, namely the ZIP distribution, is also considered. An application of the proposed model to the decayed, missing, and filled teeth of 12-year-old children is shown.

20.
This paper proposes a Poisson-based model that uses both error-free data and error-prone data subject to misclassification in the form of false-negative and false-positive counts. It derives maximum likelihood estimators (MLEs) for the Poisson rate parameter and the two misclassification parameters: the false-negative parameter and the false-positive parameter. It also derives expressions for the information matrix and the asymptotic variances of the MLE for the rate parameter, the MLE for the false-positive parameter, and the MLE for the false-negative parameter. Using these expressions, the paper analyses the value of the fallible data. It studies characteristics of the new double-sampling rate estimator via a simulation experiment and applies the new MLEs and confidence intervals to a real dataset.
