Similar literature
 Found 20 similar articles; search took 78 ms
1.
The Poisson distribution is a simple and popular model for count-data random variables, but it suffers from the equidispersion requirement, which is often not met in practice. While models for overdispersed counts have been discussed intensively in the literature, the opposite phenomenon, underdispersion, has received little attention, especially in a time series context. We start with a detailed survey of distribution models allowing for underdispersion, discuss their properties and highlight possible disadvantages. After having identified two model families with attractive properties as well as only two model parameters, we combine these models with the INAR(1) model (integer-valued autoregressive), which is particularly well suited to obtain autocorrelated counts with underdispersion. Properties of the resulting stationary INAR(1) models and approaches for parameter estimation are considered, as well as possible extensions to higher order autoregressions. Three real-data examples illustrate the application of the models in practice.

2.
The complex triparametric Pearson (CTP) distribution is a flexible model belonging to the Gaussian hypergeometric family that can account for over- and underdispersion. However, despite its good properties, not much attention has been paid to it. We therefore revive the CTP by comparing it with some well-known distributions that cope with overdispersion (negative binomial, generalized Poisson and univariate generalized Waring) as well as underdispersion (Conway–Maxwell–Poisson (CMP) and hyper-Poisson (HP)). We conduct a simulation study that reveals the performance of the CTP and shows that it has its own space among count data models. In this sense, we also explore some overdispersed datasets which seem to be more appropriately modelled by the CTP than by other usual models. Moreover, we include two underdispersed examples to illustrate that the CTP can provide similar fits to the CMP or HP (sometimes even more accurate) without the computational problems of these models.

3.
Bayesian hierarchical models typically involve specifying prior distributions for one or more variance components. This is rather removed from the observed data, so specification based on expert knowledge can be difficult. While there are suggestions for “default” priors in the literature, often a conditionally conjugate inverse‐gamma specification is used, despite documented drawbacks of this choice. The authors suggest “conservative” prior distributions for variance components, which deliberately give more weight to smaller values. These are appropriate for investigators who are skeptical about the presence of variability in the second‐stage parameters (random effects) and want to particularly guard against inferring more structure than is really present. The suggested priors readily adapt to various hierarchical modelling settings, such as fitting smooth curves, modelling spatial variation and combining data from multiple sites.

4.
We propose zero-inflated statistical models based on the generalized Hermite distribution for simultaneous modelling of excess zeros, over/underdispersion, and multimodality. These new models are parsimonious yet remarkably flexible, allowing the covariates to be introduced directly through the mean, dispersion, and zero-inflated parameters. To accommodate the interval inequality constraint for the dispersion parameter, we present a new link function for the covariate-dependent dispersion regression model. We derive score tests for zero inflation in both covariate-free and covariate-dependent models. Both the score test and the likelihood-ratio test are conducted to examine the validity of zero inflation. The score test provides a useful tool when computing the likelihood-ratio statistic proves to be difficult. We analyse several hotel booking cancellation datasets extracted from two recently published real datasets from a resort hotel and a city hotel. These extracted cancellation datasets reveal complex features of excess zeros, over/underdispersion, and multimodality simultaneously, making them difficult to analyse with existing approaches. The application of the proposed methods to the cancellation datasets illustrates the usefulness and flexibility of the models.

5.
In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions have been considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. Then we extend this model in the sense that correlated counts may follow different distributions. To accommodate correlation among counts, we have considered correlated random effects for each individual in the mean structure, thus inducing dependency among observations common to an individual. The method is applied to real data to investigate variation in food resource use in a species of marsupial in a locality of the Brazilian Cerrado biome.

6.
Real count data time series often show underdispersion or overdispersion. In this paper, we develop two extensions of the first-order integer-valued autoregressive process with Poisson innovations, based on binomial thinning, for modeling integer-valued time series with equidispersion, underdispersion, and overdispersion. The main properties of the models are derived. The methods of conditional maximum likelihood, Yule–Walker, and conditional least squares are used for estimating the parameters, and their asymptotic properties are established. We also use a test based on our processes for checking if the count time series considered is overdispersed or underdispersed. The proposed models are fitted to time series of the weekly number of syphilis cases and monthly counts of family violence, illustrating their capabilities in handling overdispersed and underdispersed count data.
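
As a rough illustration of the baseline that this abstract extends, the following sketch simulates the standard Poisson INAR(1) recursion X_t = α∘X_{t-1} + ε_t, where ∘ is binomial thinning and ε_t is Poisson. It is not the authors' extended models. The function names, parameter values and seed are assumptions chosen for illustration.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def binomial_thinning(rng, alpha, x):
    """Binomial thinning: count how many of x Bernoulli(alpha) trials succeed."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = (alpha thinned X_{t-1}) + eps_t with eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # The stationary marginal of the Poisson INAR(1) is Poisson(lam / (1 - alpha)).
    x = poisson_draw(rng, lam / (1.0 - alpha))
    path = []
    for _ in range(n):
        x = binomial_thinning(rng, alpha, x) + poisson_draw(rng, lam)
        path.append(x)
    return path


def dispersion_index(counts):
    """Sample variance divided by sample mean (close to 1 under equidispersion)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


if __name__ == "__main__":
    series = simulate_inar1(n=5000, alpha=0.5, lam=2.0, seed=1)
    print("mean:", sum(series) / len(series))
    print("dispersion index:", dispersion_index(series))
```

With Poisson innovations the simulated series should be roughly equidispersed. Innovations with a dispersion index below or above 1 would be one way to obtain under- or overdispersed counts, though the specific extensions proposed in the paper are not reproduced here.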

7.
Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the factor treatment using multiple comparisons. However, the measured individuals are often clustered – e.g. according to litter or rearing. This must be considered when estimating the parameters by a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone leads to an increase of the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimating equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples as can be observed in many biological settings. We have found that the generalized estimating equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show problems with the convergence rate under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.

8.
The general mixed linear model, containing both the fixed and random effects, is considered. Using gamma priors for the variance components, the conditional posterior distributions of the fixed effects and the variance components, conditional on the random effects, are obtained. Using the normal approximation for the multiple t distribution, approximations are obtained for the posterior distributions of the variance components in infinite series form. The same approximation is used to obtain closed expressions for the moments of the variance components. An example is considered to illustrate the procedure and a numerical study examines the closeness of the approximations.

9.
An extension of the generalized linear mixed model was constructed to simultaneously accommodate overdispersion and hierarchies present in longitudinal or clustered data. This so‐called combined model includes conjugate random effects at observation level for overdispersion and normal random effects at subject level to handle correlation, respectively. A variety of data types can be handled in this way, using different members of the exponential family. Both maximum likelihood and Bayesian estimation for covariate effects and variance components were proposed. The focus of this paper is the development of an estimation procedure for the two sets of random effects. These are necessary when making predictions for future responses or their associated probabilities. Such (empirical) Bayes estimates will also be helpful in model diagnosis, both when checking the fit of the model as well as when investigating outlying observations. The proposed procedure is applied to three datasets of different outcome types. Copyright © 2014 John Wiley & Sons, Ltd.

10.
This paper explores the utility of different approaches for modeling longitudinal count data with dropouts arising from a clinical study for the treatment of actinic keratosis lesions on the face and balding scalp. A feature of these data is that as the disease for subjects on the active arm improves, their data show larger dispersion compared with those on the vehicle, exhibiting an over‐dispersion relative to the Poisson distribution. After fitting the marginal (or population averaged) model using the generalized estimating equation (GEE), we note that inferences from such a model might be biased as dropouts are treatment related. Then, we consider using a weighted GEE (WGEE) where each subject's contribution to the analysis is weighted inversely by the subject's probability of dropout. Based on the model findings, we argue that the WGEE might not address the concerns about the impact of dropouts on the efficacy findings when dropouts are treatment related. As an alternative, we consider likelihood‐based inference where random effects are added to the model to allow for heterogeneity across subjects. Finally, we consider a transition model where, unlike the previous approaches that model the log‐link function of the mean response, we model the subject's actual lesion counts. This model is an extension of the Poisson autoregressive model of order 1, where the autoregressive parameter is taken to be a function of treatment as well as other covariates to induce different dispersions and correlations for the two treatment arms. We conclude with a discussion about model selection. Published in 2009 by John Wiley & Sons, Ltd.

11.
In many applications of generalized linear mixed models to clustered correlated or longitudinal data, we are interested in testing whether a random effects variance component is zero. The usual asymptotic mixture of chi‐square distributions of the score statistic for testing constrained variance components does not necessarily hold. In this article, the author proposes and explores a parametric bootstrap test that appears to be valid based on its estimated level of significance under the null hypothesis. Results from a simulation study indicate that the bootstrap test has a level much closer to the nominal one while the asymptotic test is conservative, and is more powerful than the usual asymptotic score test based on a mixture of chi‐squares. The proposed bootstrap test is illustrated using two sets of real‐life data obtained from clinical trials. The Canadian Journal of Statistics © 2009 Statistical Society of Canada

12.
We present a bivariate regression model for count data that allows for positive as well as negative correlation of the response variables. The covariance structure is based on the Sarmanov distribution and consists of a product of generalised Poisson marginals and a factor that depends on particular functions of the response variables. The closed form of the probability function is derived by means of the moment-generating function. The model is applied to a large real dataset on health care demand. Its performance is compared with alternative models presented in the literature. We find that our model is significantly better than or at least equivalent to the benchmark models. It gives insights into influences on the variance of the response variables.

13.
In this paper, we consider a statistical model for the drug concentration–time profiles that are obtained in a pharmacokinetic (PK) study when the drug is orally administered. In the proposed statistical PK model, the subject-specific concentration–time curve is described by the one-compartment PK model with first-order absorption and elimination. Moreover, a multivariate generalized gamma distribution is developed for the joint distribution of the drug concentrations that are repeatedly measured from the same subject. We then construct confidence intervals for the subject–exposure parameters which provide further insight into the individual exposure of the drug under study. The proposed statistical PK model and the associated inference are then illustrated with a real data set. A simulation study is also implemented to investigate the coverage probability and expected length of the proposed confidence intervals. Finally, we give conclusions and discussions on the application of the proposed procedures.
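
For orientation, the one-compartment model with first-order absorption and elimination is commonly written in closed form as C(t) = F·D·ka / (V·(ka − ke)) · (exp(−ke·t) − exp(−ka·t)). The sketch below evaluates that textbook curve and its peak time. The parameterization, the names `ka`, `ke`, `volume`, `bioavailability` and the numeric values are assumptions for illustration; the abstract does not state which parameterization the authors use.

```python
import math


def concentration(t, dose, ka, ke, volume, bioavailability=1.0):
    """Concentration at time t after a single oral dose (one compartment,
    first-order absorption rate ka and elimination rate ke)."""
    if math.isclose(ka, ke):
        raise ValueError("this closed form requires ka != ke")
    scale = bioavailability * dose * ka / (volume * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def time_of_peak(ka, ke):
    """Time at which the concentration curve reaches its maximum."""
    return math.log(ka / ke) / (ka - ke)


if __name__ == "__main__":
    # Hypothetical values, chosen only to produce a plausible curve.
    ka, ke, volume, dose = 1.2, 0.2, 30.0, 100.0
    t_max = time_of_peak(ka, ke)
    print("t_max:", t_max)
    print("C_max:", concentration(t_max, dose, ka, ke, volume))
```

Exposure summaries such as the peak concentration are functions of these subject-specific parameters, which suggests why interval estimates for them can be informative about individual exposure.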

14.
It is well known that in a traditional outlier-free situation, the generalized quasi-likelihood (GQL) approach [B.C. Sutradhar, On exact quasilikelihood inference in generalized linear mixed models, Sankhya: Indian J. Statist. 66 (2004), pp. 261–289] performs very well in obtaining consistent and efficient estimates for the parameters involved in the generalized linear mixed models (GLMMs). In this paper, we first examine the effect of the presence of one or more outliers on the GQL estimation for the parameters in such GLMMs, especially in two important models, namely count and binary mixed models. The outliers appear to cause serious biases and hence inconsistency in the estimation. As a remedy, we then propose a robust GQL (RGQL) approach in order to obtain consistent estimates for the parameters in the GLMMs in the presence of one or more outliers. An extensive simulation study is conducted to examine the consistency performance of the proposed RGQL approach.

15.
Many applications in public health, medical and biomedical or other studies demand modelling of two or more longitudinal outcomes jointly to get better insight into their joint evolution. In this regard, a joint model for a longitudinal continuous and a count sequence, the latter possibly overdispersed and zero-inflated (ZI), will be specified that assembles aspects coming from each one of them into one single model. Further, a subject-specific random effect is included to account for the correlation in the continuous outcome. For the count outcome, clustering and overdispersion are accommodated through two distinct sets of random effects in a generalized linear model as proposed by Molenberghs et al. [A family of generalized linear models for repeated measures with normal and conjugate random effects. Stat Sci. 2010;25:325–347]; one is normally distributed, the other conjugate to the outcome distribution. The association among the two sequences is captured by correlating the normal random effects describing the continuous and count outcome sequences, respectively. An excessive number of zero counts is often accounted for by using a so-called ZI or hurdle model. ZI models combine either a Poisson or negative-binomial model with an atom at zero as a mixture, while the hurdle model separately handles the zero observations and the positive counts. This paper proposes a general joint modelling framework in which all these features can appear together. We illustrate the proposed method with a case study and examine it further with simulations.
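
The distinction drawn here between ZI and hurdle models can be made concrete with the Poisson case. The sketch below writes both probability mass functions without covariates or random effects, so it is far simpler than the joint model in the abstract. The function names and the parameters `pi_zero` and `p_zero` are assumptions for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: a mixture of an atom at zero (weight pi_zero)
    and a Poisson(lam) count, so zeros can come from either component."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


def hurdle_pmf(k, lam, p_zero):
    """Poisson hurdle: zeros have probability p_zero, and positive counts
    follow a zero-truncated Poisson(lam) with total mass 1 - p_zero."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    for k in range(4):
        print(k, zip_pmf(k, 2.0, 0.3), hurdle_pmf(k, 2.0, 0.3))
```

One consequence of the two constructions is that the ZI zero probability can never fall below the Poisson zero probability, whereas the hurdle zero probability is a free parameter and can also represent fewer zeros than the Poisson would predict.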

16.
In this paper, we introduce the shared gamma frailty models with two different baseline distributions, namely the generalized log-logistic and the generalized Weibull. We introduce the Bayesian estimation procedure to estimate the parameters involved in these models. We present a simulation study to compare the true values of the parameters with the estimated values. We apply these models to a real-life bivariate survival data set of McGilchrist and Aisbett related to the kidney infection data, and a better model is suggested for the data.

17.
Extended Poisson process modelling is generalised to allow for covariate-dependent dispersion as well as a covariate-dependent mean response. This is done by a re-parameterisation that uses approximate expressions for the mean and variance. Such modelling allows under- and over-dispersion, or a combination of both, in the same data set to be accommodated within the same modelling framework. All the necessary calculations can be done numerically, enabling maximum likelihood estimation of all model parameters to be carried out. The modelling is applied to re-analyse two published data sets, where there is evidence of covariate-dependent dispersion, with the modelling leading to more informative analyses of these data and more appropriate measures of the precision of any estimates.

18.
We often rely on the likelihood to obtain estimates of regression parameters, but it is not readily available for generalized linear mixed models (GLMMs). Inferences for the regression coefficients and the covariance parameters are key in these models. We presented alternative approaches for analyzing binary data from a hierarchical structure that do not rely on any distributional assumptions: a generalized quasi-likelihood (GQL) approach and a generalized method of moments (GMM) approach. These are alternative approaches to the typical maximum-likelihood approximation approach in Statistical Analysis System (SAS), such as the Laplace approximation (LAP). We examined and compared the performance of GQL and GMM approaches with multiple random effects to the LAP approach as used in PROC GLIMMIX, SAS. The GQL approach tends to produce unbiased estimates, whereas the LAP approach can lead to highly biased estimates for certain scenarios. The GQL approach produces more accurate estimates on both the regression coefficients and the covariance parameters with smaller standard errors as compared to the GMM approach. We found that both GQL and GMM approaches are less likely to result in non-convergence as opposed to the LAP approach. A simulation study was conducted and a numerical example was presented for illustrative purposes.

19.
Interval-censored survival data appear very frequently, where the event of interest is not observed exactly but is only known to occur within some time interval. In this paper, we propose a location-scale regression model based on the log-generalized gamma distribution for modelling interval-censored data. We shall be concerned only with parametric forms. The proposed model for interval-censored data represents a parametric family of models that has, as special submodels, other regression models which are broadly used in lifetime data analysis. Assuming interval-censored data, we consider a frequentist analysis, a Jackknife estimator and a non-parametric bootstrap for the model parameters. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some techniques to assess global influence.

20.
Several types of multivariate extensions of the inverse Gaussian (IG) distribution and the reciprocal inverse Gaussian (RIG) distribution are proposed. Some of these types are obtained as random-additive-effect models by means of well-known convolution properties of the IG and RIG distributions, and they have one-dimensional IG or RIG marginals. They are used to define a flexible class of multivariate Poisson mixtures.
