1.

Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros

Abbas Moghimbeigi Kazem Mohammad Brian Mcardle 《Journal of applied statistics》2008,35(10):1193-1202

Count data with excess zeros often occur in areas such as public health, epidemiology, psychology, sociology, engineering, and agriculture. Zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression are useful for modeling such data, but because of hierarchical study design or the data collection procedure, zero-inflation and correlation may occur simultaneously. To overcome these challenges ZIP or ZINB may still be used. In this paper, multilevel ZINB regression is used to overcome these problems. The method of parameter estimation is an expectation-maximization algorithm in conjunction with the penalized likelihood and restricted maximum likelihood estimates for variance components. Alternative modeling strategies, namely the ZIP distribution, are also considered. An application of the proposed model is shown on decayed, missing, and filled teeth of children aged 12 years.

2.

The analysis of incontinence episodes and other count data in patients with overactive bladder by Poisson and negative binomial regression

R. Martina R. Kay R. van Maanen A. Ridder 《Pharmaceutical statistics》2015,14(2):151-160

Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non‐parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well. Copyright © 2014 John Wiley & Sons, Ltd.
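
As a note on the abstract above, the step from random effects to the negative binomial can be sketched with the standard Poisson-gamma mixture. The symbols \mu (mean), \theta (gamma shape) and u (multiplicative random effect) are notation assumed here for illustration and are not taken from the paper itself.

```latex
% Assumed notation: u is a gamma random effect with mean 1 and variance 1/\theta.
Y \mid u \sim \mathrm{Poisson}(\mu u), \qquad u \sim \mathrm{Gamma}(\text{shape}=\theta,\ \text{rate}=\theta)

% Integrating u out gives the negative binomial probability function:
P(Y = y) = \frac{\Gamma(y+\theta)}{\Gamma(\theta)\, y!}
           \left(\frac{\theta}{\theta+\mu}\right)^{\theta}
           \left(\frac{\mu}{\theta+\mu}\right)^{y}, \qquad y = 0, 1, 2, \dots

% The mean is unchanged, but the variance exceeds the Poisson variance:
\mathrm{E}(Y) = \mu, \qquad \mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}
```

With a log link, \log \mu = x^{\top}\beta, each \exp(\beta_j) is a rate ratio, which is the interpretation of treatment effects that the abstract refers to.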

3.

Random effects regression models for count data with excess zeros in caries research

D. Todem Y. Zhang A. Ismail W. Sohn 《Journal of applied statistics》2010,37(10):1661-1679

We extend the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models are formulated. The first model assumes a shared random effects term between the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxes the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two different but correlated random effects variables. Under the conditional independence and the missing data at random assumption, a direct optimization of the marginal likelihood and an EM algorithm are proposed to fit the proposed models. Our proposed models are fitted to dental caries counts of children under the age of six in the city of Detroit.

4.

Marginal zero-inflated regression models for count data

Jacob Martin Daniel B. Hall 《Journal of applied statistics》2017,44(10):1807-1826

Data sets with excess zeroes are frequently analyzed in many disciplines. A common framework used to analyze such data is the zero-inflated (ZI) regression model. It mixes a degenerate distribution with point mass at zero with a non-degenerate distribution. The estimates from ZI models quantify the effects of covariates on the means of latent random variables, which are often not the quantities of primary interest. Recently, marginal zero-inflated Poisson (MZIP; Long et al. [A marginalized zero-inflated Poisson regression model with overall exposure effects. Stat. Med. 33 (2014), pp. 5151–5165]) and negative binomial (MZINB; Preisser et al., 2016) models have been introduced that model the mean response directly. These models yield covariate effects that have simple interpretations that are, for many applications, more appealing than those available from ZI regression. This paper outlines a general framework for marginal zero-inflated models where the latent distribution is a member of the exponential dispersion family, focusing on common distributions for count data. In particular, our discussion includes the marginal zero-inflated binomial (MZIB) model, which has not been discussed previously. The details of maximum likelihood estimation via the EM algorithm are presented and the properties of the estimators as well as Wald and likelihood ratio-based inference are examined via simulation. Two examples presented illustrate the advantages of MZIP, MZINB, and MZIB models for practical data analysis.
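
The mixture described in the abstract above can be written out to show why ZI coefficients refer to a latent mean rather than to the overall mean. The symbols \psi (probability of a structural zero), \mu (latent mean), \nu (overall mean) and the covariate vectors x, z are assumed notation for this sketch, not necessarily the paper's.

```latex
% Zero-inflated mixture: point mass at zero with probability \psi,
% otherwise a count distribution f(\cdot;\mu) with mean \mu.
P(Y = 0) = \psi + (1-\psi)\, f(0;\mu), \qquad
P(Y = y) = (1-\psi)\, f(y;\mu), \quad y = 1, 2, \dots

% The overall (marginal) mean mixes both parts:
\nu = \mathrm{E}(Y) = (1-\psi)\,\mu

% A standard ZI regression models the latent mean:  \log \mu = x^{\top}\beta
% A marginal ZI regression models the overall mean: \log \nu = x^{\top}\alpha,
% commonly paired with \operatorname{logit}\psi = z^{\top}\gamma.
```

Under the marginal form, \exp(\alpha_j) can be read as a ratio of overall means, whereas \exp(\beta_j) in the standard form is a ratio of means only within the latent non-degenerate class.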

5.

Efficient regression modeling for correlated and overdispersed count data

《Communications in Statistics - Theory and Methods》2012,41(24):6005-6018

Abstract

The objective of this paper is to propose an efficient estimation procedure in a marginal mean regression model for longitudinal count data and to develop a hypothesis test for detecting the presence of overdispersion. We extend the matrix expansion idea of quadratic inference functions to the negative binomial regression framework, which entails accommodating both within-subject correlation and overdispersion. Theoretical and numerical results show that the proposed procedure yields a more efficient estimator asymptotically than one that ignores either the within-subject correlation or the overdispersion. When overdispersion is absent from the data, the proposed method might hinder the estimation efficiency in practice, whereas the Poisson-based regression model fits the data sufficiently well. Therefore, we construct a hypothesis test that recommends an appropriate model for the analysis of correlated count data. Extensive simulation studies indicate that the proposed test can identify the effective model consistently. The proposed procedure is also applied to a transportation safety study, where it recommends the proposed negative binomial regression model.

6.

A test for lack-of-fit of zero-inflated negative binomial models

Chin-Shang Li Shen-Ming Lee Ming-Shan Yeh 《Journal of Statistical Computation and Simulation》2019,89(7):1301-1321

A semiparametric zero-inflated negative binomial (ZINB) regression model is proposed for analyzing count data sets in which zero counts are excessive, nonzero counts are overdispersed, and the effect of a continuous covariate might be nonlinear. The unspecified smooth functional form for the continuous covariate effect is approximated by a cubic spline. The semiparametric ZINB regression model is fitted by maximizing the likelihood function. The likelihood ratio procedure is used to evaluate the adequacy of a postulated parametric functional form for the continuous covariate effect. An extensive simulation study is conducted to assess the finite-sample performance of the proposed test. The practicality of the proposed methodology is demonstrated with data of a motorcycle survey of traffic regulations conducted in 2007 in Taiwan by the Ministry of Transportation and Communication.

7.

Multivariate models for correlated count data

Mariana Rodrigues-Motta Hildete P. Pinheiro Eduardo G. Martins Márcio S. Araújo Sérgio F. dos Reis 《Journal of applied statistics》2013,40(7):1586-1596

In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions have been considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. Then we extend this model in the sense that correlated counts may follow different distributions. To accommodate correlation among counts, we have considered correlated random effects for each individual in the mean structure, thus inducing dependency among observations common to an individual. The method is applied to real data to investigate variation in food resource use in a species of marsupial in a locality of the Brazilian Cerrado biome.

8.

A comparison of count data models with an application to daily cigarette consumption of young persons

Muhammed Fatih Tüzen Semra Erbaş 《Communications in Statistics - Theory and Methods》2018,47(23):5825-5844

The objective of this study is to provide a comparative assessment for researchers dealing with the challenges of analyzing count data and to examine the factors associated with daily cigarette consumption among young people in Turkey. We fitted Poisson (P), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH) and negative binomial hurdle (NBH) regressions to cigarette consumption count data from the 2014 Turkey Health Survey. Our results showed that the ZINB and NBH models should be preferred. We also found that gender, employment and tobacco use at home are more effective factors for smokers and nonsmokers in the 15–24 age group in Turkey.
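
The zero-inflated and hurdle families compared in the abstract above treat zeros differently, and a small sketch may make the distinction concrete. The code below is a minimal illustration using only the Python standard library and the textbook Poisson versions of both models; the function names and parameter values are hypothetical and are not taken from the study.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing count k with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, mu, pi_zero):
    """Zero-inflated Poisson: a structural zero occurs with probability
    pi_zero; otherwise the count is Poisson(mu), which can also be zero."""
    base = (1.0 - pi_zero) * poisson_pmf(k, mu)
    return pi_zero + base if k == 0 else base


def hurdle_poisson_pmf(k, mu, p_zero):
    """Poisson hurdle: every zero comes from the binary part (probability
    p_zero); positive counts follow a zero-truncated Poisson(mu)."""
    if k == 0:
        return p_zero
    truncation = 1.0 - math.exp(-mu)  # P(Poisson(mu) > 0)
    return (1.0 - p_zero) * poisson_pmf(k, mu) / truncation


if __name__ == "__main__":
    # Illustrative (assumed) parameter values.
    mu, p = 2.0, 0.3
    for k in range(4):
        print(k, round(zip_pmf(k, mu, p), 4), round(hurdle_poisson_pmf(k, mu, p), 4))
```

In the zero-inflated form the zero probability is the mixing weight plus the Poisson zeros, so it always exceeds the mixing weight; in the hurdle form the zero probability equals the binary parameter exactly.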

9.

Variable selection approach for zero-inflated count data via adaptive lasso

Ping Zeng Yongyue Wei Yang Zhao Jin Liu Liya Liu Ruyang Zhang 《Journal of applied statistics》2014,41(4):879-894

This article proposes a variable selection approach for zero-inflated count data analysis based on the adaptive lasso technique. Two models including the zero-inflated Poisson and the zero-inflated negative binomial are investigated. An efficient algorithm is used to minimize the penalized log-likelihood function in an approximate manner. Both the generalized cross-validation and Bayesian information criterion procedures are employed to determine the optimal tuning parameter, and a consistent sandwich formula of standard errors for nonzero estimates is given based on local quadratic approximation. We evaluate the performance of the proposed adaptive lasso approach through extensive simulation studies, and apply it to analyze real-life data about doctor visits.

10.

Dealing with excess of zeros in the statistical analysis of magnetic resonance imaging lesion count in multiple sclerosis

M Francois C Peter F Gordon 《Pharmaceutical statistics》2012,11(5):417-424

The lesion count observed on brain magnetic resonance imaging scans is a common end point in phase 2 clinical trials evaluating therapeutic treatment in relapsing remitting multiple sclerosis (MS). This paper compares the performances of Poisson, zero‐inflated Poisson (ZIP), negative binomial (NB), and zero‐inflated NB (ZINB) mixed‐effects regression models in fitting lesion count data in a clinical trial evaluating the efficacy and safety of fingolimod in comparison with placebo, in MS. The NB and ZINB models prove to be superior to the Poisson and ZIP models. We discuss the advantages and limitations of zero‐inflated models in the context of MS treatment. Copyright © 2012 John Wiley & Sons, Ltd.

11.

An empirical approach to determine a threshold for assessing overdispersion in Poisson and negative binomial models for count data

Elizabeth H. Payne Mulugeta Gebregziabher James W. Hardin Viswanathan Ramakrishnan Leonard E. Egede 《Communications in Statistics - Simulation and Computation》2018,47(6):1722-1738

Overdispersion is a problem encountered in the analysis of count data that can lead to invalid inference if unaddressed. A decision about whether data are overdispersed is often reached by checking whether the ratio of the Pearson chi-square statistic to its degrees of freedom is greater than one; however, there is currently no fixed threshold for declaring the need for statistical intervention. We consider simulated cross-sectional and longitudinal datasets containing varying magnitudes of overdispersion caused by outliers or zero inflation, as well as real datasets, to determine an appropriate threshold value of this statistic which indicates when overdispersion should be addressed.
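
The ratio mentioned in the abstract above can be computed directly once fitted means are available. The sketch below assumes a Poisson variance function (variance equal to the mean) and, for simplicity, an intercept-only model whose fitted mean is the sample mean; the function name and the toy counts are hypothetical and do not come from the paper.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square statistic divided by its residual degrees of
    freedom, assuming the Poisson variance function Var(Y) = mu."""
    if len(counts) != len(fitted_means):
        raise ValueError("counts and fitted_means must have the same length")
    df = len(counts) - n_params
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / df


if __name__ == "__main__":
    # Toy data (assumed): many zeros plus a few large counts.
    counts = [0, 0, 0, 0, 1, 1, 2, 3, 7, 12]
    # For an intercept-only Poisson model the fitted mean is the sample mean.
    mean = sum(counts) / len(counts)
    ratio = pearson_dispersion(counts, [mean] * len(counts), n_params=1)
    print("Pearson chi-square / df:", ratio)
```

A ratio near one is consistent with the Poisson assumption, while larger values suggest overdispersion; how large the ratio must be before intervening is the question the paper investigates, and this sketch does not answer it.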

12.

A Bayesian approach of joint models for clustered zero-inflated count data with skewness and measurement errors

Ying-zi Fu Pei-xiao Chu Li-ying Lu 《Journal of applied statistics》2015,42(4):745-761

Count data with excess zeros are widely encountered in biomedical, medical, public health and social survey research. Zero-inflated Poisson (ZIP) regression models with mixed effects are useful tools for analyzing such data, in which covariates are usually incorporated in the model to explain inter-subject variation and a normal distribution is assumed for both random effects and random errors. However, in many practical applications, such assumptions may be violated as the data often exhibit skewness and some covariates may be measured with error. In this paper, we deal with these issues simultaneously by developing a Bayesian joint hierarchical modeling approach. Specifically, by treating intercepts and slopes in logistic and Poisson regression as random, a flexible two-level ZIP regression model is proposed, where a covariate process with measurement errors is established and a skew-t-distribution is considered for both random errors and random effects. Under the Bayesian framework, model selection is carried out using the deviance information criterion (DIC), and a goodness-of-fit statistic is also developed for assessing the plausibility of the posited model. The main advantage of our method is that it allows for more robustness and correctness in investigating heterogeneity from different levels, while accommodating the skewness and measurement errors simultaneously. An application to the Shanghai Youth Fitness Survey is used as an illustrative example. Through this real example, it is shown that our approach is of interest and useful for applications.

13.

A bivariate Sarmanov regression model for count data with generalised Poisson marginals

Vera Hofer Johannes Leitner 《Journal of applied statistics》2012,39(12):2599-2617

We present a bivariate regression model for count data that allows for positive as well as negative correlation of the response variables. The covariance structure is based on the Sarmanov distribution and consists of a product of generalised Poisson marginals and a factor that depends on particular functions of the response variables. The closed form of the probability function is derived by means of the moment-generating function. The model is applied to a large real dataset on health care demand. Its performance is compared with alternative models presented in the literature. We find that our model is significantly better than or at least equivalent to the benchmark models. It gives insights into influences on the variance of the response variables.

14.

Improved two-parameter estimators for the negative binomial and Poisson regression models

Merve Kandemir Çetinkaya Selahattin Kaçıranlar 《Journal of Statistical Computation and Simulation》2019,89(14):2645-2660

Negative binomial regression (NBR) and Poisson regression (PR) applications have become very popular in the analysis of count data in recent years. However, if there is a high degree of relationship between the independent variables, the problem of multicollinearity arises in these models. We introduce new two-parameter estimators (TPEs) for the NBR and the PR models by unifying the two-parameter estimator (TPE) of Özkale and Kaçıranlar [The restricted and unrestricted two-parameter estimators. Commun Stat Theory Methods. 2007;36:2707–2725]. These new estimators are general estimators which include maximum likelihood (ML) estimator, ridge estimator (RE), Liu estimator (LE) and contraction estimator (CE) as special cases. Furthermore, biasing parameters of these estimators are given and a Monte Carlo simulation is done to evaluate the performance of these estimators using mean square error (MSE) criterion. The benefits of the new TPEs are also illustrated in an empirical application. The results show that the new proposed TPEs for the NBR and the PR models are better than the ML estimator, the RE and the LE.

15.

Things that make us different: analysis of deviance with time-use data

Jorge González Chapela 《Journal of applied statistics》2013,40(7):1572-1585

The constrained, non-normal nature of time-use data poses a challenge to ordinary analysis of variance. This paper investigates a computationally simple variance decomposition technique suitable for those data. As a by-product of the analysis, a measure of fit for systems of time-demand equations is proposed that possesses several useful properties.

16.

Some issues in logistic regression

Thomas P. Ryan 《Communications in Statistics - Theory and Methods》2013,42(9-10):2019-2032

Much research has been performed in the area of multiple linear regression, with the result that the field is well-developed. This is not true of logistic regression, however. The latter presents special problems because the response is not continuous. Some of these problems are: the difficulty of developing a suitable R² statistic, possibly poor results produced by the method of maximum likelihood, and the challenge of developing suitable graphical techniques. We describe recent work in some of these directions, and discuss the need for additional research.

17.

Quantifying R² bias in the presence of measurement error

Karl D. Majeske Terri Lynch-Caris Janet Brelin-Fornari 《Journal of applied statistics》2010,37(4):667-677

18.

A nonparametric R² test for the presence of relevant variables

Feng Yao Aman Ullah 《Journal of statistical planning and inference》2013

19.

A Coefficient of Determination for Generalized Linear Models

Dabao Zhang 《The American statistician》2017,71(4):310-316

The coefficient of determination, a.k.a. R², is well-defined in linear regression models, and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it for generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the remaining variation of the dependent variable after modeling the predictive effects of the independent variables. Unlike other definitions that demand complete specification of the likelihood function, our definition of R² only needs to know the mean and variance functions, so it is applicable to more general quasi-models. It is consistent with the classical measure of uncertainty using variance, and reduces to the classical definition of the coefficient of determination when linear regression models are considered.

20.

On the behaviour of some transforms of the sample correlation coefficient: discrete bivariate populations

Subrahmaniam Kocherlakota M. Singh 《Communications in Statistics - Theory and Methods》2013,42(18):2017-2043

The present paper investigates, for the first time, the robustness of some of the familiar transformations of the sample correlation coefficient when the parent population is discrete. The three specific cases examined are the bivariate Poisson (BVP), the bivariate negative binomial (BNB), and the trinomial (TN). Investigation of the (near) normality of the transformed statistics is done by the techniques considered by Subrahmaniam and Gajjar. In addition, an empirical examination of their behaviour is carried out by the density estimation technique due to Tarter and Kronmal.