1.

Modeling the correlation structure of data that have multiple levels of association

Justine Shults 《Communications in Statistics - Theory and Methods》2013,42(5-6):1005-1015

Some modern approaches for the analysis of non-normally distributed and correlated data, including Liang and Zeger's (1986) method of generalized estimating equations (GEE), model the pattern of association among outcomes by assuming a structure for their correlation matrix. A number of relatively simple patterned correlation matrices are available for measurements with one level of correlation. However, modeling the correlation structure of data with multiple levels, or causes, of association is not as straightforward; this note discusses some of the difficulties and presents a simple class of correlation models that may prove useful in this endeavor.
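As an illustration of the "relatively simple patterned correlation matrices" mentioned above, the following is a minimal sketch of two common single-level structures, exchangeable and first-order autoregressive. The function names are hypothetical and are not taken from the paper; the multi-level class the note proposes is not reproduced here.

```python
import numpy as np

def exchangeable(n, alpha):
    """n x n correlation matrix with every off-diagonal entry equal to alpha."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def ar1(n, alpha):
    """n x n correlation matrix with entry alpha**|j - k| (first-order autoregressive)."""
    idx = np.arange(n)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def is_valid_correlation(R, tol=1e-10):
    """Check symmetry, unit diagonal and positive semi-definiteness."""
    symmetric = np.allclose(R, R.T)
    unit_diagonal = np.allclose(np.diag(R), 1.0)
    psd = np.linalg.eigvalsh(R).min() >= -tol
    return bool(symmetric and unit_diagonal and psd)
```

The validity check matters because not every value of alpha yields a proper correlation matrix: an exchangeable matrix of dimension n needs alpha of at least -1/(n - 1).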

2.

A PRESS statistic for working correlation structure selection in generalized estimating equations

Gul Inan Mahbub A. H. M. Latif John Preisser 《Journal of applied statistics》2019,46(4):621-637

The method of generalized estimating equations (GEE) is one of the most commonly used methods for regression analysis of longitudinal data, especially with discrete outcomes. The GEE method accounts for the association among the responses of a subject through a working correlation matrix, and its correct specification ensures efficient estimation of the regression parameters in the marginal mean regression model. This study proposes a predicted residual sum of squares (PRESS) statistic as a working correlation selection criterion in GEE. A simulation study is designed to assess the performance of the proposed GEE PRESS criterion and to compare its performance with its counterpart criteria in the literature. The results show that the GEE PRESS criterion has better performance than the weighted error sum of squares SC criterion in all cases but is surpassed in performance by the Gaussian pseudo-likelihood criterion. Lastly, the working correlation selection criteria are illustrated with data from the Coronary Artery Risk Development in Young Adults study.

3.

A new look at the difference between the GEE and the GLMM when modeling longitudinal count responses

H. Zhang Q. Yu C. Feng D. Gunzler P. Wu X. M. Tu 《Journal of applied statistics》2012,39(9):2067-2079

Poisson log-linear regression is a popular model for count responses. We examine two popular extensions of this model – the generalized estimating equations (GEE) and the generalized linear mixed-effects model (GLMM) – to longitudinal data analysis and complement the existing literature on characterizing the relationship between the two dueling paradigms in this setting. Unlike linear regression, the GEE and the GLMM carry significant conceptual and practical implications when applied to modeling count data. Our findings shed additional light on the differences between the two classes of models when used for count data. Our considerations are demonstrated by both real study and simulated data.

4.

The pseudo‐GEE approach to the analysis of longitudinal surveys

Iván A. Carrillo Jiahua Chen Changbao Wu 《Revue canadienne de statistique》2010,38(4):540-554

Longitudinal surveys have emerged in recent years as an important data collection tool for population studies where the primary interest is to examine population changes over time at the individual level. Longitudinal data are often analyzed through the generalized estimating equations (GEE) approach. The vast majority of the existing literature on the GEE method, however, was developed under non‐survey settings and is inappropriate for data collected through complex sampling designs. In this paper the authors develop a pseudo‐GEE approach for the analysis of survey data. They show that survey weights must and can be appropriately accounted for in the GEE method under a joint randomization framework. The consistency of the resulting pseudo‐GEE estimators is established under the proposed framework. Linearization variance estimators are developed for the pseudo‐GEE estimators when the finite population sampling fractions are small or negligible, a scenario that often holds for large‐scale surveys. Finite sample performances of the proposed estimators are investigated through an extensive simulation study using data from the National Longitudinal Survey of Children and Youth. The results show that the pseudo‐GEE estimators and the linearization variance estimators perform well under several sampling designs and for both continuous and binary responses. The Canadian Journal of Statistics 38: 540–554; 2010 © 2010 Statistical Society of Canada

5.

The impact of dichotomization in longitudinal data analysis: a simulation study

Bongin Yoo 《Pharmaceutical statistics》2010,9(4):298-312

In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi‐parametric approaches are only valid under missing data completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI‐GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI‐GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparisons, likelihood‐based linear mixed effects models (LMM) are used for incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI‐GEE with original continuous missing data imputation procedure provides well controlled test sizes and more stable power estimates compared with any other GEE‐based approaches. It is also shown that dichotomizing longitudinal continuous outcome will result in substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd.

6.

Modeling longitudinal binomial responses: implications from two dueling paradigms

H. Zhang Y. Xia R. Chen D. Gunzler W. Tang Xin Tu 《Journal of applied statistics》2011,38(11):2373-2390

The generalized estimating equations (GEEs) and generalized linear mixed-effects model (GLMM) are the two most popular paradigms to extend models for cross-sectional data to a longitudinal setting. Although the two approaches yield well-interpreted models for continuous outcomes, it is quite a different story when applied to binomial responses. We discuss major modeling differences between the GEE- and GLMM-derived models by presenting new results regarding the model-driven differences. Our results show that GLMM induces some artifacts in the marginal models at assessment times, making it inappropriate when applied to such responses from real study data. The different interpretations of parameters resulting from the conceptual difference between the two modeling approaches also carry quite significant implications and ramifications with respect to data and power analyses. Although a special case involving a scale difference in parameters between GEE and GLMM has been noted in the literature, its implications in real data analysis have not been thoroughly addressed. Further, this special case has a very limited covariate structure and does not apply to most real studies, especially multi-center clinical trials. The new results presented fill a substantial gap in the literature regarding the model-driven differences between the two dueling paradigms.

7.

Hon Yiu So Mary E. Thompson Changbao Wu 《Revue canadienne de statistique》2020,48(4):633-654

Misclassifications in binary responses have long been a common problem in medical and health surveys. One way to handle misclassifications in clustered or longitudinal data is to incorporate the misclassification model through the generalized estimating equation (GEE) approach. However, existing methods are developed under a non-survey setting and cannot be used directly for complex survey data. We propose a pseudo-GEE method for the analysis of binary survey responses with misclassifications. We focus on cluster sampling and develop analysis strategies for analyzing binary survey responses with different forms of additional information for the misclassification process. The proposed methodology has several attractive features, including simultaneous inferences for both the response model and the association parameters. Finite sample performance of the proposed estimators is evaluated through simulation studies and an application using a real dataset from the Canadian Longitudinal Study on Aging.

8.

Small sample characteristics of generalized estimating equations

J. C. Gunsolley C. Getchell V. M. Chinchilli 《Communications in Statistics - Simulation and Computation》2013,42(4):869-878

The aim of this study was to investigate the Type I error rate of hypothesis testing based on generalized estimating equations (GEE) for data characteristic of periodontal clinical trials. The data in these studies consist of a large number of binary responses from each subject and a small number of subjects (Haffajee et al. (1983), Goodson (1986), Jenkins et al. (1988)). Computer simulations were employed to investigate GEE based both on an empirical estimate of the variance-covariance matrix and a model-based estimate. Results from this investigation indicate that hypothesis testing based on GEE resulted in inappropriate Type I error rates when small samples were employed. Only an increase in the number of subjects to the point where it matched the number of observations per subject resulted in appropriate Type I error rates.

9.

Variance function in regression analysis of longitudinal data using the generalized estimating equation approach

《Journal of Statistical Computation and Simulation》2012,82(12):2700-2709

Longitudinal or clustered response data arise in many applications such as biostatistics, epidemiology and environmental studies. The repeated responses cannot in general be assumed to be independent. One method of analysing such data is by using the generalized estimating equations (GEE) approach. The current GEE method for estimating regression effects in longitudinal data focuses on the modelling of the working correlation matrix assuming a known variance function. However, correct choice of the correlation structure may not necessarily improve estimation efficiency for the regression parameters if the variance function is misspecified [Wang YG, Lin X. Effects of variance-function misspecification in analysis of longitudinal data. Biometrics. 2005;61:413–421]. In this connection two problems arise: finding a correct variance function and estimating the parameters of the chosen variance function. In this paper, we study the problem of estimating the parameters of the variance function assuming that the form of the variance function is known and then the effect of a misspecified variance function on the estimates of the regression parameters. We propose a GEE approach to estimate the parameters of the variance function. This estimation approach borrows the idea of Davidian and Carroll [Variance function estimation. J Amer Statist Assoc. 1987;82:1079–1091] by solving a nonlinear regression problem where residuals are regarded as the responses and the variance function is regarded as the regression function. A limited simulation study shows that the proposed method performs at least as well as the modified pseudo-likelihood approach developed by Wang and Zhao [A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics. 2007;63:681–689]. Both these methods perform better than the GEE approach.

10.

Modelling a non-stationary BINAR(1) Poisson process

《Journal of Statistical Computation and Simulation》2012,82(15):3106-3126

Non-stationarity in bivariate time series of counts may be induced by a number of time-varying covariates affecting the bivariate responses, as a result of which the innovation terms of the individual series as well as the bivariate dependence structure become non-stationary. So far, in the existing models, the innovation terms of individual INAR(1) series and the dependence structure are assumed to be constant even though the individual time series are non-stationary. Under this assumption, the reliability of the regression and correlation estimates is questionable. Besides, the existing estimation methodologies such as conditional maximum likelihood estimation (CMLE) and composite likelihood estimation are computationally intensive. To address these issues, this paper proposes a BINAR(1) model where the innovation series follow a bivariate Poisson distribution under some non-stationary distributional assumptions. The method of generalized quasi-likelihood (GQL) is used to estimate the regression effects while the serial and bivariate correlations are estimated using a robust moment estimation technique. The model and estimation method are applied to simulated data. The GQL method is also compared with the CMLE, generalized method of moments (GMM) and generalized estimating equation (GEE) approaches; simulation studies show that GQL yields more efficient estimates than GMM and equally or slightly more efficient estimates than CMLE and GEE.

11.

On generalised estimating equations for vector regression

A. Huang 《Australian & New Zealand Journal of Statistics》2017,59(2):195-213

Generalised estimating equations (GEE) for regression problems with vector‐valued responses are examined. When the response vectors are of mixed type (e.g. continuous–binary response pairs), the GEE approach is a semiparametric alternative to full‐likelihood copula methods, and is closely related to Prentice & Zhao's mean‐covariance estimation equations approach. When the response vectors are of the same type (e.g. measurements on left and right eyes), the GEE approach can be viewed as a ‘plug‐in’ to existing methods, such as the vglm function from the state‐of‐the‐art VGAM package in R. In either scenario, the GEE approach offers asymptotically correct inferences on model parameters regardless of whether the working variance–covariance model is correctly or incorrectly specified. The finite‐sample performance of the method is assessed using simulation studies based on a burn injury dataset and a sorbinil eye trial dataset. The method is applied to data analysis examples using the same two datasets, as well as to a trivariate binary dataset on three plant species in the Hunua ranges of Auckland.

12.

Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data

Andrew J. Copas Shaun R. Seaman 《Journal of applied statistics》2010,37(6):911-922

Patient dropout is a common problem in studies that collect repeated binary measurements. Generalized estimating equations (GEE) are often used to analyze such data. The dropout mechanism may be plausibly missing at random (MAR), i.e. unrelated to future measurements given covariates and past measurements. In this case, various authors have recommended weighted GEE with weights based on an assumed dropout model, or an imputation approach, or a doubly robust approach based on weighting and imputation. These approaches provide asymptotically unbiased inference, provided the dropout or imputation model (as appropriate) is correctly specified. Other authors have suggested that, provided the working correlation structure is correctly specified, GEE using an improved estimator of the correlation parameters (‘modified GEE’) show minimal bias. These modified GEE have not been thoroughly examined. In this paper, we study the asymptotic bias under MAR dropout of these modified GEE, the standard GEE, and also GEE using the true correlation. We demonstrate that all three methods are biased in general. The modified GEE may be preferred to the standard GEE and are subject to only minimal bias in many MAR scenarios but in others are substantially biased. Hence, we recommend the modified GEE be used with caution.

13.

Assessing inter- and intra-agreement for dependent binary data: a Bayesian hierarchical correlation approach

Miao-Yu Tsai 《Journal of applied statistics》2012,39(1):173-187

Agreement measures are designed to assess consistency between different instruments rating measurements of interest. When the individual responses are correlated with multilevel structure of nestings and clusters, traditional approaches are not readily available to estimate the inter- and intra-agreement for such complex multilevel settings. Our research stems from conformity evaluation between optometric devices with measurements on both eyes, equality tests of agreement in high myopic status between monozygous twins and dizygous twins, and assessment of reliability for different pathologists in dysplasia. In this paper, we focus on applying a Bayesian hierarchical correlation model incorporating adjustment for explanatory variables and nesting correlation structures to assess the inter- and intra-agreement through correlations of random effects for various sources. This Bayesian generalized linear mixed-effects model (GLMM) is further compared with the approximate intra-class correlation coefficients and kappa measures by the traditional Cohen’s kappa statistic and the generalized estimating equations (GEE) approach. The results of comparison studies reveal that the Bayesian GLMM provides a reliable and stable procedure in estimating inter- and intra-agreement simultaneously after adjusting for covariates and correlation structures, in marked contrast to Cohen’s kappa and the GEE approach.

14.

Pseudo-Likelihood Methodology for Hierarchical Count Data

George Kalema 《Communications in Statistics - Theory and Methods》2014,43(22):4790-4805

Generalized Estimating Equations (GEE) are a widespread tool for modeling correlated data, based on properly formulating a marginal regression function, combined with working assumptions about the correlation function. Should interest be placed in addition on the correlation function, then, apart from second-order GEE, pseudo-likelihood (PL) also provides an attractive alternative, especially in its pairwise form, where the covariance between each pair of the response vector is modeled as well. An elegant PL approach is formulated in this paper, based on a flexible bivariate Poisson model. The performance of the PL-method is studied, relative to GEE, using simulations. Data on repeated counts of epileptic seizures in a two-arm clinical trial are analyzed. A macro has been developed by the authors and made available on their web pages.

15.

Assessment of modeling longitudinal binary data based on graphical methods

Kuo-Chin Lin Yi-Ju Chen 《Communications in Statistics - Theory and Methods》2017,46(7):3426-3437

Longitudinal categorical data are commonly encountered in a variety of fields and are frequently analyzed by the generalized estimating equation (GEE) method. Prior to making further inference based on the GEE model, the assessment of model fit is crucial. Graphical techniques have long been in widespread use for assessing model adequacy. We develop alternative graphical approaches utilizing plots of marginal model-checking condition and local mean deviance to assess the GEE model with logit link for longitudinal binary responses. The applications of the proposed procedures are illustrated through two longitudinal binary datasets.

16.

Effects of correlation and missing data on sample size estimation in longitudinal clinical trials

Song Zhang Chul Ahn 《Pharmaceutical statistics》2010,9(1):2-9

In longitudinal clinical trials, a common objective is to compare the rates of change in an outcome variable between two treatment groups. The generalized estimating equation (GEE) approach has been widely used, owing to its robustness to misspecification of the true correlation structure and to randomly missing data, to examine whether the rates of change differ significantly between treatment groups. The sample size formula for repeated outcomes is based on the assumption of missing completely at random and a large sample approximation. A simulation study is conducted to investigate the performance of the GEE sample size formula with small sample sizes, a damped exponential family of correlation structures and non‐ignorable missing data. Copyright © 2008 John Wiley & Sons, Ltd.

17.

Efficiency of generalized estimating equations for binary responses

N. Rao Chaganty Harry Joe 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2004,66(4):851-860

Summary. Using standard correlation bounds, we show that in generalized estimation equations (GEEs) the so-called 'working correlation matrix' R(α) for analysing binary data cannot in general be the true correlation matrix of the data. Methods for estimating the correlation parameter in current GEE software for binary responses disregard these bounds. To show that the GEE applied on binary data has high efficiency, we use a multivariate binary model so that the covariance matrix from estimating equation theory can be compared with the inverse Fisher information matrix. But R(α) should be viewed as the weight matrix, and it should not be confused with the correlation matrix of the binary responses. We also do a comparison with more general weighted estimating equations by using a matrix Cauchy–Schwarz inequality. Our analysis leads to simple rules for the choice of α in an exchangeable or autoregressive AR(1) weight matrix R(α), based on the strength of dependence between the binary variables. An example is given to illustrate the assessment of dependence and choice of α.
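The "standard correlation bounds" referred to above can be illustrated with a short sketch. For two Bernoulli variables with success probabilities p1 and p2, the joint probability P(Y1 = 1, Y2 = 1) must lie between max(0, p1 + p2 - 1) and min(p1, p2), which restricts the attainable correlation. The function name below is hypothetical, and the sketch covers only the pairwise bounds, not the paper's efficiency analysis.

```python
import math

def binary_correlation_bounds(p1, p2):
    """Return (lower, upper) bounds on corr(Y1, Y2) for Bernoulli margins p1 and p2."""
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("marginal probabilities must lie strictly between 0 and 1")
    q1, q2 = 1.0 - p1, 1.0 - p2
    # Product of the two standard deviations, sqrt(p1*q1) * sqrt(p2*q2).
    denom = math.sqrt(p1 * q1 * p2 * q2)
    # Frechet bounds on the joint success probability P(Y1 = 1, Y2 = 1).
    joint_min = max(0.0, p1 + p2 - 1.0)
    joint_max = min(p1, p2)
    # corr = (P(Y1 = 1, Y2 = 1) - p1*p2) / denom, evaluated at both extremes.
    return (joint_min - p1 * p2) / denom, (joint_max - p1 * p2) / denom
```

With equal margins of 0.5 the full range from -1 to 1 is attainable, whereas with margins 0.1 and 0.5 the correlation is confined to roughly plus or minus one third, so a working correlation parameter outside that range cannot be a true correlation for such data.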

18.

A Two-Latent-Class Model for Smoking Cessation Data with Informative Dropouts

Li Qin Lisa A. Weissfeld Changyu Shen Michele D. Levine 《Communications in Statistics - Theory and Methods》2013,42(15):2604-2619

Nonignorable missing data are a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or intermittent missing data pattern. In our study, we propose a new two-latent-class model for categorical data with informative dropouts, dividing the observed data into two latent classes: one class in which the outcomes are deterministic and a second one in which the outcomes can be modeled using logistic regression. In the model, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Parameters are estimated by the method of maximum likelihood estimation based on the above assumptions and the tetrachoric correlation between responses within the same subject. We compare the proposed method with the shared parameter model and the weighted GEE model using the areas under the ROC curves in the simulations and the application to the smoking cessation data set. The simulation results indicate that the proposed two-latent-class model performs well under different missing procedures. The application results show that our proposed method is better than the shared parameter model and the weighted GEE model.

19.

Comparison of GEE1 and GEE2 estimation applied to clustered logistic regression

《Journal of Statistical Computation and Simulation》2012,82(4):361-378

Generalized estimating equations (GEE) have become a popular method for marginal regression modelling of data that occur in clusters. Features of the GEE methodology are the use of a ‘working covariance’, an approximation to the underlying covariance, which is used to improve the efficiency in estimating the regression coefficients, and the ‘sandwich’ estimate of variance, which provides a way of consistently estimating their standard errors. These techniques have been extended to include estimating equations for the underlying correlation structure, both to improve the efficiency of the regression coefficient estimates and to provide estimates of correlations between units in a cluster, when these are of interest. If the mean structure is of primary interest, then a simpler set of equations (GEE1) can be used, whereas if the underlying covariance structure is of interest in its own right, the use of the more complex GEE2 estimating equations is often recommended. In this paper, we compare the effect of increasing the complexity of the ‘working covariances’ on the variance of the parameter estimates, as well as the mean-squared error of the ‘sandwich’ estimate of variance. We give asymptotic expressions for these variances and mean-squared error terms. We use these to study the behaviour of different variants of GEE1 and GEE2 when we change the number of clusters, the cluster size, and the within-cluster correlation. We conclude that the extra complexity of the full GEE2 approach is not usually justified if the mean structure is of primary interest.
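For orientation, the ‘working covariance’ and ‘sandwich’ estimate described above can be written in the standard form of Liang and Zeger (1986). The notation is an assumption and is not taken from the paper itself: Y_i is the response vector of cluster i, μ_i(β) its marginal mean, A_i the diagonal matrix of variance functions, R(α) the working correlation matrix and φ a scale parameter.

```latex
% GEE1 estimating equation for the regression parameters \beta
U(\beta) = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2}.

% Sandwich (robust) variance estimate
\widehat{\operatorname{Var}}(\hat\beta) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}
    \bigl( Y_i - \hat\mu_i \bigr) \bigl( Y_i - \hat\mu_i \bigr)^{\top}
    V_i^{-1} D_i.
```

When V_i equals the true covariance of Y_i, M and B coincide in expectation and the sandwich reduces to the model-based variance B^{-1}; otherwise the outer factors correct for the misspecified working covariance.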

20.

A Weighting Approach for GEE Analysis with Missing Data

Cuiling Wang Myunghee Cho Paik 《Communications in Statistics - Theory and Methods》2013,42(13):2397-2411

We propose a new weighting (WT) method to handle missing categorical outcomes in longitudinal data analysis using generalized estimating equations (GEE). The proposed WT provides a valid GEE estimator when the data are missing at random (MAR), has more stable weights, and shows an efficiency advantage over the inverse probability weighting method in the presence of small observation probabilities. The WT estimator is similar to the stabilized weighting (SWT) estimator under mild conditions, but it is more stable and efficient than SWT when the associations of the outcome with the observation probabilities and the covariate are strong.