We discuss and evaluate bootstrap algorithms for obtaining confidence intervals for parameters in Generalized Linear Models when the data are correlated. The methods are based on a stratified bootstrap and are suited to correlation occurring within “blocks” of data (e.g., individuals within a family, teeth within a mouth, etc.). Application of the intervals to data from a Dutch follow-up study on preterm infants shows the corroborative usefulness of the intervals, while the intervals are seen to be a powerful diagnostic in studying annual measles data. In a simulation study, we compare the coverage rates of the proposed intervals with existing methods (e.g., via Generalized Estimating Equations). In most cases, the bootstrap intervals are seen to perform better than current methods, and are produced in an automatic fashion, so that the user need not know (or have to guess) the dependence structure within a block.
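As a rough illustration of why resampling at the block level avoids having to specify the within-block dependence, the following minimal sketch resamples whole blocks with replacement and forms a percentile interval. It is not the authors' algorithm: the function names, the toy data, and the use of a pooled mean in place of a fitted Generalized Linear Model parameter are all assumptions made for illustration.

```python
import random


def block_bootstrap_ci(blocks, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval from resampling whole blocks (hypothetical helper).

    Each bootstrap replicate draws len(blocks) blocks with replacement, so
    observations within a block always travel together and their dependence
    structure never has to be modelled explicitly.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = [rng.choice(blocks) for _ in blocks]
        pooled = [y for block in resampled for y in block]
        estimates.append(statistic(pooled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Toy binary responses grouped into blocks (e.g. teeth within a mouth).
blocks = [[1, 1, 0], [0, 0], [1, 1, 1, 1], [0, 1], [0, 0, 0], [1, 0, 1]]
lower, upper = block_bootstrap_ci(blocks, mean)
print(f"95% percentile interval for the mean: ({lower:.3f}, {upper:.3f})")
```

In practice the `statistic` argument would refit the regression model on each resampled data set and return the coefficient of interest; the mean is used here only to keep the sketch self-contained.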

Patient dropout is a common problem in studies that collect repeated binary measurements. Generalized estimating equations (GEE) are often used to analyze such data. The dropout mechanism may be plausibly missing at random (MAR), i.e. unrelated to future measurements given covariates and past measurements. In this case, various authors have recommended weighted GEE with weights based on an assumed dropout model, or an imputation approach, or a doubly robust approach based on weighting and imputation. These approaches provide asymptotically unbiased inference, provided the dropout or imputation model (as appropriate) is correctly specified. Other authors have suggested that, provided the working correlation structure is correctly specified, GEE using an improved estimator of the correlation parameters (‘modified GEE’) show minimal bias. These modified GEE have not been thoroughly examined. In this paper, we study the asymptotic bias under MAR dropout of these modified GEE, the standard GEE, and also GEE using the true correlation. We demonstrate that all three methods are biased in general. The modified GEE may be preferred to the standard GEE and are subject to only minimal bias in many MAR scenarios but in others are substantially biased. Hence, we recommend the modified GEE be used with caution.
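The weighting idea mentioned above can be sketched as follows, assuming a dropout model has already supplied, for one patient, the estimated conditional probability of dropping out just before each visit given the observed history. The function name and the example probabilities are hypothetical; this illustrates inverse-probability weights in general and is not taken from the paper.

```python
def inverse_probability_weights(dropout_hazards):
    """Weights for the visits of one patient (hypothetical helper).

    dropout_hazards[j] is the assumed probability of dropping out just before
    visit j, given that the patient was still in the study at visit j - 1.
    The weight for visit j is the inverse of the probability of still being
    observed at that visit, i.e. the inverse of the running product of
    (1 - hazard).
    """
    weights = []
    prob_still_observed = 1.0
    for hazard in dropout_hazards:
        prob_still_observed *= 1.0 - hazard
        weights.append(1.0 / prob_still_observed)
    return weights


# The baseline visit is always observed (hazard 0); later visits are
# up-weighted to stand in for similar patients who dropped out.
weights = inverse_probability_weights([0.0, 0.2, 0.5])
print(weights)
```

Each observed response would then enter the estimating equations multiplied by its weight. As the abstract notes, this gives asymptotically unbiased inference only if the dropout model behind the hazards is correctly specified.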

Summary.  We present a multivariate logistic regression model for the joint analysis of longitudinal multiple-source binary data. Longitudinal multiple-source binary data arise when repeated binary measurements are obtained from two or more sources, with each source providing a measure of the same underlying variable. Since the number of responses on each subject is relatively large, the empirical variance estimator performs poorly and cannot be relied on in this setting. Two methods for obtaining a parsimonious within-subject association structure are considered. An additional complication arises with estimation, since maximum likelihood estimation may not be feasible without making unrealistically strong assumptions about third- and higher order moments. To circumvent this, we propose the use of a generalized estimating equations approach. Finally, we present an analysis of multiple-informant data obtained longitudinally from a psychiatric interventional trial that motivated the model developed in the paper.

Summary.  In a large, prospective longitudinal study designed to monitor cardiac abnormalities in children born to women who are infected with the human immunodeficiency virus, instead of a single outcome variable, there are multiple binary outcomes (e.g. abnormal heart rate, abnormal blood pressure and abnormal heart wall thickness) considered as joint measures of heart function over time. In the presence of missing responses at some time points, longitudinal marginal models for these multiple outcomes can be estimated by using generalized estimating equations (GEEs), and consistent estimates can be obtained under the assumption of a missingness completely at random mechanism. When the missing data mechanism is missingness at random, i.e. the probability of missing a particular outcome at a time point depends on observed values of that outcome and the remaining outcomes at other time points, we propose joint estimation of the marginal models by using a single modified GEE based on an EM-type algorithm. The method proposed is motivated by the longitudinal study of cardiac abnormalities in children who were born to women infected with the human immunodeficiency virus, and analyses of these data are presented to illustrate the application of the method. Further, in an asymptotic study of bias, we show that, under a missingness at random mechanism in which missingness depends on all observed outcome variables, our joint estimation via the modified GEE produces almost unbiased estimates, provided that the correlation model has been correctly specified, whereas estimates from standard GEEs can lead to substantial bias.

The author develops a robust quasi‐likelihood method, which appears to be useful for down‐weighting any influential data points when estimating the model parameters. He illustrates the computational issues of the method in an example. He uses simulations to study the behaviour of the robust estimates when data are contaminated with outliers, and he compares these estimates to those obtained by the ordinary quasi‐likelihood method.

Liang and Zeger (1986) proposed an extension of generalized linear models to the analysis of longitudinal data. Their formulation requires the assumption of a common dispersion parameter across observation times. However, this assumption is not expected to hold in most situations. Park (1993) proposed a simple extension of Liang and Zeger's formulation to allow for a different dispersion parameter at each time point. The proposed model is easy to apply without heavy computation and is useful for handling cases in which over-dispersion varies over time. In this paper, we focus on evaluating the effect of the additional dispersion parameters on the estimators of the model parameters. Through a Monte Carlo simulation study, the efficiency of Park's method is compared with that of Liang and Zeger's method.

The multivariate normal model, owing to its well-established theory, is commonly used to analyze correlated data of various types. However, the resulting inference is, more often than not, invalid if the model assumption fails. We present a modification that adapts the multivariate normal likelihood to general correlated data. The modified likelihood is asymptotically legitimate for any true underlying joint distribution so long as it has finite second moments. One can, hence, obtain full likelihood inference without knowing the true random mechanisms underlying the data. Simulations and real data analysis are provided to demonstrate the merit of our proposed parametric robust method.

We propose a flexible functional approach for modelling generalized longitudinal data and survival time using principal components. In the proposed model the longitudinal observations can be continuous or categorical data, such as Gaussian, binomial or Poisson outcomes. We generalize the traditional joint models that treat categorical data, such as CD4 counts, as continuous data by using some transformations. The proposed model is data-adaptive: it does not require pre-specified functional forms for longitudinal trajectories and automatically detects characteristic patterns. The longitudinal trajectories observed with measurement error or random error are represented by flexible basis functions through a possibly nonlinear link function, combining dimension reduction techniques resulting from functional principal component (FPC) analysis. The relationship between the longitudinal process and event history is assessed using a Cox regression model. Although the proposed model inherits the flexibility of non-parametric methods, the estimation procedure based on the EM algorithm is still parametric in computation, and thus simple and easy to implement. The computation is simplified by dimension reduction for random coefficients or FPC scores. An iterative selection procedure based on the Akaike information criterion (AIC) is proposed to choose the tuning parameters, such as the knots of the spline basis and the number of FPCs, so that an appropriate degree of smoothness and fluctuation can be addressed. The effectiveness of the proposed approach is illustrated through a simulation study, followed by an application to longitudinal CD4 counts and survival data which were collected in a recent clinical trial to compare the efficiency and safety of two antiretroviral drugs.

The generalized estimating equations (GEE) introduced by Liang and Zeger (Biometrika 73 (1986) 13–22) have been widely used over the past decade to analyze longitudinal data. The method uses a generalized quasi-score function estimate for the regression coefficients, and moment estimates for the correlation parameters. Recently, Crowder (Biometrika 82 (1995) 407–410) has pointed out some pitfalls with the estimation of the correlation parameters in the GEE method. In this paper we present a new method for estimating the correlation parameters which overcomes those pitfalls. For some commonly assumed correlation structures, we obtain unique feasible estimates for the correlation parameters. Large sample properties of our estimates are also established.

Longitudinal data often contain missing observations, and it is in general difficult to justify particular missing data mechanisms, whether random or not, that may be hard to distinguish. The authors describe a likelihood‐based approach to estimating both the mean response and association parameters for longitudinal binary data with drop‐outs. They specify marginal and dependence structures as regression models which link the responses to the covariates. They illustrate their approach using a data set from the Waterloo Smoking Prevention Project. They also report the results of simulation studies carried out to assess the performance of their technique under various circumstances.

We propose several diagnostic methods for checking the adequacy of marginal regression models for analyzing correlated binary data. We use a parametric marginal model based on latent variables and derive the projection (hat) matrix, Cook's distance, various residuals and Mahalanobis distance between the observed binary responses and the estimated probabilities for a cluster. Emphasized are several graphical methods including the simulated Q-Q plot, the half-normal probability plot with a simulated envelope, and the partial residual plot. The methods are illustrated with a real life example.

Summary.  Using standard correlation bounds, we show that in generalized estimating equations (GEEs) the so-called 'working correlation matrix' R(α) for analysing binary data cannot in general be the true correlation matrix of the data. Methods for estimating the correlation parameter in current GEE software for binary responses disregard these bounds. To show that the GEE applied to binary data has high efficiency, we use a multivariate binary model so that the covariance matrix from estimating equation theory can be compared with the inverse Fisher information matrix. But R(α) should be viewed as the weight matrix, and it should not be confused with the correlation matrix of the binary responses. We also do a comparison with more general weighted estimating equations by using a matrix Cauchy–Schwarz inequality. Our analysis leads to simple rules for the choice of α in an exchangeable or autoregressive AR(1) weight matrix R(α), based on the strength of dependence between the binary variables. An example is given to illustrate the assessment of dependence and choice of α.
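The correlation bounds referred to above can be made concrete with a small sketch. For two Bernoulli variables with success probabilities p1 and p2, the joint probability of two successes must lie between max(0, p1 + p2 - 1) and min(p1, p2), which restricts the attainable correlation. The function name and the example values are assumptions for illustration and are not taken from the paper.

```python
import math


def binary_correlation_bounds(p1, p2):
    """Attainable correlation range for two Bernoulli variables.

    Hypothetical helper: p1 and p2 are marginal success probabilities
    strictly between 0 and 1. The bounds follow from the limits on the
    joint probability P(Y1 = 1, Y2 = 1).
    """
    q1, q2 = 1.0 - p1, 1.0 - p2
    ratio = (p1 * q2) / (p2 * q1)
    upper = min(math.sqrt(ratio), math.sqrt(1.0 / ratio))
    lower = max(
        -math.sqrt((p1 * p2) / (q1 * q2)),
        -math.sqrt((q1 * q2) / (p1 * p2)),
    )
    return lower, upper


# With unequal margins the range is much narrower than [-1, 1].
lower, upper = binary_correlation_bounds(0.1, 0.5)
print(f"correlation must lie in [{lower:.3f}, {upper:.3f}]")
```

When the marginal probabilities vary with covariates, these bounds vary from pair to pair, which illustrates why a single working value of α in R(α) need not correspond to a valid correlation for every pair of responses.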

Although Fan showed that the mixed-effects model for repeated measures (MMRM) is appropriate to analyze complete longitudinal binary data in terms of the rate difference, they focused on using the generalized estimating equations (GEE) to make statistical inference. The current article emphasizes the validity of the MMRM when the normal-distribution-based pseudo likelihood approach is used to make inference for complete longitudinal binary data. For incomplete longitudinal binary data with a missing-at-random mechanism, however, the MMRM, using either the GEE or the normal-distribution-based pseudo likelihood inferential procedure, gives biased results in general and should not be used for analysis.

Paired binary data arise frequently in biomedical studies with unique features of their own. For instance, in clinical studies involving pairs such as ears, eyes etc., often both the intrapair association parameter and the event probability are of interest. In addition, we may be interested in the dependence of the association parameter on certain covariates as well. Although various methods have been proposed to model paired binary data, this paper proposes a unified approach for estimating various intrapair measures under a generalized linear model with simultaneous maximum likelihood estimates of the marginal probabilities and the intrapair association. The methods are illustrated with a twin morbidity study.

Joint models for longitudinal and time-to-event data have been applied in many different fields of statistics and clinical studies. However, the main difficulty with these models is computational: the requirement for numerical integration becomes severe as the dimension of the random effects increases. In this paper, a modified two-stage approach is proposed to estimate the parameters in joint models. In the first stage, linear mixed-effects models and best linear unbiased predictors are applied to estimate the parameters in the longitudinal submodel. In the second stage, an approximation of the full joint log-likelihood is constructed using the estimated values of these parameters from the longitudinal submodel. Survival parameters are estimated by maximizing this approximation of the full joint log-likelihood. Simulation studies show that the approach performs well, especially when the dimension of the random effects increases. Finally, we apply this approach to AIDS data.

In many medical applications, the disease status is of interest. The disease status can be related to multiple serial measurements. Nevertheless, for various reasons, the binary outcome can be measured incorrectly. Estimators derived from the misspecified outcome can be biased. This paper derives the complete data likelihood function to incorporate both the multiple serial measurements and the misspecified outcome. Owing to the latent variables, the EM algorithm is used to derive the maximum-likelihood estimators. Monte Carlo simulations are conducted to compare the impact of misspecification on the estimates. A retrospective data set on the recurrence of atrial fibrillation is used to illustrate the use of the proposed model.


Extra-binomial variation in longitudinal/clustered binomial data is frequently observed in biomedical and observational studies. The usual generalized estimating equations method treats the extra-binomial parameter as a constant across all subjects. In this paper, a two-parameter variance function modelling the extraneous variance is proposed to account for heterogeneity among subjects. The new approach allows modelling the extra-binomial variation as a function of the mean and binomial size.

A fully parametric first-order autoregressive (AR(1)) model is proposed to analyse binary longitudinal data. By using a discretized version of a copula, the modelling approach allows one to construct separate models for the marginal response and for the dependence between adjacent responses. In particular, the transition model that is focused on discretizes the Gaussian copula in such a way that the marginal is a Bernoulli distribution. A probit link is used to take into account concomitant information in the behaviour of the underlying marginal distribution. Fixed and time-varying covariates can be included in the model. The method is simple and is a natural extension of the AR(1) model for Gaussian series. Since the approach put forward is likelihood-based, it allows interpretations and inferences to be made that are not possible with semi-parametric approaches such as those based on generalized estimating equations. Data from a study designed to reduce the exposure of children to the sun are used to illustrate the methods.

Longitudinal or clustered response data arise in many applications such as biostatistics, epidemiology and environmental studies. The repeated responses cannot in general be assumed to be independent. One method of analysing such data is by using the generalized estimating equations (GEE) approach. The current GEE method for estimating regression effects in longitudinal data focuses on the modelling of the working correlation matrix assuming a known variance function. However, correct choice of the correlation structure may not necessarily improve estimation efficiency for the regression parameters if the variance function is misspecified [Wang YG, Lin X. Effects of variance-function misspecification in analysis of longitudinal data. Biometrics. 2005;61:413–421]. In this connection two problems arise: finding a correct variance function and estimating the parameters of the chosen variance function. In this paper, we study the problem of estimating the parameters of the variance function assuming that the form of the variance function is known and then the effect of a misspecified variance function on the estimates of the regression parameters. We propose a GEE approach to estimate the parameters of the variance function. This estimation approach borrows the idea of Davidian and Carroll [Variance function estimation. J Amer Statist Assoc. 1987;82:1079–1091] by solving a nonlinear regression problem where residuals are regarded as the responses and the variance function is regarded as the regression function. A limited simulation study shows that the proposed method performs at least as well as the modified pseudo-likelihood approach developed by Wang and Zhao [A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics. 2007;63:681–689]. Both these methods perform better than the GEE approach.

The author introduces robust techniques for estimation, inference and variable selection in the analysis of longitudinal data. She first addresses the problem of the robust estimation of the regression and nuisance parameters, for which she derives the asymptotic distribution. She uses weighted estimating equations to build robust quasi‐likelihood functions. These functions are then used to construct a class of test statistics for variable selection. She derives the limiting distribution of these tests and shows its robustness properties in terms of stability of the asymptotic level and power under contamination. An application to a real data set allows her to illustrate the benefits of a robust analysis.

