Similar literature
1.
Generating correlated binary data with specified marginal probabilities and correlation structure is often needed in simulation studies that investigate the finite sample performance of statistical methods. The conditional linear family provides a powerful and flexible tool for generating correlated matched-pair binary data, including physician–patient and clustered matched-pair data. To ensure the validity of the data generation process, constraints on the parameters of the conditional linear family are needed. For correlated matched-pair binary data with an exchangeable-type correlation structure, we derive explicit expressions for checking these constraints, which provide an efficient and convenient computational tool for validating the data generation process. The results are applied to check the constraints for two typical types of correlated matched-pair binary data.

2.
Longitudinal studies of a binary outcome are common in the health, social, and behavioral sciences. In general, a feature of random effects logistic regression models for longitudinal binary data is that the marginal functional form, when integrated over the distribution of the random effects, is no longer of logistic form. Recently, Wang and Louis (2003) proposed a random intercept model in the clustered binary data setting where the marginal model has a logistic form. An acknowledged limitation of their model is that it allows only a single random effect that varies from cluster to cluster. In this paper, we propose a modification of their model to handle longitudinal data, allowing separate, but correlated, random intercepts at each measurement occasion. The proposed model allows for a flexible correlation structure among the random intercepts, where the correlations can be interpreted in terms of Kendall's τ. For example, the marginal correlations among the repeated binary outcomes can decline with increasing time separation, while the model retains the property of having matching conditional and marginal logit link functions. Finally, the proposed method is used to analyze data from a longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women.

3.
A marginal regression approach for correlated censored survival data has become a widely used statistical method. Examples of this approach in survival analysis range from the early work by Wei et al. (J Am Stat Assoc 84:1065–1073, 1989) to more recent work by Spiekerman and Lin (J Am Stat Assoc 93:1164–1175, 1998). This approach is particularly useful if a covariate’s population average effect is of primary interest and the correlation structure is not of interest or cannot be appropriately specified due to lack of sufficient information. In this paper, we consider a semiparametric marginal proportional hazards mixture cure model for clustered survival data with a surviving or “cure” fraction. Unlike the clustered data in previous work, the latent binary cure statuses of patients in one cluster tend to be correlated, in addition to the possibly correlated failure times among the patients in the cluster who are not cured. Specifying appropriate correlation structures for the data becomes even more complex if the potential correlation between cure statuses and failure times in the cluster has to be considered, and thus a marginal regression approach is particularly attractive. We formulate a semiparametric marginal proportional hazards mixture cure model. Estimates are obtained using an EM algorithm and expressions for the variance–covariance are derived using sandwich estimators. Simulation studies are conducted to assess finite sample properties of the proposed model. The marginal model is applied to a multi-institutional study of local recurrences in tonsil cancer patients who received radiation therapy. It reveals new findings that are not available from previous analyses of this study that ignored the potential correlation between patients within the same institution.

4.
This article describes a method for simulating n-dimensional multivariate non-normal data, with emphasis on count-valued data. Dependence is characterized by either Pearson correlations or Spearman correlations. The simulation is accomplished by simulating a vector of correlated standard normal variates. The elements of this vector are then transformed to achieve the target marginal distributions. We prove that the method corresponds to simulating data from a multivariate Gaussian copula. The simulation method does not restrict pairwise dependence beyond the limits imposed by the marginal distributions and can achieve any Pearson or Spearman correlation within those limits. Two examples are included. In the first example, marginal means, variances, Pearson correlations, and Spearman correlations are estimated from the epileptic seizure data set of Diggle et al. [P. Diggle, P. Heagerty, K.Y. Liang, and S. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, 2002]. Data with these means and variances are simulated to first achieve the estimated Pearson correlations and then achieve the estimated Spearman correlations. The second example is of a hypothetical time series of Poisson counts with seasonal mean ranging between 1 and 9 and an autoregressive(1) dependence structure.
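
The two steps described in this abstract (draw correlated standard normals, then transform each to the target marginal) can be illustrated with a minimal bivariate sketch. It assumes Poisson marginals and takes the latent normal correlation `rho_z` as given; the function names are hypothetical, and the step of solving for the latent correlation that hits a target Pearson or Spearman correlation, which the article addresses, is not reproduced here.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # The k < 10_000 guard only protects against floating-point stalls.
    while cdf < u and k < 10_000:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def correlated_poisson_pair(mean1, mean2, rho_z, rng):
    """Draw one pair of Poisson counts linked by a Gaussian copula.

    rho_z is the correlation of the latent normals, NOT the Pearson
    correlation of the resulting counts (which is somewhat smaller in
    magnitude for discrete marginals).
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
    # Map each normal to a uniform, clamped just below 1 for safety.
    u1 = min(_STD_NORMAL.cdf(z1), 1.0 - 1e-12)
    u2 = min(_STD_NORMAL.cdf(z2), 1.0 - 1e-12)
    # Inverse-CDF transform to the target marginals.
    return poisson_quantile(u1, mean1), poisson_quantile(u2, mean2)
```

Calling `correlated_poisson_pair(2.0, 5.0, 0.8, random.Random(1))` repeatedly yields pairs whose marginals are Poisson with means 2 and 5 and whose dependence is controlled by the latent correlation.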

5.
Examples are given of the need for simulating correlated binary variates with different given marginal expectations and pairwise correlations. An algorithm is then presented for generating such variates. The algorithm may be used to generate variates of any dimension.
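
For the bivariate case, the link between marginal expectations, a pairwise correlation, and the joint distribution can be written down directly. The sketch below is an illustration of that relationship only, not the article's algorithm (which handles any dimension); the function names are hypothetical. It uses the identity P(Y1 = 1, Y2 = 1) = p1 p2 + rho sqrt(p1 (1 - p1) p2 (1 - p2)) and rejects correlations that the marginals cannot support.

```python
import math
import random


def joint_probs(p1, p2, rho):
    """Cell probabilities (p11, p10, p01, p00) for binary Y1, Y2.

    p1, p2 are the marginal means and rho the Pearson correlation.
    Raises ValueError if rho is not attainable for these marginals.
    """
    p11 = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    p10 = p1 - p11
    p01 = p2 - p11
    p00 = 1.0 - p11 - p10 - p01
    probs = (p11, p10, p01, p00)
    if min(probs) < -1e-12:  # small tolerance for rounding error
        raise ValueError("correlation not attainable for these marginals")
    return probs


def sample_pair(p1, p2, rho, rng):
    """Draw one (y1, y2) pair from the joint distribution above."""
    p11, p10, p01, _ = joint_probs(p1, p2, rho)
    u = rng.random()
    if u < p11:
        return 1, 1
    if u < p11 + p10:
        return 1, 0
    if u < p11 + p10 + p01:
        return 0, 1
    return 0, 0
```

The feasibility check matters in practice: with very different marginals (for example 0.1 and 0.9) a large positive correlation would force a negative cell probability, so the request is rejected rather than silently distorted.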

6.
Dependence in outcome variables may pose formidable difficulty in analyzing data in longitudinal studies. In the past, most studies attempted to address this problem using marginal models. However, using marginal models alone, it is difficult to specify the measures of dependence in outcomes due to association between outcomes as well as between outcomes and explanatory variables. In this paper, a generalized approach is demonstrated using both the conditional and marginal models. This model uses link functions to test for dependence in outcome variables. The estimation and test procedures are illustrated with an application to the mobility index data from the Health and Retirement Survey, and simulations are also performed for correlated binary data generated from bivariate Bernoulli distributions. The results indicate the usefulness of the proposed method.

7.
Correlated binary data arise frequently in medical as well as other scientific disciplines, and statistical methods such as generalized estimating equations (GEE) have been widely used for their analysis. The need for simulating correlated binary variates arises when evaluating small sample properties of the GEE estimators for such data. Also, one might generate such data to simulate and study biological phenomena such as tooth decay or periodontal disease. This article introduces a simple method for generating pairs of correlated binary data. A simple algorithm is also provided for generating an arbitrary-dimensional random vector of non-negatively correlated binary variates. The method relies on the idea that correlations among the random variables arise as a result of their sharing some common components that induce such correlations. It then uses some properties of the binary variates to represent each variate in terms of these common components in addition to its own elements. Unlike most previous approaches that require solving nonlinear equations or use some distributional properties of other random variables, this method uses only some properties of the binary variate. As no intermediate random variables are required for generating the binary variates, the proposed method is shown to be faster than the other methods. To verify this claim, we compare the computational efficiency of the proposed method with those of other procedures.
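
The idea that correlation arises from shared components can be illustrated with one simple construction, which is not necessarily the article's algorithm: each variate either copies a common Bernoulli draw or uses its own independent draw. Assuming a common mean `p` and a common non-negative correlation `rho`, copying with probability sqrt(rho) gives every pair correlation rho, because two variates share the common draw with probability sqrt(rho) * sqrt(rho). The function name is hypothetical.

```python
import math
import random


def exchangeable_binary_vector(n, p, rho, rng):
    """Draw n binary variates with mean p and pairwise correlation rho >= 0.

    Each variate copies a shared Bernoulli(p) component with probability
    sqrt(rho); otherwise it uses its own independent Bernoulli(p) draw.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this construction needs 0 <= rho <= 1")
    shared = 1 if rng.random() < p else 0   # common component
    copy_prob = math.sqrt(rho)
    values = []
    for _ in range(n):
        use_shared = rng.random() < copy_prob
        own = 1 if rng.random() < p else 0  # variate-specific component
        values.append(shared if use_shared else own)
    return values
```

A design note: this sketch covers only the exchangeable case with equal means. Unequal means or unequal pairwise correlations need a more general representation, which is what the article's algorithm is said to provide.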

8.
The correlation structure imposed on multivariate time to event data is often of a simple nature, such as in the shared frailty model where pairwise correlations between event times in a cluster are all the same. In modeling the infection times of the four udder quarters clustered within the cow, more complex correlation structures are possibly required, and if so, such structures give more insight into the infection process. In this article, we choose a marginal approach to study more complex correlation structures, thereby leaving the modeling of marginal distributions unaffected by the association parameters. The dependency of failure times is induced through copula functions. The methods are shown for (mixtures of) the Clayton copula, but can be generalized to mixtures of Archimedean copulas for which the nesting conditions are met (McNeil in J Stat Comput Simul 6:567–581, 2008; Hofert in Comput Stat Data Anal 55:57–70, 2011).

9.
This paper presents the results of a small sample simulation study designed to evaluate the performance of a recently proposed test statistic for the analysis of correlated binary data. The new statistic is an adjusted Mantel-Haenszel test, which may be used in testing for association between a binary exposure and a binary outcome of interest across several fourfold tables when the data have been collected under a cluster sampling design. Although originally developed for the analysis of periodontal data, the proposed method may be applied to clustered binary data arising in a variety of settings, including longitudinal studies, family studies, and school-based research. The features of the simulation are intended to mimic those of a research study of periodontal health, in which a large number of observations is made on each of a relatively small number of patients. The simulation reveals that the adjusted test statistic performs well in finite samples, having empirical type I error rates close to nominal and empirical power similar to that of more complicated marginal regression methods. Software for computing the adjusted statistic is also provided.
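
As background for this abstract, the classical (unadjusted) Mantel-Haenszel chi-square across several fourfold tables can be computed as below. This sketch assumes independent observations and omits the continuity correction; the cluster-sampling adjustment that the article evaluates is not reproduced here, and the function name is hypothetical.

```python
def mantel_haenszel_chi2(tables):
    """Classical Mantel-Haenszel chi-square (1 df), no continuity correction.

    tables: iterable of (a, b, c, d), one per stratum, for the 2x2 table
        exposed:    a (outcome yes), b (outcome no)
        unexposed:  c (outcome yes), d (outcome no)
    """
    sum_observed = 0.0
    sum_expected = 0.0
    sum_variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_observed += a
        # Hypergeometric mean and variance of cell a under no association.
        sum_expected += row1 * col1 / n
        sum_variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if sum_variance == 0.0:
        raise ValueError("no stratum has a non-degenerate table")
    return (sum_observed - sum_expected) ** 2 / sum_variance
```

Ignoring within-cluster correlation in this way tends to understate the variance when observations from the same patient are positively correlated, which is the motivation for an adjusted statistic.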

10.
In this article, random number generation algorithms are given for generating bivariate uniform data based on a known class of symmetric bivariate uniform distributions that allow the entire correlation range, and the previously unrecognized connection of this class with bivariate binary data is established by matching the cumulative distribution functions.

11.
Scientific experiments commonly result in clustered discrete and continuous data. Existing methods for analyzing such data include the use of quasi-likelihood procedures and generalized estimating equations to estimate marginal mean response parameters. In applications to areas such as developmental toxicity studies, where discrete and continuous measurements are recorded on each fetus, or clinical ophthalmologic trials, where different types of observations are made on each eye, the assumption that data within a cluster are exchangeable is often very reasonable. We use this assumption to formulate fully parametric regression models for clusters of bivariate data with binary and continuous components. The regression models proposed have marginal interpretations and reproducible model structures. Tractable expressions for likelihood equations are derived and iterative schemes are given for computing efficient estimates (MLEs) of the marginal mean, correlations, variances and higher moments. We demonstrate the use of the ‘exchangeable’ procedure with an application to a developmental toxicity study involving fetal weight and malformation data.

12.
We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero‐inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.

13.
Data sets originating from a wide range of research studies are composed of multiple variables that are correlated and of dissimilar types, primarily of count, binary/ordinal and continuous attributes. The present paper builds on previous work on multivariate data generation and develops a framework for generating multivariate mixed data with a pre-specified correlation matrix. The generated data consist of components that are marginally count, binary, ordinal and continuous, where the count and continuous variables follow the generalized Poisson and normal distributions, respectively. The use of the generalized Poisson distribution provides a flexible mechanism that allows the under- and over-dispersed count variables generally encountered in practice. A step-by-step algorithm is provided and its performance is evaluated using simulated and real-data scenarios.

14.
Regression diagnostics are introduced for parameters in marginal association models for clustered binary outcomes in an implementation of generalized estimating equations. Estimating equations for intracluster correlations facilitate computational formulae for one-step deletion diagnostics in an extension of earlier work on diagnostics for parameters in the marginal mean model. The proposed diagnostics measure the influence of an observation or a cluster of observations on the estimated regression parameters and on the overall fit of the model. The diagnostics are applied to data from four research studies from public health and medicine.

15.
We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.

16.
The generalized linear mixed model (GLMM) is commonly used for the analysis of hierarchical non-Gaussian data. It combines an exponential family model formulation with normally distributed random effects. A drawback is the difficulty of deriving convenient marginal mean functions with straightforward parametric interpretations. Several solutions have been proposed, including the marginalized multilevel model (directly formulating the marginal mean, together with a hierarchical association structure) and the bridging approach (choosing the random-effects distribution such that marginal and hierarchical mean functions share functional forms). Another approach, useful in both a Bayesian and a maximum-likelihood setting, is to choose a random-effects distribution that is conjugate to the outcome distribution. In this paper, we contrast the bridging and conjugate approaches. For binary outcomes, using characteristic functions and cumulant generating functions, it is shown that the bridge distribution is unique. Self-bridging is introduced as the situation in which the outcome and random-effects distributions are the same. It is shown that only the Gaussian and degenerate distributions have well-defined cumulant generating functions for which self-bridging holds.

17.
When modeling correlated binary data in the presence of informative cluster sizes, generalized estimating equations with either resampling or inverse-weighting are often used to correct for estimation bias. However, existing methods for the clustered longitudinal setting assume constant cluster sizes over time. We present a subject-weighted generalized estimating equations scheme that provides valid parameter estimation for the clustered longitudinal setting while allowing cluster sizes to change over time. We compare, via simulation, the performance of existing methods to our subject-weighted approach. The subject-weighted approach was the only method that showed negligible bias, with excellent coverage, for all model parameters.

18.
This work aims at investigating marginal correlation within and between longitudinal data sequences. Useful and intuitive approximate expressions are derived based on generalized linear mixed models. Data from four double-blind randomized clinical trials are used to estimate the intra-class coefficient of reliability for a binary response. Additionally, the correlation between such a binary response and a continuous response is derived to evaluate the criterion validity of the binary response variable and the established continuous response variable.

19.
A method for inducing a desired rank correlation matrix on multivariate input vectors for simulation studies has recently been developed by Iman and Conover (1982). The primary intention of this procedure is to produce correlated input variables for use with computer models. Since this procedure is distribution free and allows the exact marginal distributions to remain intact, it can be used with any marginal distributions for which it is reasonable to think in terms of correlation. In this paper we present a series of rank correlation plots based on this procedure when the marginal distributions are normal, lognormal, uniform and loguniform. These plots provide a convenient tool both for aiding the modeler in determining the degree of dependence among input variables (rather than guessing) and for communicating to the modeler the effect of different correlation assumptions. In addition, this procedure can be used with sample multivariate data by sampling directly from the respective marginal empirical distribution functions.

20.
The data collection process in most observational and experimental studies yields different types of variables, leading to the use of joint models that are capable of handling multiple data types. Evaluation of various statistical techniques that have been developed for mixed data in simulated environments requires concurrent generation of multiple variables. In this article, I present an important augmentation to a unified framework proposed in our previously published work for simultaneously generating binary and nonnormal continuous data given the marginal characteristics and correlation structure, via fifth-order power polynomials that are known to extend the area covered in the skewness-elongation plane and to provide a better approximation to the probability density function of the continuous variables. I evaluate how well the improved methodology performs in comparison to the original one, in a simulated setting with illustrations of algorithmic steps. Although the relative gains for the associational quantities are not substantial, the augmented version appears to better capture the marginal quantities that are pertinent to the higher-order moments, as indicated by the very close resemblance between the specified and empirically computed quantities on average.
