1.

Serial correlation or random subject effects

《Communications in Statistics - Simulation and Computation》2013,42(3):1105-1123

In longitudinal data analysis with random subject effects, there is often within-subject serial correlation and possibly unequally spaced observations. This serial correlation can be partially confounded with the random between-subject effects. In real data, it is often not clear whether there is serial correlation, random subject effects or both. Using inference based on the likelihood function, it is not always possible to identify the correct model, especially in small samples. However, it is important that some effort be made to find a good model rather than just making assumptions. This often means trying models with random coefficients, with serial correlation, and with both. Model selection criteria such as likelihood ratio tests and Akaike's Information Criterion (AIC) can be used. The problem of modelling serial correlation with unequally spaced observations is addressed. A real data example is presented where there is an apparent heterogeneity of variances, possible serial correlation and between-subject random effects. In this example, it turns out that the random subject effects explain both the serial correlation and the variance heterogeneity.
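To make the competing covariance structures concrete, here is a minimal sketch of one subject's marginal covariance under a random intercept plus a continuous-time AR(1) process, which handles unequally spaced observation times through the correlation phi ** |t_i - t_j|. The function names, the parameter names and this particular serial-correlation form are assumptions chosen for illustration; the abstract does not state which parameterisation the article uses.

```python
import numpy as np

def marginal_cov(times, var_subject, var_serial, phi, var_noise):
    """Covariance of one subject's observations (illustrative parameterisation).

    var_subject : variance of the random subject intercept
    var_serial  : variance of the serially correlated process
    phi         : correlation at a lag of one time unit, 0 < phi < 1
    var_noise   : variance of independent measurement error
    """
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])  # |t_i - t_j|, any spacing allowed
    n = len(t)
    return (var_subject * np.ones((n, n))   # shared random intercept
            + var_serial * phi ** lags      # continuous-time AR(1) component
            + var_noise * np.eye(n))        # independent noise

def gaussian_loglik(resid, cov):
    """Multivariate normal log-likelihood of residuals with covariance cov."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def aic(loglik, n_params):
    """Akaike's Information Criterion: smaller is better."""
    return 2.0 * n_params - 2.0 * loglik

# Unequally spaced observation times for a single hypothetical subject.
times = [0.0, 0.5, 2.0, 2.25, 5.0]
V = marginal_cov(times, var_subject=1.0, var_serial=0.5, phi=0.6, var_noise=0.2)
```

Setting var_serial to zero gives the random-intercept-only model and setting var_subject to zero gives the serial-correlation-only model, so the three candidates can be compared by maximising the summed log-likelihood over subjects and applying aic to each fit.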

2.

Comparing alternating logistic regressions to other approaches to modelling correlated binary data

《Journal of Statistical Computation and Simulation》2012,82(10):2059-2071

Alternating logistic regressions (ALRs) seem to offer some of the advantages of marginal models estimated via generalized estimating equations (GEE) and generalized linear mixed models (GLMMs). Via simulation study we compared ALRs to marginal models estimated via GEE and subject-specific models estimated via GLMMs, with a focus on estimation of the correlation structure in three-level data sets (e.g. students in classes in schools). Data set size and structure, and amount of correlation in the data sets were varied. For simple correlation structures, ALRs performed well. For three-level correlation structures, all approaches, but especially ALRs, had difficulty assigning the correlation to the correct level, though sample sizes used were small. In addition, ALRs and GEEs had trouble attaching correct inference to the mean effects, though this improved as overall sample size increased. ALRs are a valuable addition to the data analyst's toolkit, though care should be taken when modelling data with three-level structures.

3.

Markov chain models for multivariate repeated binary data analysis

Wei Tian Stewart J. Anderson 《Communications in Statistics - Simulation and Computation》2013,42(4):1001-1019

Repeated categorical outcomes frequently occur in clinical trials. Muenz and Rubinstein (1985) presented Markov chain models to analyze binary repeated data in a breast cancer study. We extend their method to the setting when more than one repeated outcome variable is of interest. In a randomized clinical trial of breast cancer, we investigate the dependency of toxicities on predictor variables and the relationship among multiple toxic effects.

4.

The Block Empirical Likelihood Method of the Semivarying Coefficient Model with Application to Longitudinal Data

Xuemei Hu 《Communications in Statistics - Theory and Methods》2013,42(8):1342-1351

In this article, we consider a semivarying coefficient model with application to longitudinal data. In order to accommodate the within-group correlation, we apply the block empirical likelihood procedure to the semivarying coefficient longitudinal data model, and prove a nonparametric version of Wilks' theorem which can be used to construct the block empirical likelihood confidence region with asymptotically correct coverage probability for the parametric component. In comparison with normal approximations, the proposed method does not require a consistent estimator for the asymptotic covariance matrix, making it easier to conduct inference for the model's parametric component. Simulations demonstrate how the proposed method works.

5.

Unified Inference for Sparse and Dense Longitudinal Data in Time‐varying Coefficient Models

Yixin Chen Weixin Yao 《Scandinavian Journal of Statistics》2017,44(1):268-284

Time‐varying coefficient models are widely used in longitudinal data analysis. These models allow the effects of predictors on response to vary over time. In this article, we consider a mixed‐effects time‐varying coefficient model to account for the within subject correlation for longitudinal data. We show that when kernel smoothing is used to estimate the smooth functions in time‐varying coefficient models for sparse or dense longitudinal data, the asymptotic results of these two situations are essentially different. Therefore, a subjective choice between the sparse and dense cases might lead to erroneous conclusions for statistical inference. In order to solve this problem, we establish a unified self‐normalized central limit theorem, based on which a unified inference is proposed without deciding whether the data are sparse or dense. The effectiveness of the proposed unified inference is demonstrated through a simulation study and an analysis of Baltimore MACS data.

6.

Assessing conditional independence for log-linear Poisson models with random effects

Peter X.-K. Song Wenxin Jiang 《Communications in Statistics - Theory and Methods》2013,42(5-6):1233-1245

In the context of regression models with random effects, repeated responses are traditionally assumed to be mutually independent conditional on the random effects. In order to assess the validity of such an assumption and its impact on parameter inference, we propose an estimating equation methodology where both random effects and within-subject correlation are modeled. This allows a subsequent analysis of the statistical significance of the conditional correlation. We illustrate this method with the epilepsy data of Thall and Vail (1990), and find our method useful in a proper representation for the random effect modeling.

7.

The effects of missing serial effects and/or heteroscedastic errors on mixed models using repeated growth data

《Journal of Statistical Computation and Simulation》2012,82(16):3367-3382

When a two-level multilevel model (MLM) is used for repeated growth data, the individuals constitute level 2 and the successive measurements constitute level 1, which is nested within the individuals that make up level 2. The heterogeneity among individuals is represented by either the random-intercept or random-coefficient (slope) model. The variance components at level 1 involve serial effects and measurement errors under constant variance or heteroscedasticity. This study hypothesizes that missing serial effects and/or heteroscedasticity may bias the results obtained from two-level models. To illustrate this effect, we conducted two simulation studies, where the simulated data were based on the characteristics of an empirical mouse tumour data set. The results suggest that for repeated growth data with constant variance (measurement error) and misspecified serial effects (ρ > 0.3), the proportion of level-2 variation (intra-class correlation coefficient) increases with ρ and the two-level random-coefficient model is the minimum AIC (or AIC_c) model when compared with the fixed model, heteroscedasticity model, and random-intercept model. In addition, when the serial effect (ρ > 0.1) and heteroscedasticity are both misspecified, the two-level random-coefficient model is the minimum AIC (or AIC_c) model when compared with the fixed model and random-intercept model. This study demonstrates that missing serial effects and/or heteroscedasticity may indicate heterogeneity among individuals in repeated growth data (mixed or two-level MLM). This issue is critical in biomedical research.
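For reference, the quantities named in this abstract have the following textbook forms for a two-level random-intercept model. The symbols are assumptions introduced here (level-2 variance \sigma_u^2, level-1 variance \sigma_e^2, maximised likelihood \hat{L}, k estimated parameters, n observations), and the article may use variants of these definitions.

```latex
% Intra-class correlation: share of total variance lying between individuals
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}

% Akaike's Information Criterion and its small-sample correction
\mathrm{AIC} = -2 \log \hat{L} + 2k,
\qquad
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
```

Under these definitions, any unmodelled within-subject dependence that the fit absorbs into \sigma_u^2 raises the ICC, which is consistent with the reported increase of the level-2 proportion with ρ.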

8.

Conditional mix-GEE models for longitudinal data with unspecified random-effects distributions

Yanchun Xing Lili Xu Zhichuan Zhu 《Communications in Statistics - Theory and Methods》2018,47(4):862-876

In longitudinal studies, the mixture generalized estimating equation (mix-GEE) was proposed to improve the efficiency of the fixed-effects estimator when the working correlation structure is misspecified. When the subject-specific effect is of interest, mixed-effects models are widely used to analyze longitudinal data. However, most of the existing approaches assume a normal distribution for the random effects, and this could affect the efficiency of the fixed-effects estimator. In this article, a conditional mixture generalized estimating equation (cmix-GEE) approach based on the advantage of mix-GEE and the conditional quadratic inference function (CQIF) method is developed. The advantage of our new approach is that it does not require the normality assumption for random effects and can accommodate the serial correlation between observations within the same cluster. The feature of our proposed approach is that the estimators of the regression parameters are more efficient than CQIF even if the working correlation structure is not correctly specified. In addition, according to the estimates of some mixture proportions, the true working correlation matrix can be identified. We establish the asymptotic results for the fixed-effects parameter estimators. Simulation studies were conducted to evaluate our proposed method.

9.

Regularization in dynamic random-intercepts models for analysis of longitudinal data

Amir-Abbas Mofidian Naieni Reyhaneh Rikhtehgaran 《Scandinavian Journal of Statistics》2023,50(2):513-549

This paper addresses the problem of simultaneous variable selection and estimation in the random-intercepts model with the first-order lag response. This type of model is commonly used for analyzing longitudinal data obtained through repeated measurements on individuals over time. This model uses random effects to cover the intra-class correlation, and the first lagged response to address the serial correlation, which are two common sources of dependency in longitudinal data. We demonstrate that the conditional likelihood approach, which ignores the correlation between random effects and initial responses, can lead to biased regularized estimates. Furthermore, we demonstrate that joint modeling of initial responses and subsequent observations in the structure of dynamic random-intercepts models leads to both consistency and oracle properties of regularized estimators. We present theoretical results in both low- and high-dimensional settings and evaluate regularized estimators' performances by conducting simulation studies and analyzing a real dataset. Supporting information is available online.

10.

Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data

Andrew J. Copas Shaun R. Seaman 《Journal of Applied Statistics》2010,37(6):911-922

Patient dropout is a common problem in studies that collect repeated binary measurements. Generalized estimating equations (GEE) are often used to analyze such data. The dropout mechanism may be plausibly missing at random (MAR), i.e. unrelated to future measurements given covariates and past measurements. In this case, various authors have recommended weighted GEE with weights based on an assumed dropout model, or an imputation approach, or a doubly robust approach based on weighting and imputation. These approaches provide asymptotically unbiased inference, provided the dropout or imputation model (as appropriate) is correctly specified. Other authors have suggested that, provided the working correlation structure is correctly specified, GEE using an improved estimator of the correlation parameters (‘modified GEE’) show minimal bias. These modified GEE have not been thoroughly examined. In this paper, we study the asymptotic bias under MAR dropout of these modified GEE, the standard GEE, and also GEE using the true correlation. We demonstrate that all three methods are biased in general. The modified GEE may be preferred to the standard GEE and are subject to only minimal bias in many MAR scenarios but in others are substantially biased. Hence, we recommend the modified GEE be used with caution.

11.

Permutation Methods for Comparing the Accuracy of Nested Prediction Models in Survival Analysis

Wenyu Jiang Nathalie C. Moon Bingshu E. Chen Dongsheng Tu 《Communications in Statistics - Simulation and Computation》2016,45(8):2691-2708

When making patient-specific predictions, it is important to compare prediction models to evaluate the gain in prediction accuracy from including additional covariates. We propose two statistical testing methods, the complete data permutation (CDP) and the permutation cross-validation (PCV), for comparing prediction models. We simulate clinical trial settings extensively and show that both methods are robust and achieve almost correct test sizes; the methods have comparable power in moderate to large sample situations, while the CDP is more efficient in computation. The methods are also applied to ovarian cancer clinical trial data.

12.

Estimation of longitudinal partially linear varying-coefficient EV models

Mingtao Zhao Xiaoli Xu 《Statistical Research (统计研究)》2019,36(10):115-128

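Longitudinal data are correlated data obtained by repeatedly observing individuals over time, and they arise widely in many fields of scientific research. Measurement error is unavoidable when individuals are observed, and ignoring it often leads to biased estimates. Using the quadratic inference function method, this paper studies estimation for the partially linear varying-coefficient errors-in-variables (EV) model for longitudinal data in which the covariates of both the parametric and the nonparametric parts are measured with error. The unknown coefficient functions in the model are approximated by B-splines, and bias-corrected quadratic inference functions for the regression parameters and the B-spline coefficients are constructed to handle the within-subject correlation and the measurement error. This yields bias-corrected quadratic inference function estimators of the regression parameters and the varying coefficients, whose asymptotic properties are then established. Numerical simulations and a real data analysis show that the proposed method has practical value.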

13.

Testing serial correlation in partially linear models with validation data

Feng Liu Sitong Guo Xinmei Kang 《Communications in Statistics - Theory and Methods》2017,46(19):9795-9806

This article investigates testing for serial correlation in partially linear models with validation data. We apply empirical likelihood methods to construct test statistics for serial correlation and derive the asymptotic distribution of the test statistics under the null hypothesis. Simulation results show that our method performs well.

14.

A measure of disclosure risk for microdata

C. J. Skinner M. J. Elliot 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(4):855-867

Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two 'similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.

15.

Valid estimates for repeated randomized response methods

Heiko Groenitz 《Journal of Applied Statistics》2017,44(16):2994-3010

Surveys with sensitive characteristics (e.g. cheating in exams, fiscal evasion, social fraud, insurance fraud, discrimination, political views, financial situation) need special concepts, because normal direct questioning causes answer refusal and lies. One well-established concept is the randomized response (RR) approach. RRs protect the interviewees' privacy and facilitate their cooperation. Based on the RRs of many persons, inference is possible. A recently published article suggests two repeated RR methods. That is, each interviewee must give more than one answer. Repeated RRs are a good idea to improve the estimation efficiency of RR techniques. However, this recently published article contains serious mistakes and derives invalid estimates. For this reason, we correct these mistakes and develop valid estimates in the first part of our article. Subsequently, in the second part, we present generalized considerations that cover many more repeated RR schemes.
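As background on how inference from randomized responses works, here is a minimal sketch of the classic single-answer Warner design, not the repeated schemes treated in the article. Each respondent privately answers the sensitive statement with probability p and its negation otherwise, so P(yes) = p*pi + (1 - p)*(1 - pi), which can be inverted for the prevalence pi. The function name and argument names are hypothetical.

```python
def warner_estimate(n_yes, n, p):
    """Estimate a sensitive prevalence under Warner's randomized response design.

    n_yes : number of 'yes' answers observed
    n     : number of respondents
    p     : known probability that the device selects the sensitive statement
            (must differ from 0.5, otherwise the answers carry no information)
    Returns (pi_hat, var_hat), where var_hat is a plug-in variance estimate.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0 or p == 0.5:
        raise ValueError("p must lie in [0, 1] and differ from 0.5")
    lam = n_yes / n                                   # observed 'yes' proportion
    pi_hat = (lam - (1.0 - p)) / (2.0 * p - 1.0)      # invert P(yes) for pi
    var_hat = lam * (1.0 - lam) / (n * (2.0 * p - 1.0) ** 2)
    return pi_hat, var_hat

# Hypothetical survey: 40 'yes' answers among 100 respondents, device with p = 0.7.
pi_hat, var_hat = warner_estimate(n_yes=40, n=100, p=0.7)
```

The factor (2p - 1)^2 in the denominator shows the privacy cost: the closer p is to 0.5, the larger the variance, which is the efficiency loss that repeated answers per interviewee aim to reduce.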

16.

Analysis of longitudinal data unbalanced over time

Wenzheng Huang Garrett M. Fitzmaurice 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(1):135-155

Summary. The paper considers modelling, estimating and diagnostically verifying the response process generating longitudinal data, with emphasis on association between repeated measures from unbalanced longitudinal designs. Our model is based on separate specifications of the moments for the mean, standard deviation and correlation, with different components possibly sharing common parameters. We propose a general class of correlation structures that comprise random effects, measurement errors and a serially correlated process. These three elements are combined via flexible time-varying weights, whereas the serial correlation can depend flexibly on the mean time and lag. When the measurement schedule is independent of the response process, our estimation procedure yields consistent and asymptotically normal estimates for the mean parameters even when the standard deviation and correlation are misspecified, and for the standard deviation parameters even when the correlation is misspecified. A generic diagnostic method is developed for verifying the models for the mean, standard deviation and, in particular, the correlation, which is applicable even when the data are severely unbalanced. The methodology is illustrated by an analysis of data from a longitudinal study that was designed to characterize pulmonary growth in girls.

17.

Robust Correlation Structure for Multivariate Failure Time Data

M. Tariqul Hasan Gary Sneddon 《Communications in Statistics - Simulation and Computation》2013,42(9):1839-1854

When incomplete repeated failure times are collected from a large number of independent individuals, interest is focused primarily on the consistent and efficient estimation of the effects of the associated covariates on the failure times. Since repeated failure times are likely to be correlated, it is important to exploit the correlation structure of the failure data in order to obtain such consistent and efficient estimates. However, it may be difficult to specify an appropriate correlation structure for a real-life data set. We propose a robust correlation structure that can be used irrespective of the true correlation structure. This structure is used in constructing an estimating equation for the hazard ratio parameter, under the assumption that the number of repeated failure times for an individual is random. The consistency and efficiency of the estimates are examined through a simulation study, where we consider failure times that marginally follow an exponential distribution and a Poisson distribution is assumed for the random number of repeated failure times. We conclude by using the proposed method to analyze a bladder cancer dataset.

18.

Generalized estimating equations by considering additive terms for analyzing time-course gene sets data

T. Baghfalaki M. Ganjali D. Berridge 《Journal of the Korean Statistical Society》2018,47(4):423-435

Time-course gene sets are collections of predefined groups of genes in some patients gathered over time. Testing which gene sets vary significantly over time is an important problem in genomic data analysis. In this paper, the method of generalized estimating equations (GEEs), which is a semi-parametric approach, is applied to time-course gene set data. We propose a special structure of working correlation matrix to handle the association among repeated measurements of each patient over time. Also, the proposed working correlation matrix permits estimation of the effects of the same gene among different patients. The proposed approach is applied to an HIV therapeutic vaccine trial (DALIA-1 trial). This data set has two phases: pre-ATI and post-ATI which depend on a vaccination period. Using multiple testing, the significant gene sets in the pre-ATI phase are detected and data on two randomly selected gene sets in the post-ATI phase are also analyzed. Some simulation studies are performed to illustrate the proposed approaches. The results of the simulation studies confirm the good performance of our proposed approach.

19.

Estimating the variance of estimated trends in proportions when there is no unique subject identifier

William K. Mountford Stuart R. Lipsitz Garrett M. Fitzmaurice Rickey E. Carter Jeremy B. Soule John A. Colwell Daniel T. Lackland 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(1):185-193

Summary. Longitudinal population-based surveys are widely used in the health sciences to study patterns of change over time. In many of these data sets unique patient identifiers are not publicly available, making it impossible to link the repeated measures from the same individual directly. This poses a statistical challenge for making inferences about time trends because repeated measures from the same individual are likely to be positively correlated, i.e., although the time trend that is estimated under the naïve assumption of independence is unbiased, an unbiased estimate of the variance cannot be obtained without knowledge of the subject identifiers linking repeated measures over time. We propose a simple method for obtaining a conservative estimate of variability for making inferences about trends in proportions over time, ensuring that the type I error is no greater than the specified level. The method proposed is illustrated by using longitudinal data on diabetes hospitalization proportions in South Carolina.

20.

Inference in Approximately Sparse Correlated Random Effects Probit Models With Panel Data

《Journal of Business & Economic Statistics》2012,30(1):1-18

We propose a simple procedure based on an existing “debiased” l₁-regularized method for inference of the average partial effects (APEs) in approximately sparse probit and fractional probit models with panel data, where the number of time periods is fixed and small relative to the number of cross-sectional observations. Our method is computationally simple and does not suffer from the incidental parameters problem that comes from attempting to estimate as a parameter the unobserved heterogeneity for each cross-sectional unit. Furthermore, it is robust to arbitrary serial dependence in underlying idiosyncratic errors. Our theoretical results illustrate that inference concerning APEs is more challenging than inference about fixed and low-dimensional parameters, as the former concerns deriving the asymptotic normality for sample averages of linear functions of a potentially large set of components in our estimator when a series approximation for the conditional mean of the unobserved heterogeneity is considered. Insights on the applicability and implications of other existing Lasso-based inference procedures for our problem are provided. We apply the debiasing method to estimate the effects of spending on test pass rates. Our results show that spending has a positive and statistically significant average partial effect; moreover, the effect is comparable to that found using standard parametric methods.