1.

Segmental modeling of changing immunologic response for CD4 data with skewness, missingness and dropout

Yangxin Huang Getachew A. Dagne Jeong-Gun Park 《Journal of applied statistics》2013,40(10):2244-2258

In clinical practice, the profile of each subject's CD4 response from a longitudinal study may follow a ‘broken stick’-like trajectory, indicating multiple phases of increase and/or decline in response. Such multiple phases (changepoints) may be important indicators to help quantify treatment effect and improve management of patient care. Although it is common practice in the literature to analyze complex AIDS longitudinal data using nonlinear mixed-effects (NLME) or nonparametric mixed-effects (NPME) models, estimating a changepoint with NLME or NPME models is challenging because of their complicated model formulations. In this paper, we propose a changepoint mixed-effects model with random subject-specific parameters, including the changepoint, for the analysis of longitudinal CD4 cell counts for HIV infected subjects following highly active antiretroviral treatment. The longitudinal CD4 data in this study may exhibit departures from symmetry; may have missing observations for various reasons, which are likely to be non-ignorable in the sense that missingness may be related to the missing values; and may be censored at the time the subject goes off study-treatment, which is a potentially informative dropout mechanism. Inferential procedures can become dramatically more complicated when longitudinal CD4 data with asymmetry (skewness), incompleteness and informative dropout are observed in conjunction with an unknown changepoint. Our objective is to address the simultaneous impact of skewness, missingness and informative censoring by jointly modeling the CD4 response and dropout time processes under a Bayesian framework. The method is illustrated using a real AIDS data set to compare potential models under various scenarios, and some interesting results are presented.
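
To make the ‘broken stick’ idea concrete, the following is a minimal sketch of a piecewise-linear mean trajectory with a single changepoint. The function and parameter names are hypothetical, and the sketch covers only the deterministic mean for one subject; it does not reproduce the authors' model, which adds random subject-specific parameters, skewed errors and a dropout process.

```python
def broken_stick_mean(t, intercept, slope_before, slope_after, changepoint):
    """Piecewise-linear mean response that is continuous at the changepoint.

    All names are illustrative assumptions, not taken from the paper.
    """
    if t <= changepoint:
        return intercept + slope_before * t
    # After the changepoint, continue from the value reached at the changepoint
    # with a different slope.
    value_at_changepoint = intercept + slope_before * changepoint
    return value_at_changepoint + slope_after * (t - changepoint)
```

For instance, a positive `slope_before` followed by a negative `slope_after` describes a response that rises until the changepoint and declines afterwards.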

2.

Quantile regression-based Bayesian semiparametric mixed-effects models for longitudinal data with non-normal, missing and mismeasured covariate

《Journal of Statistical Computation and Simulation》2012,82(6):1183-1202

Quantile regression (QR) models have received increasing attention recently for longitudinal data analysis. When continuous responses exhibit non-centrality due to outliers and/or heavy tails, commonly used mean regression models may fail to produce efficient estimators, whereas QR models may perform satisfactorily. In addition, longitudinal outcomes are often measured with non-normality, substantial errors and non-ignorable missing values. When carrying out statistical inference in such a data setting, it is important to account for these data features simultaneously; otherwise, erroneous or even misleading results may be produced. In the literature, there has been considerable interest in accommodating either one or some of these data features. However, there is relatively little work concerning all of them simultaneously. There is a need to fill this gap, as longitudinal data often do have these characteristics. Inferential procedures can become dramatically more complicated when these data features arise in longitudinal response and covariate outcomes. In this article, our objective is to develop QR-based Bayesian semiparametric mixed-effects models to address the simultaneous impact of these multiple data features. The proposed models and method are applied to analyse a longitudinal data set arising from an AIDS clinical study. Simulation studies are conducted to assess the performance of the proposed method under various scenarios.

3.

A comparison of two boxplot methods for detecting univariate outliers which adjust for sample size and asymmetry

Nancy J. Carter Neil C. Schwertman Terry L. Kiser 《Statistical Methodology》2009,6(6):604-621

It is important to identify outliers since their inclusion, especially when using parametric methods, can cause distortion in the analysis and lead to erroneous conclusions. One of the easiest and most useful methods is based on the boxplot. This method is particularly appealing since it does not use any outliers in computing spread. Two methods, one by Carling and another by Schwertman and de Silva, adjust the boxplot method for sample size and skewness. In this paper, the two procedures are compared both theoretically and by Monte Carlo simulations. Simulations using both a symmetric distribution and an asymmetric distribution were performed on data sets with none, one, and several outliers. Based on the simulations, the Carling approach is superior in avoiding masking outliers, that is, the Carling method is less likely to overlook an outlier, while the Schwertman and de Silva procedure is much better at reducing swamping, that is, misclassifying an observation as an outlier. Carling’s method relates to the Schwertman and de Silva procedure as the comparisonwise error rate relates to the experimentwise error rate in multiple comparisons. The two methods, rather than being competitors, appear to complement each other. Used in tandem they provide the data analyst a more complete perspective for identifying possible outliers.
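
As a point of reference, here is a minimal sketch of the standard boxplot rule that both procedures start from: flag any observation lying more than `k` interquartile ranges beyond the quartiles. The function name and the default `k=1.5` are assumptions for illustration. The sample-size and skewness adjustments of Carling and of Schwertman and de Silva are not reproduced here, since the abstract does not give their formulas.

```python
from statistics import quantiles


def boxplot_outliers(data, k=1.5):
    """Return observations outside the fences Q1 - k*IQR and Q3 + k*IQR.

    Plain, unadjusted boxplot rule; k = 1.5 is the conventional default.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # spread is based on quartiles, so extreme values do not inflate it
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]
```

Because the fences depend only on the quartiles, a single extreme value barely moves them, which is the property the abstract describes as not using any outliers in computing spread.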

4.

Comparison of Statistical Methods for Pretest–Posttest Designs in Terms of Type I Error Probability and Statistical Power

Xionghua Wilson Wu Dejian Lai 《Communications in Statistics - Simulation and Computation》2015,44(2):284-294

The pretest–posttest design is widely used to investigate the effect of an experimental treatment in biomedical research. The treatment effect may be assessed using analysis of variance (ANOVA) or analysis of covariance (ANCOVA). The normality assumption for parametric ANOVA and ANCOVA may be violated due to outliers and skewness of data. Nonparametric methods, robust statistics, and data transformation may be used to address the nonnormality issue. However, there is no simultaneous comparison of the four statistical approaches in terms of empirical type I error probability and statistical power. We studied 13 ANOVA and ANCOVA models based on the parametric approach, the rank and normal score-based nonparametric approach, Huber M-estimation, and Box–Cox transformation using normal data with and without outliers and lognormal data. We found that ANCOVA models preserve the nominal significance level better and are more powerful than their ANOVA counterparts when the dependent variable and covariate are correlated. Huber M-estimation is the most liberal method. Nonparametric ANCOVA, especially ANCOVA based on normal score transformation, preserves the nominal significance level, has good statistical power, and is robust to the data distribution.

5.

Extending the Mann–Whitney–Wilcoxon rank sum test to longitudinal regression analysis

R. Chen T. Chen N. Lu H. Zhang P. Wu C. Feng 《Journal of applied statistics》2014,41(12):2658-2675

Outliers are commonly observed in psychosocial research, generally resulting in biased estimates when comparing group differences using popular mean-based models such as the analysis of variance model. Rank-based methods such as the popular Mann–Whitney–Wilcoxon (MWW) rank sum test are more effective at addressing such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies with missing data. In this paper, we propose a generalized MWW test for comparing multiple groups with covariates within a longitudinal data setting, by utilizing the functional response models. Inference is based on a class of U-statistics-based weighted generalized estimating equations, providing consistent and asymptotically normal estimates not only under complete data but under missing data as well. The proposed approach is illustrated with both real and simulated study data.
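
To illustrate why rank-based comparisons resist outliers, the following sketch computes the classical two-sample Mann–Whitney U statistic by counting pairs. The function name is hypothetical, and this is only the cross-sectional statistic; it is not the longitudinal, covariate-adjusted extension proposed in the paper.

```python
def mann_whitney_u(x, y):
    """Count pairs (xi, yj) with xi > yj; a tied pair contributes one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

Only the ordering of observations enters the statistic, so replacing the largest value in `x` by an arbitrarily larger one leaves `u` unchanged, whereas a difference in group means would shift.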

6.

Approximate bounded influence estimation for longitudinal data with outliers and measurement errors

Lang Wu Jin Qiu 《Journal of statistical planning and inference》2011,141(7):2321-2330

Mixed effects models or random effects models are popular for the analysis of longitudinal data. In practice, longitudinal data are often complex since there may be outliers in both the response and the covariates and there may be measurement errors. The likelihood method is a common approach for these problems but it can be computationally very intensive and sometimes may even be computationally infeasible. In this article, we consider approximate robust methods for nonlinear mixed effects models to simultaneously address outliers and measurement errors. The approximate methods are computationally very efficient. We show the consistency and asymptotic normality of the approximate estimates. The methods can also be extended to missing data problems. An example is used to illustrate the methods and a simulation is conducted to evaluate the methods.

7.

Bayesian analysis of multivariate t linear mixed models with missing responses at random

《Journal of Statistical Computation and Simulation》2012,82(17):3594-3612

The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with a Metropolis-within-Gibbs scheme is used to draw effectively from the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.

8.

Jointly modeling skew longitudinal survival data with missingness and mismeasured covariates

Tao Lu 《Journal of applied statistics》2017,44(13):2354-2367

Jointly modeling longitudinal and survival data has been an active research area. Most research focuses on improving estimation efficiency but ignores many data features frequently encountered in practice. In the current study, we develop joint models that concurrently account for longitudinal and survival data with multiple features. Specifically, the proposed model handles skewness, missingness and measurement errors in covariates, which are typically observed in the collection of longitudinal survival data from many studies. We employ a Bayesian inferential method to make inference on the proposed model. We apply the proposed model to a real data study. A few alternative models under different conditions are compared. We conduct extensive simulations in order to evaluate how the method works.

9.

Multivariate generalized linear mixed models with random intercepts to analyze cardiovascular risk markers in type-1 diabetic patients

Miran A. Jaffa Mulugeta Gebregziabher Deirdre K. Luttrell Louis M. Luttrell Ayad A. Jaffa 《Journal of applied statistics》2016,43(8):1447-1464

Statistical approaches tailored to analyzing longitudinal data that have multiple outcomes with different distributions are scarce. This paucity is due to the non-availability of multivariate distributions that jointly model outcomes with different distributions other than the multivariate normal. A plethora of research has been done on the specific combination of binary-Gaussian bivariate outcomes but a more general approach that allows other mixtures of distributions for multiple longitudinal outcomes has not been thoroughly demonstrated and examined. Here, we study a multivariate generalized linear mixed models approach that jointly models multiple longitudinal outcomes with different combinations of distributions and incorporates the correlations between the various outcomes through separate yet correlated random intercepts. Every outcome is linked to the set of covariates through a proper link function that allows the incorporation and joint modeling of different distributions. A novel application was demonstrated on a cohort study of Type-1 diabetic patients to jointly model a mix of longitudinal cardiovascular outcomes and to explore for the first time the effect of glycemic control treatment, plasma prekallikrein biomarker, gender and age on cardiovascular risk factors collectively.

10.

A family of models for uniform and serial dependence in repeated measurements studies

J. K. Lindsey 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(3):343-357

Data arising from a randomized double-masked clinical trial for multiple sclerosis have provided particularly variable longitudinal repeated measurements responses. Specific models for such data, other than those based on the multivariate normal distribution, would be a valuable addition to the applied statistician's toolbox. A useful family of multivariate distributions can be generated by substituting the integrated intensity of one distribution into a second (outer) distribution. The parameters in the second distribution are then used to create a dependence structure among observations on a unit. These may either be a form of serial dependence for longitudinal data or of uniform dependence within clusters. These are respectively analogous to the Kalman filter of state space models and to copulas, but they have the major advantage that they do not require any explicit integration. One useful outer distribution for constructing such multivariate distributions is the Pareto distribution. Certain special models based on it have previously been used in event history analysis, but those considered here have much wider application.

11.

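A comparison of variable selection in logistic and classification tree models: based on a response-rate analysis of credit card direct-mail campaigns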

Xie Yuantao, Yang Juan, Wang Wen 《Statistics & Information Forum》2011,26(6):96-101

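Using a response-rate analysis of a credit card direct-mail campaign, this paper discusses how logistic models and classification tree models differ in variable selection and attempts to explain, from several angles, why the two types of model select different variables. The authors argue that neither method is uniformly superior; an appropriate model should be chosen in light of the specific setting and the characteristics of each model. Classification tree models tend to overfit the training set, are sensitive to the influence of individual variables, place more emphasis on risk factors in risk-factor analysis, and identify outliers at a high rate. Logistic models are easily affected by dependence among explanatory variables and, compounded by the influence of categorical variables, tend to select too many variables or factors; they are sensitive to outliers but insensitive to noise points. The difference in discriminant functions is the key factor behind the difference in variable selection.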

12.

Influence diagnostics and outlier tests for semiparametric mixed models

Wing-Kam Fung Zhong-Yi Zhu Bo-Cheng Wei Xuming He 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(3):565-579

Summary. Semiparametric mixed models are useful in biometric and econometric applications, especially for longitudinal data. Maximum penalized likelihood estimators (MPLEs) have been shown to work well by Zhang and co-workers for both linear coefficients and nonparametric functions. This paper considers the role of influence diagnostics in the MPLE by extending the case deletion and subject deletion analysis of linear models to accommodate the inclusion of a nonparametric component. We focus on influence measures for the fixed effects and provide formulae that are analogous to those for simpler models and readily computable with the MPLE algorithm. We also establish an equivalence between the case or subject deletion model and a mean shift outlier model from which we derive tests for outliers. The influence diagnostics proposed are illustrated through a longitudinal hormone study on progesterone and a simulated example.

13.

Data-driven desirability function to measure patients’ disease progression in a longitudinal study

Hsiu-Wen Chen Weng Kee Wong Hongquan Xu 《Journal of applied statistics》2016,43(5):783-795

Multiple outcomes are increasingly used to assess chronic disease progression. We discuss and show how desirability functions can be used to assess a patient's overall response to a treatment using multiple outcome measures, each of which may contribute unequally to the final assessment. Because judgments on disease progression and the relative contribution of each outcome can be subjective, we propose a data-driven approach to minimize the biases by using desirability functions with estimated shapes and weights based on a given gold standard. Our method provides each patient with a meaningful overall progression score that facilitates comparison and clinical interpretation. We also extend the methodology in a novel way to monitor patients’ disease progression when there are multiple time points and illustrate our method using a longitudinal data set from a randomized two-arm clinical trial for scleroderma patients.
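
A rough sketch of how a desirability score can combine several outcomes is given below. It assumes a one-sided power form for each outcome and a weighted geometric mean for the overall score, both of which are common in the desirability literature; the abstract does not state the functional form the authors use, and the shape and weight values here are supplied by hand rather than estimated from a gold standard. All names are hypothetical.

```python
import math


def desirability_larger_is_better(y, low, high, shape=1.0):
    """Map an outcome to [0, 1]: 0 at or below `low`, 1 at or above `high`.

    `shape` bends the curve between the two bounds (assumed power form).
    """
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** shape


def overall_desirability(scores, weights):
    """Weighted geometric mean of per-outcome desirability scores."""
    # A fully undesirable outcome with positive weight drives the overall score to 0.
    if any(w > 0 and d == 0.0 for d, w in zip(scores, weights)):
        return 0.0
    total = sum(weights)
    log_sum = sum(w * math.log(d) for d, w in zip(scores, weights) if w > 0)
    return math.exp(log_sum / total)
```

Unequal entries in `weights` let some outcomes count more than others, which corresponds to the unequal contributions mentioned in the abstract.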

14.

Testing of homogeneity of variance and autocorrelation coefficients of nonlinear mixed models with AR(1) errors based on M-estimation

Huihui Sun 《Journal of applied statistics》2017,44(2):362-375

Homogeneity of between-individual variance and autocorrelation coefficients is one of the assumptions in the study of longitudinal data. However, the assumption can be difficult to satisfy because of the complexity of the data set. In this paper we propose and analyze nonlinear mixed models with AR(1) errors for longitudinal data and introduce Huber's function into the log-likelihood function to obtain, by the Fisher scoring method, robust estimates that may help to reduce the influence of outliers. Tests of homogeneity of variance among individuals and of autocorrelation coefficients based on Huber's M-estimation are then studied. Simulation studies are carried out to assess the performance of the proposed score test. Results obtained from plasma concentration data are reported as an illustrative example.
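
For readers unfamiliar with Huber's function, the sketch below shows the usual loss applied to a standardized residual: quadratic near zero and linear in the tails. The function name and the default tuning constant `c=1.345` are assumptions for illustration; the abstract does not say which constant the author uses or exactly how the function enters the log-likelihood.

```python
def huber_rho(residual, c=1.345):
    """Huber loss: r^2 / 2 for |r| <= c, and c*|r| - c^2 / 2 beyond that."""
    r = abs(residual)
    if r <= c:
        return 0.5 * r * r
    # Linear growth in the tails limits the contribution of large residuals.
    return c * r - 0.5 * c * c
```

Because large residuals are penalized linearly rather than quadratically, a single outlying observation has a bounded pull on the estimates.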

15.

Estimation of flood frequencies from data sets with outliers using mixed distribution functions

Milan Stojković Stevan Prohaska Nikola Zlatanović 《Journal of applied statistics》2017,44(11):2017-2035

In this paper the estimation of high return period quantiles of the flood peak and volume in the Kolubara River basin is carried out. Estimation of flood frequencies is carried out on a data set containing high outliers, which are identified by Rosner’s test. Simultaneously, low outliers are determined by the multiple Grubbs–Beck test. The next step involves the use of mixed distribution functions applied to a data set from three populations: floods with low outliers, normal floods and floods with high outliers. The contribution of the data set with low outliers is neglected, since it would lead to underestimation of the flood quantiles with large return periods. Consequently, the best fitted mixed distribution from the applied types (EV1, GEV, P3 and LP3) was determined by using the minimum standard error of fit.

16.

Assessing the accuracy of China's GDP data based on a semiparametric model

Liu Hong, Jin Lin 《Statistical Research》2012,29(10):99-104

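Building on economic growth theory, this paper fits a semiparametric regression model to China's GDP data for 1953-2010 together with factors such as labor input, capital input and human capital. It then carries out a statistical diagnostic analysis of the model, computes the relevant diagnostic statistics and uses them to identify the model's outliers. On this basis the accuracy of China's GDP data is discussed: the outliers are concentrated mainly in two periods, 1958-1961 and 1991-1994. The paper concludes with a review of the approach of assessing the accuracy of statistical data through statistical diagnostics of a semiparametric regression model.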

17.

Latent class models for longitudinal studies of the elderly with data missing at random

Beth A Reboussin Michael E Miller Kurt K Lohman & Thomas R Ten Have 《Journal of the Royal Statistical Society. Series C, Applied statistics》2002,51(1):69-90

The elderly population in the USA is expected to double in size by the year 2025, making longitudinal health studies of this population of increasing importance. The degree of loss to follow-up in studies of the elderly, which is often because elderly people cannot remain in the study, enter a nursing home or die, makes longitudinal studies of this population problematic. We propose a latent class model for analysing multiple longitudinal binary health outcomes with multiple-cause non-response when the data are missing at random and a non-likelihood-based analysis is performed. We extend the estimating equations approach of Robins and co-workers to latent class models by reweighting the multiple binary longitudinal outcomes by the inverse probability of being observed. This results in consistent parameter estimates when the probability of non-response depends on observed outcomes and covariates (missing at random) assuming that the model for non-response is correctly specified. We extend the non-response model so that institutionalization, death and missingness due to failure to locate, refusal or incomplete data each have their own set of non-response probabilities. Robust variance estimates are derived which account for the use of a possibly misspecified covariance matrix, estimation of missing data weights and estimation of latent class measurement parameters. This approach is then applied to a study of lower body function among a subsample of the elderly participating in the 6-year Longitudinal Study of Aging.
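
The reweighting idea can be sketched in a much simpler setting than the paper's latent class estimating equations: estimating a mean from the observed responses only, with each one weighted by the inverse of its probability of being observed. The function name is hypothetical, and the observation probabilities are assumed to be known here, whereas in practice they would come from a fitted non-response model.

```python
def ipw_mean(observed_outcomes, observation_probs):
    """Normalized inverse-probability-weighted mean of the observed outcomes.

    observation_probs[i] is the assumed probability that subject i was observed.
    """
    weights = [1.0 / p for p in observation_probs]
    weighted_sum = sum(w * y for w, y in zip(weights, observed_outcomes))
    return weighted_sum / sum(weights)
```

A subject who had only a 50% chance of being observed counts twice, standing in for a similar subject who dropped out.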

18.

Statistical profiling methods with hierarchical logistic regression for healthcare providers with binary outcomes

Xiaowei Yang Bin Peng Rongqi Chen Qian Zhang Dianwen Zhu Qing J. Zhang 《Journal of applied statistics》2014,41(1):46-59

Within the context of California's public report of coronary artery bypass graft (CABG) surgery outcomes, we first thoroughly review popular statistical methods for profiling healthcare providers. Extensive simulation studies are then conducted to compare profiling schemes based on hierarchical logistic regression (LR) modeling under various conditions. Both Bayesian and frequentist methods are evaluated in classifying hospitals into ‘better’, ‘normal’ or ‘worse’ service providers. The simulation results suggest that no single method would dominate others on all accounts. Traditional schemes based on LR tend to identify too many false outliers, while those based on hierarchical modeling are relatively conservative. The issue of over-shrinkage in hierarchical modeling is also investigated using the 2005–2006 California CABG data set. The article provides theoretical and empirical evidence for choosing the right methodology for provider profiling.

19.

A note on contamination models and outliers

Jürgen Wellmann Ursula Gather 《Communications in Statistics - Theory and Methods》2013,42(8):1793-1802

In order to describe or generate so-called outliers in univariate statistical data, contamination models are often used. These models assume that k out of n independent random variables are shifted or multiplied by some constant, whereas the other observations still come i.i.d. from some common target distribution. Of course, these contaminants do not necessarily stick out as the extremes in the sample. Moreover, it is the amount and magnitude of ‘contamination’ which determines the number of obvious outliers. Using the concept of Davies and Gather (1993) to formalize the outlier notion we quantify the amount of contamination needed to produce a prespecified expected number of ‘genuine’ outliers. In particular, we demonstrate that for samples of moderate size from a normal target distribution a rather large shift of the contaminants is necessary to yield a certain expected number of outliers. Such an insight is of interest when designing simulation studies where outliers should occur as well as in theoretical investigations on outliers.
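
The point that contaminants need not look like outliers can be sketched numerically. The code below assumes a standard normal target distribution, shift contamination, and a fixed outlier region consisting of everything beyond the two-sided `alpha` cutoff of the target; these are simplifying assumptions, and the authors' formal definition (following Davies and Gather) may differ, for example by letting the cutoff depend on sample size. The function name is hypothetical, and only the k contaminants are counted, not regular observations that happen to fall in the region.

```python
from statistics import NormalDist


def expected_outlying_contaminants(k, shift, alpha=0.05):
    """Expected number of the k shifted observations landing in the outlier region.

    Target: N(0, 1). Contaminants: N(shift, 1). Region: |x| > z_{1 - alpha/2}.
    """
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    contaminant = NormalDist(mu=shift, sigma=1.0)
    p_outside = contaminant.cdf(-cutoff) + (1 - contaminant.cdf(cutoff))
    return k * p_outside
```

Under these assumptions, a shift of two standard deviations still leaves roughly half of the contaminants inside the region of unremarkable values, in line with the abstract's observation that a rather large shift is needed.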

20.

A joint marginalized multilevel model for longitudinal outcomes

Samuel Iddi 《Journal of applied statistics》2012,39(11):2413-2430

The shared-parameter model and its so-called hierarchical or random-effects extension are widely used joint modeling approaches for a combination of longitudinal continuous, binary, count, missing, and survival outcomes that naturally occurs in many clinical and other studies. A random effect is introduced and shared or allowed to differ between two or more repeated measures or longitudinal outcomes, thereby acting as a vehicle to capture association between the outcomes in these joint models. It is generally known that parameter estimates in a linear mixed model (LMM) for continuous repeated measures or longitudinal outcomes allow for a marginal interpretation, even though a hierarchical formulation is employed. This is not the case for the generalized linear mixed model (GLMM), that is, for non-Gaussian outcomes. The aforementioned joint models formulated for continuous and binary or two longitudinal binomial outcomes, using the LMM and GLMM, will naturally have a marginal interpretation for parameters associated with the continuous outcome but a subject-specific interpretation for the fixed effects parameters relating covariates to binary outcomes. To derive marginally meaningful parameters for the binary models in a joint model, we adopt the marginal multilevel model (MMM) due to Heagerty [13] and Heagerty and Zeger [14] and formulate a joint MMM for two longitudinal responses. This enables us to (1) capture association between the two responses and (2) obtain parameter estimates that have a population-averaged interpretation for both outcomes. The model is applied to two sets of data. The results are compared with those obtained from existing approaches such as generalized estimating equations, GLMM, and the model of Heagerty [13]. Estimates were found to be very close to those from a single analysis per outcome, but the joint model yields higher precision and allows for quantifying the association between outcomes. Parameters were estimated using maximum likelihood. The model is easy to fit using available tools such as the SAS NLMIXED procedure.