期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Impact of missing data on type 1 error rates in non‐inferiority trials

Bongin Yoo 《Pharmaceutical statistics》2010,9(2):87-99

In this paper, a simulation study is conducted to systematically investigate the impact of different types of missing data on six different statistical analyses: four different likelihood‐based linear mixed effects models and analysis of covariance (ANCOVA) using two different data sets, in non‐inferiority trial settings for the analysis of longitudinal continuous data. ANCOVA is valid when the missing data are completely at random. Likelihood‐based linear mixed effects model approaches are valid when the missing data are at random. Pattern‐mixture model (PMM) was developed to incorporate non‐random missing mechanism. Our simulations suggest that two linear mixed effects models using unstructured covariance matrix for within‐subject correlation with no random effects or first‐order autoregressive covariance matrix for within‐subject correlation with random coefficient effects provide well control of type 1 error (T1E) rate when the missing data are completely at random or at random. ANCOVA using last observation carried forward imputed data set is the worst method in terms of bias and T1E rate. PMM does not show much improvement on controlling T1E rate compared with other linear mixed effects models when the missing data are not at random but is markedly inferior when the missing data are at random. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

2.

Local influence for generalized linear mixed models

Hong‐Tu Zhu Sik‐Yum Lee 《Revue canadienne de statistique》2003,31(3):293-309

The authors describe a method for assessing model inadequacy in maximum likelihood estimation of a generalized linear mixed model. They treat the latent random effects in the model as missing data and develop the influence analysis on the basis of a Q‐function which is associated with the conditional expectation of the complete‐data log‐likelihood function in the EM algorithm. They propose a procedure to detect influential observations in six model perturbation schemes. They also illustrate their methodology in a hypothetical situation and in two real cases. 相似文献

3.

Bayesian semiparametric reproductive dispersion mixed models for non-normal longitudinal data: estimation and case influence analysis

Xingde Duan Wing Kam Fung Niansheng Tang 《Journal of Statistical Computation and Simulation》2017,87(10):1925-1939

Semiparametric reproductive dispersion mixed model (SPRDMM) is a natural extension of the reproductive dispersion model and the semiparametric mixed model. In this paper, we relax the normality assumption of random effects in SPRDMM and use a truncated and centred Dirichlet process prior to specify random effects, and present the Bayesian P-spline to approximate the smoothing unknown function. A hybrid algorithm combining the block Gibbs sampler and the Metropolis–Hastings algorithm is implemented to sample observations from the posterior distribution. Also, we develop Bayesian case deletion influence measure for SPRDMM based on the φ-divergence and present those computationally feasible formulas. Several simulation studies and a real example are presented to illustrate the proposed methodologies. 相似文献

4.

Bayesian Analysis of Nonlinear Reproductive Dispersion Mixed Models for Longitudinal Data with Nonignorable Missing Covariates

Nian-Sheng Tang Hui Zhao 《统计学通讯:模拟与计算》2013,42(6):1265-1287

This article proposes a Bayesian approach, which can simultaneously obtain the Bayesian estimates of unknown parameters and random effects, to analyze nonlinear reproductive dispersion mixed models (NRDMMs) for longitudinal data with nonignorable missing covariates and responses. The logistic regression model is employed to model the missing data mechanisms for missing covariates and responses. A hybrid sampling procedure combining the Gibber sampler and the Metropolis-Hastings algorithm is presented to draw observations from the conditional distributions. Because missing data mechanism is not testable, we develop the logarithm of the pseudo-marginal likelihood, deviance information criterion, the Bayes factor, and the pseudo-Bayes factor to compare several competing missing data mechanism models in the current considered NRDMMs with nonignorable missing covaraites and responses. Three simulation studies and a real example taken from the paediatric AIDS clinical trial group ACTG are used to illustrate the proposed methodologies. Empirical results show that our proposed methods are effective in selecting missing data mechanism models. 相似文献

5.

Multivariate generalized linear mixed models with?semi-nonparametric and smooth nonparametric random effects densities

Georgios Papageorgiou John Hinde 《Statistics and Computing》2012,22(1):79-92

We extend the family of multivariate generalized linear mixed models to include random effects that are generated by smooth densities. We consider two such families of densities, the so-called semi-nonparametric (SNP) and smooth nonparametric (SMNP) densities. Maximum likelihood estimation, under either the SNP or the SMNP densities, is carried out using a Monte Carlo EM algorithm. This algorithm uses rejection sampling and automatically increases the MC sample size as it approaches convergence. In a simulation study we investigate the performance of these two densities in capturing the true underlying shape of the random effects distribution. We also examine the implications of misspecification of the random effects distribution on the estimation of the fixed effects and their standard errors. The impact of the assumed random effects density on the estimation of the random effects themselves is investigated in a simulation study and also in an application to a real data set. 相似文献

6.

Estimation in the Cox cure model with covariates missing not at random,with application to disease screening/prediction

Lisha Guo Yi Xiong X. Joan Hu 《Revue canadienne de statistique》2020,48(4):608-632

In an attempt to provide a statistical tool for disease screening and prediction, we propose a semiparametric approach to analysis of the Cox proportional hazards cure model in situations where the observations on the event time are subject to right censoring and some covariates are missing not at random. To facilitate the methodological development, we begin with semiparametric maximum likelihood estimation (SPMLE) assuming that the (conditional) distribution of the missing covariates is known. A variant of the EM algorithm is used to compute the estimator. We then adapt the SPMLE to a more practical situation where the distribution is unknown and there is a consistent estimator based on available information. We establish the consistency and weak convergence of the resulting pseudo-SPMLE, and identify a suitable variance estimator. The application of our inference procedure to disease screening and prediction is illustrated via empirical studies. The proposed approach is used to analyze the tuberculosis screening study data that motivated this research. Its finite-sample performance is examined by simulation. 相似文献

7.

Model selection of generalized partially linear models with missing covariates

Ying-Zi FuXue-Dong Chen 《Journal of statistical planning and inference》2012,142(1):126-138

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for our considered models with incomplete data, some new model selection criterions for GPLMs with missing covariates are proposed under two different missingness mechanism, say, missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on EM algorithm, especially when no missing data involved, our new model selection criterions are reduced to classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also can compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criterions are investigated. A simulation study and a real example are used to illustrate the proposed methodology. 相似文献

8.

Analysis of generalized linear mixed models via a stochastic approximation algorithm with Markov chain Monte-Carlo method 总被引：3，自引：0，他引：3

Zhu Hong-Tu Lee Sik-Yum 《Statistics and Computing》2002,12(2):175-183

In recent years much effort has been devoted to maximum likelihood estimation of generalized linear mixed models. Most of the existing methods use the EM algorithm, with various techniques in handling the intractable E-step. In this paper, a new implementation of a stochastic approximation algorithm with Markov chain Monte Carlo method is investigated. The proposed algorithm is computationally straightforward and its convergence is guaranteed. A simulation and three real data sets, including the challenging salamander data, are used to illustrate the procedure and to compare it with some existing methods. The results indicate that the proposed algorithm is an attractive alternative for problems with a large number of random effects or with high dimensional intractable integrals in the likelihood function. 相似文献

9.

Estimating Transition Probabilities for Ignorable Intermittent Missing Data in a Discrete-Time Markov Chain

Hung-Wen Yeh Wenyaw Chan Elaine Symanski Barry R. Davis 《统计学通讯:模拟与计算》2013,42(2):433-448

This article considers a discrete-time Markov chain for modeling transition probabilities when multiple successive observations are missing at random between two observed outcomes using three methods: a na\"?ve analog of complete-case analysis using the observed one-step transitions alone, a non data-augmentation method (NL) by solving nonlinear equations, and a data-augmentation method, the Expectation-Maximization (EM) algorithm. The explicit form of the conditional log-likelihood given the observed information as required by the E step is provided, and the iterative formula in the M step is expressed in a closed form. An empirical study was performed to examine the accuracy and precision of the estimates obtained in the three methods under ignorable missing mechanisms of missing completely at random and missing at random. A dataset from the mental health arena was used for illustration. It was found that both data-augmentation and nonaugmentation methods provide accurate and precise point estimation, and that the na\"?ve method resulted in estimates of the transition probabilities with similar bias but larger MSE. The NL method and the EM algorithm in general provide similar results whereas the latter provides conditional expected row margins leading to smaller standard errors. 相似文献

10.

Full information maximum likelihood estimation in factor analysis with a large number of missing values

《Journal of Statistical Computation and Simulation》2012,82(1):91-104

We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, in which the missing values on manifest variables are included in complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm that is based on the EM algorithm but that efficiently computes the FIML estimates. A significant improvement in the computational speed is realized by not treating the missing values on manifest variables as a part of complete data. When there are many missing data values, it is not clear if the FIML procedure can achieve good estimation accuracy. In order to investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes. 相似文献

11.

Comparisons of computational methods for clustered binary data

Tanujit Dey Chae Young Lim 《Journal of Statistical Computation and Simulation》2013,83(11):2030-2046

Clustered binary data are common in medical research and can be fitted to the logistic regression model with random effects which belongs to a wider class of models called the generalized linear mixed model. The likelihood-based estimation of model parameters often has to handle intractable integration which leads to several estimation methods to overcome such difficulty. The penalized quasi-likelihood (PQL) method is the one that is very popular and computationally efficient in most cases. The expectation–maximization (EM) algorithm allows to estimate maximum-likelihood estimates, but requires to compute possibly intractable integration in the E-step. The variants of the EM algorithm to evaluate the E-step are introduced. The Monte Carlo EM (MCEM) method computes the E-step by approximating the expectation using Monte Carlo samples, while the Modified EM (MEM) method computes the E-step by approximating the expectation using the Laplace's method. All these methods involve several steps of approximation so that corresponding estimates of model parameters contain inevitable errors (large or small) induced by approximation. Understanding and quantifying discrepancy theoretically is difficult due to the complexity of approximations in each method, even though the focus is on clustered binary data. As an alternative competing computational method, we consider a non-parametric maximum-likelihood (NPML) method as well. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via simulation study, which will be useful for researchers when choosing an estimation method for their analysis. 相似文献

12.

Covariate Decomposition Methods for Longitudinal Missing‐at‐Random Data and Predictors Associated with Subject‐Specific Effects

John M. Neuhaus Charles E. McCulloch 《Australian & New Zealand Journal of Statistics》2014,56(4):331-345

Investigators often gather longitudinal data to assess changes in responses over time within subjects and to relate these changes to within‐subject changes in predictors. Missing data are common in such studies and predictors can be correlated with subject‐specific effects. Maximum likelihood methods for generalized linear mixed models provide consistent estimates when the data are ‘missing at random’ (MAR) but can produce inconsistent estimates in settings where the random effects are correlated with one of the predictors. On the other hand, conditional maximum likelihood methods (and closely related maximum likelihood methods that partition covariates into between‐ and within‐cluster components) provide consistent estimation when random effects are correlated with predictors but can produce inconsistent covariate effect estimates when data are MAR. Using theory, simulation studies, and fits to example data this paper shows that decomposition methods using complete covariate information produce consistent estimates. In some practical cases these methods, that ostensibly require complete covariate information, actually only involve the observed covariates. These results offer an easy‐to‐use approach to simultaneously protect against bias from both cluster‐level confounding and MAR missingness in assessments of change. 相似文献

13.

On familial Poisson mixed models with multi-dimensional random effects

《Journal of Statistical Computation and Simulation》2012,82(8):1043-1062

When a generalized linear mixed model with multiple (two or more) sources of random effects is considered, the inferences may vary depending on the nature of the random effects. In this paper, we consider a familial Poisson mixed model where each of the count responses of a family are influenced by two independent unobservable familial random effects with two distinct components of dispersion. A generalized quasilikelihood (GQL) approach is discussed for the estimation of the dispersion components as well as the regression effects of the model. A simulation study is conducted to examine the relative performance of the GQL approach as opposed to a simpler method of moments. Furthermore, the GQL estimation methodology is illustrated by using health care utilization data that follow a Poisson mixed model with one component of dispersion and by using simulated asthma data that follow a Poisson mixed model with two sources of random effects with two distinct components of dispersion. 相似文献

14.

Cure rate survival models with missing covariates: a simulation study

Renata Santana Fonseca Heleno Bolfarine 《Journal of Statistical Computation and Simulation》2013,83(1):97-113

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing data at random situation so that the missing covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare estimates efficiency with and without missing data. As expected, the estimation approach taking into consideration the missing covariates presents much better efficiency in terms of mean square errors than the complete case situation. Effects of increasing cured fraction and censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One involved the length of time to obtain a BS degree in Statistics, and another about the time to breast cancer recurrence. 相似文献

15.

Likelihood inference for lognormal data with left truncation and right censoring with an illustration

N. Balakrishnan Debanjan Mitra 《Journal of statistical planning and inference》2011,141(11):3536-3553

The lognormal distribution is quite commonly used as a lifetime distribution. Data arising from life-testing and reliability studies are often left truncated and right censored. Here, the EM algorithm is used to estimate the parameters of the lognormal model based on left truncated and right censored data. The maximization step of the algorithm is carried out by two alternative methods, with one involving approximation using Taylor series expansion (leading to approximate maximum likelihood estimate) and the other based on the EM gradient algorithm (Lange, 1995). These two methods are compared based on Monte Carlo simulations. The Fisher scoring method for obtaining the maximum likelihood estimates shows a problem of convergence under this setup, except when the truncation percentage is small. The asymptotic variance-covariance matrix of the MLEs is derived by using the missing information principle (Louis, 1982), and then the asymptotic confidence intervals for scale and shape parameters are obtained and compared with corresponding bootstrap confidence intervals. Finally, some numerical examples are given to illustrate all the methods of inference developed here. 相似文献

16.

Estimation for an inverted exponentiated Rayleigh distribution under type II progressive censoring

Manoj Kumar Rastogi 《Journal of applied statistics》2014,41(11):2375-2405

In this paper, we consider estimation of unknown parameters of an inverted exponentiated Rayleigh distribution under type II progressive censored samples. Estimation of reliability and hazard functions is also considered. Maximum likelihood estimators are obtained using the Expectation–Maximization (EM) algorithm. Further, we obtain expected Fisher information matrix using the missing value principle. Bayes estimators are derived under squared error and linex loss functions. We have used Lindley, and Tiernery and Kadane methods to compute these estimates. In addition, Bayes estimators are computed using importance sampling scheme as well. Samples generated from this scheme are further utilized for constructing highest posterior density intervals for unknown parameters. For comparison purposes asymptotic intervals are also obtained. A numerical comparison is made between proposed estimators using simulations and observations are given. A real-life data set is analyzed for illustrative purposes. 相似文献

17.

Likelihood-based and Bayesian methods for Tweedie compound Poisson linear mixed models

Yanwei Zhang 《Statistics and Computing》2013,23(6):743-757

The Tweedie compound Poisson distribution is a subclass of the exponential dispersion family with a power variance function, in which the value of the power index lies in the interval (1,2). It is well known that the Tweedie compound Poisson density function is not analytically tractable, and numerical procedures that allow the density to be accurately and fast evaluated did not appear until fairly recently. Unsurprisingly, there has been little statistical literature devoted to full maximum likelihood inference for Tweedie compound Poisson mixed models. To date, the focus has been on estimation methods in the quasi-likelihood framework. Further, Tweedie compound Poisson mixed models involve an unknown variance function, which has a significant impact on hypothesis tests and predictive uncertainty measures. The estimation of the unknown variance function is thus of independent interest in many applications. However, quasi-likelihood-based methods are not well suited to this task. This paper presents several likelihood-based inferential methods for the Tweedie compound Poisson mixed model that enable estimation of the variance function from the data. These algorithms include the likelihood approximation method, in which both the integral over the random effects and the compound Poisson density function are evaluated numerically; and the latent variable approach, in which maximum likelihood estimation is carried out via the Monte Carlo EM algorithm, without the need for approximating the density function. In addition, we derive the corresponding Markov Chain Monte Carlo algorithm for a Bayesian formulation of the mixed model. We demonstrate the use of the various methods through a numerical example, and conduct an array of simulation studies to evaluate the statistical properties of the proposed estimators. 相似文献

18.

Bayesian analysis of multivariate t linear mixed models with missing responses at random

《Journal of Statistical Computation and Simulation》2012,82(17):3594-3612

The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with Metropolis-within-Gibbs scheme is used to effectively draw the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data. 相似文献

19.

Fast EM-type implementations for mixed effects models

X.-L. Meng & D. van Dyk 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1998,60(3):559-578

The mixed effects model, in its various forms, is a common model in applied statistics. A useful strategy for fitting this model implements EM-type algorithms by treating the random effects as missing data. Such implementations, however, can be painfully slow when the variances of the random effects are small relative to the residual variance. In this paper, we apply the 'working parameter' approach to derive alternative EM-type implementations for fitting mixed effects models, which we show empirically can be hundreds of times faster than the common EM-type implementations. In our limited simulations, they also compare well with the routines in S-PLUS® and Stata® in terms of both speed and reliability. The central idea of the working parameter approach is to search for efficient data augmentation schemes for implementing the EM algorithm by minimizing the augmented information over the working parameter, and in the mixed effects setting this leads to a transfer of the mixed effects variances into the regression slope parameters. We also describe a variation for computing the restricted maximum likelihood estimate and an adaptive algorithm that takes advantage of both the standard and the alternative EM-type implementations. 相似文献

20.

LASSO-type estimators for semiparametric nonlinear mixed-effects models estimation

Ana Arribas-Gil Karine Bertin Cristian Meza Vincent Rivoirard 《Statistics and Computing》2014,24(3):443-460

Parametric nonlinear mixed effects models (NLMEs) are now widely used in biometrical studies, especially in pharmacokinetics research and HIV dynamics models, due to, among other aspects, the computational advances achieved during the last years. However, this kind of models may not be flexible enough for complex longitudinal data analysis. Semiparametric NLMEs (SNMMs) have been proposed as an extension of NLMEs. These models are a good compromise and retain nice features of both parametric and nonparametric models resulting in more flexible models than standard parametric NLMEs. However, SNMMs are complex models for which estimation still remains a challenge. Previous estimation procedures are based on a combination of log-likelihood approximation methods for parametric estimation and smoothing splines techniques for nonparametric estimation. In this work, we propose new estimation strategies in SNMMs. On the one hand, we use the Stochastic Approximation version of EM algorithm (SAEM) to obtain exact ML and REML estimates of the fixed effects and variance components. On the other hand, we propose a LASSO-type method to estimate the unknown nonlinear function. We derive oracle inequalities for this nonparametric estimator. We combine the two approaches in a general estimation procedure that we illustrate with simulations and through the analysis of a real data set of price evolution in on-line auctions. 相似文献