Similar Literature
20 similar documents retrieved
1.
The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make only first- and second-moment assumptions; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure as the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. The loss function is then minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.
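To make the monotonicity property concrete, the following is a minimal, hypothetical Python skeleton of the kind of EM loop the abstract describes; `e_step`, `m_step`, and `loss` are placeholder callables standing in for the paper's quasi-deviance machinery, not its actual implementation.

```python
import numpy as np

def em_fit(y_obs, theta0, e_step, m_step, loss, max_iter=200, tol=1e-8):
    """Generic EM loop: alternate E- and M-steps until the loss stabilizes.

    e_step(y_obs, theta) -> expected complete-data sufficient statistics
    m_step(stats)        -> updated parameter estimate
    loss(y_obs, theta)   -> observed-data loss (e.g. a quasi-deviance)
    """
    theta = theta0
    prev = loss(y_obs, theta)
    for _ in range(max_iter):
        stats = e_step(y_obs, theta)      # E-step: fill in the missing information
        theta = m_step(stats)             # M-step: minimize the expected loss
        cur = loss(y_obs, theta)
        # EM guarantees the observed-data loss does not increase
        assert cur <= prev + 1e-10, "loss increased: check the E/M-step code"
        if prev - cur < tol:
            break
        prev = cur
    return theta
```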

2.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for the models considered here with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). An attractive feature of the method is that it is rather general and can be extended, via the EM algorithm, to various situations with missing observations; in particular, when no data are missing, the new model selection criteria reduce to the classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also compare missing-data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

3.
This article focuses on data analyses under the scenario of missing at random within discrete-time Markov chain models. The naive method, the nonlinear (NL) method, and the expectation-maximization (EM) algorithm are discussed. We extend the NL method into a Bayesian framework, using an adjusted rejection algorithm to sample the posterior distribution and estimating the transition probabilities with a Monte Carlo algorithm. We compare the Bayesian nonlinear (BNL) method with the naive method and the EM algorithm under various missing rates, and comprehensively evaluate the estimators in terms of biases, variances, mean square errors, and coverage probabilities (CPs). Our simulation results show that the EM algorithm usually offers the smallest variances but the poorest CP, while the BNL method has smaller variances and better or similar CP compared with the naive method. When the missing rate is low (about 9%, MAR), the three methods are comparable. When the missing rate is high (about 25%, MAR), the BNL method overall performs slightly but consistently better than the naive method with respect to variances and CP. Data from a longitudinal study of stress levels among caregivers of individuals with Alzheimer's disease are used to illustrate these methods.

4.
The EM algorithm is a popular method for computing maximum likelihood estimates or posterior modes in models that can be formulated in terms of missing data or latent structure. Although easy implementation and stable convergence help to explain the popularity of the algorithm, its convergence is sometimes notoriously slow. In recent years, however, various adaptations have significantly improved the speed of EM while maintaining its stability and simplicity. One especially successful method for maximum likelihood is known as the parameter-expanded EM or PXEM algorithm. Unfortunately, PXEM does not generally have a closed-form M-step when computing posterior modes, even when the corresponding EM algorithm is in closed form. In this paper we confront this problem by adapting the one-step-late EM algorithm to PXEM to establish a fast closed-form algorithm that improves on the one-step-late EM algorithm by ensuring monotone convergence. We use this algorithm to fit a probit regression model and a variety of dynamic linear models, showing computational savings of as much as 99.9%, with the biggest savings occurring when the EM algorithm is the slowest to converge.
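As a point of reference for the probit example, here is a hedged sketch of the ordinary (non-expanded) EM for probit regression via its latent-Gaussian representation; it is not the PX-EM or one-step-late variant developed in the paper, and the function name `probit_em` is mine.

```python
import numpy as np
from scipy.stats import norm

def probit_em(X, y, max_iter=500, tol=1e-8):
    """Plain EM for probit regression via the latent-Gaussian representation.

    Latent z_i ~ N(x_i'beta, 1); y_i = 1{z_i > 0}. The E-step replaces z_i by
    its truncated-normal mean, and the M-step is ordinary least squares on E[z].
    (A PX-EM variant would add an expanded scale parameter; omitted here.)
    """
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        # E-step: truncated-normal means of the latent variables
        ez = np.where(
            y == 1,
            eta + norm.pdf(eta) / np.clip(norm.cdf(eta), 1e-12, None),
            eta - norm.pdf(eta) / np.clip(norm.cdf(-eta), 1e-12, None),
        )
        # M-step: least-squares regression of E[z] on X
        beta_new = XtX_inv_Xt @ ez
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```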

5.
This paper proposes a method for estimating the parameters in a generalized linear model with missing covariates. The missing covariates are assumed to come from a continuous distribution, and are assumed to be missing at random. In particular, Gaussian quadrature methods are used on the E-step of the EM algorithm, leading to an approximate EM algorithm. The parameters are then estimated using the weighted EM procedure given in Ibrahim (1990). This approximate EM procedure leads to approximate maximum likelihood estimates, whose standard errors and asymptotic properties are given. The proposed procedure is illustrated on a data set.
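For illustration only, here is a small sketch of how Gauss-Hermite quadrature can approximate an expectation over a normally distributed missing covariate, the kind of integral the E-step above requires; the helper `gauss_hermite_expectation` is a hypothetical name, not code from the paper.

```python
import numpy as np

def gauss_hermite_expectation(g, mu, sigma, n_nodes=20):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) by Gauss-Hermite quadrature.

    The change of variables x = mu + sqrt(2)*sigma*t maps the physicists'
    Gauss-Hermite rule (weight exp(-t^2)) onto the normal density.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    x = mu + np.sqrt(2.0) * sigma * nodes
    return np.sum(weights * g(x)) / np.sqrt(np.pi)

# Example: E[X^2] for X ~ N(1, 2^2) should equal mu^2 + sigma^2 = 5
approx = gauss_hermite_expectation(lambda x: x**2, mu=1.0, sigma=2.0)
```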

6.
We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, in which the missing values on manifest variables are included in the complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm that is based on the EM algorithm but that efficiently computes the FIML estimates. A significant improvement in computational speed is realized by not treating the missing values on manifest variables as part of the complete data. When there are many missing data values, it is not clear whether the FIML procedure can achieve good estimation accuracy. In order to investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.

7.
This article considers a discrete-time Markov chain for modeling transition probabilities when multiple successive observations are missing at random between two observed outcomes, using three methods: a naïve analog of complete-case analysis using the observed one-step transitions alone, a non-data-augmentation (NL) method that solves nonlinear equations, and a data-augmentation method, the expectation-maximization (EM) algorithm. The explicit form of the conditional log-likelihood given the observed information, as required by the E-step, is provided, and the iterative formula in the M-step is expressed in closed form. An empirical study was performed to examine the accuracy and precision of the estimates obtained by the three methods under the ignorable missing mechanisms of missing completely at random and missing at random. A dataset from the mental health arena was used for illustration. It was found that both the data-augmentation and non-augmentation methods provide accurate and precise point estimation, and that the naïve method resulted in estimates of the transition probabilities with similar bias but larger MSE. The NL method and the EM algorithm in general provide similar results, whereas the latter provides conditional expected row margins, leading to smaller standard errors.
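As an illustration of the data-augmentation idea in the simplest setting of a single missing observation between two observed states, the sketch below distributes each observed two-step pair over the possible intermediate states in the E-step and renormalizes row counts in the M-step; the function and its arguments are hypothetical, not the authors' implementation, and the paper's closed-form M-step for multiple successive gaps is more general.

```python
import numpy as np

def em_markov_one_gap(obs_transitions, gap_pairs, n_states, max_iter=200, tol=1e-8):
    """EM for transition probabilities when single intermediate states are missing.

    obs_transitions: list of (i, j) fully observed one-step transitions
    gap_pairs:       list of (i, j) states observed two steps apart, with the
                     intermediate state missing
    """
    P = np.full((n_states, n_states), 1.0 / n_states)
    for _ in range(max_iter):
        # tiny pseudo-count keeps rows that are never left from dividing by zero
        counts = np.full((n_states, n_states), 1e-8)
        for i, j in obs_transitions:             # fully observed transitions
            counts[i, j] += 1.0
        P2 = P @ P                               # current two-step transition matrix
        for i, j in gap_pairs:                   # E-step: distribute each gap
            w = P[i, :] * P[:, j] / P2[i, j]     # P(intermediate = k | start i, end j)
            counts[i, :] += w                    # expected i -> k transitions
            counts[:, j] += w                    # expected k -> j transitions
        P_new = counts / counts.sum(axis=1, keepdims=True)   # M-step: row-normalize
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P
```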

8.
This paper examines the formation of maximum likelihood estimates of cell means in analysis of variance problems for cells with missing observations. Methods of estimating the means for missing cells have a long history, which includes iterative maximum likelihood techniques, approximation techniques and ad hoc techniques. The use of the EM algorithm to form maximum likelihood estimates has resolved most of the issues associated with this problem. Implementation of the EM algorithm entails specification of a reduced model. As demonstrated in this paper, when there are several missing cells, it is possible to specify a reduced model that results in an unidentifiable likelihood. The EM algorithm in this case does not converge, although the slow divergence may often be mistaken by the unwary for convergence. This paper presents a simple matrix method of determining whether or not the reduced model results in an identifiable likelihood, and consequently in an EM algorithm that converges. We also show the EM algorithm in this case to be equivalent to a method which yields a closed-form solution.
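The identifiability question reduces to whether the reduced-model design matrix, restricted to the cells that are actually observed, has full column rank; the numpy sketch below checks exactly that, though the specific matrix criterion in the paper may be stated differently.

```python
import numpy as np

def reduced_model_identifiable(design_full, observed_cells):
    """Check whether a reduced ANOVA model is identifiable from the observed cells.

    design_full:    (n_cells, n_params) design matrix of the reduced model,
                    one row per cell of the layout
    observed_cells: boolean mask or index array of the cells that contain data
    The likelihood is identifiable only if the rows belonging to observed cells
    span the full column space of the reduced-model design matrix.
    """
    X = np.asarray(design_full, float)
    return np.linalg.matrix_rank(X[observed_cells]) == X.shape[1]
```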

9.
We propose a mixture model for data with an ordinal outcome and a longitudinal covariate that is subject to missingness. Data from a tailored, telephone-delivered smoking cessation intervention for construction laborers are used to illustrate the method, which considers as an outcome a categorical measure of smoking cessation and evaluates the effectiveness of the motivational telephone interviews on this outcome. We propose two model structures for the longitudinal covariate, for the case when the missing data are missing at random and when the missing data mechanism is non-ignorable. A generalized EM algorithm is used to obtain maximum likelihood estimates.

10.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

11.
For capture–recapture models when covariates are subject to measurement errors and missing data, a set of estimating equations is constructed to estimate population size and relevant parameters. These estimating equations can be solved by an algorithm similar to the EM algorithm. The proposed method is also applicable to the situation when covariates with no measurement errors have missing data. Simulation studies are used to assess the performance of the proposed estimator. The estimator is also applied to a capture–recapture experiment on the bird species Prinia flaviventris in Hong Kong. The Canadian Journal of Statistics 37: 645–658; 2009 © 2009 Statistical Society of Canada

12.
Three-mode analysis is a generalization of principal component analysis to three-mode data. While two-mode data consist of cases that are measured on several variables, three-mode data consist of cases that are measured on several variables at several occasions. As with any other statistical technique, the results of three-mode analysis may be influenced by missing data. Three-mode software packages generally use the expectation–maximization (EM) algorithm for dealing with missing data. However, there are situations in which the EM algorithm is expected to break down. Alternatively, multiple imputation may be used for dealing with missing data. In this study we investigated the influence of eight different multiple-imputation methods on the results of three-mode analysis, more specifically, a Tucker2 analysis, and compared the results with those of the EM algorithm. Results of the simulations show that multilevel imputation with the mode with the most levels nested within cases and the mode with the least levels represented as variables gives the best results for a Tucker2 analysis. Thus, this may be a good alternative to the EM algorithm for handling missing data in a Tucker2 analysis.

13.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter-expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton-Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

14.
A progressive hybrid censoring scheme is a mixture of type-I and type-II progressive censoring schemes. In this paper, we mainly consider the analysis of progressive type-II hybrid-censored data when the lifetime distribution of an individual item is the normal or the extreme value distribution. Since the maximum likelihood estimators (MLEs) of the parameters cannot be obtained in closed form, we propose to use the expectation–maximization (EM) algorithm to compute the MLEs. The Newton–Raphson method is also used to estimate the model parameters. The asymptotic variance–covariance matrix of the MLEs under the EM framework is obtained from the Fisher information matrix using the missing information principle, and asymptotic confidence intervals for the parameters are then constructed. The study concludes by comparing the two estimation methods, and the coverage probabilities of the asymptotic confidence intervals corresponding to the missing information principle and to the observed information matrix, through a simulation study, illustrative examples and a real data analysis.
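For orientation, here is a much-simplified sketch of an EM iteration for normal data under ordinary right censoring (not the progressive hybrid scheme analysed in the paper): the E-step plugs in truncated-normal moments for the censored items and the M-step updates the normal parameters; all names are mine.

```python
import numpy as np
from scipy.stats import norm

def em_censored_normal(x_obs, c_cens, max_iter=500, tol=1e-9):
    """EM for N(mu, sigma^2) with ordinary right-censored observations.

    x_obs:  fully observed lifetimes
    c_cens: censoring times of right-censored items (true lifetime > c)
    The E-step uses truncated-normal moments; the progressive hybrid scheme
    in the paper requires a more elaborate E-step than this sketch.
    """
    x_obs, c_cens = np.asarray(x_obs, float), np.asarray(c_cens, float)
    n = x_obs.size + c_cens.size
    mu, sigma = x_obs.mean(), x_obs.std() + 1e-6
    for _ in range(max_iter):
        a = (c_cens - mu) / sigma
        lam = norm.pdf(a) / np.clip(norm.sf(a), 1e-300, None)  # inverse Mills ratio
        ez = mu + sigma * lam                                   # E[X | X > c]
        ez2 = mu**2 + sigma**2 + sigma * (c_cens + mu) * lam    # E[X^2 | X > c]
        mu_new = (x_obs.sum() + ez.sum()) / n
        ss = ((x_obs - mu_new) ** 2).sum() \
            + (ez2 - 2.0 * mu_new * ez + mu_new**2).sum()
        sigma_new = np.sqrt(ss / n)
        if abs(mu_new - mu) + abs(sigma_new - sigma) < tol:
            return mu_new, sigma_new
        mu, sigma = mu_new, sigma_new
    return mu, sigma
```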

15.
We propose an ℓ1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at random (MAR), which also covers the missing completely at random case. The implementation of the method is non-trivial, as the observed negative log-likelihood generally is a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.
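A rough sketch of one way such an EM could be organised, assuming a fixed mean after centring and using scikit-learn's graphical_lasso as the penalized M-step; this is not the authors' implementation and omits their convergence analysis and the sparse-regression extension.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def em_glasso_missing(X, alpha, max_iter=50, tol=1e-4):
    """EM sketch for l1-penalized inverse covariance estimation with NaN entries.

    E-step: for each row, impute the conditional mean of the missing block given
    the observed block and add the conditional covariance to the scatter matrix.
    M-step: run the graphical lasso on the completed empirical covariance.
    Assumes MAR data and, for brevity, a mean fixed at the observed column means.
    """
    X = np.asarray(X, float)
    n, p = X.shape
    Xc = X - np.nanmean(X, axis=0)                 # centre on observed means
    cov = np.diag(np.nanvar(X, axis=0) + 1e-3)     # crude positive-definite start
    prec = np.linalg.inv(cov)
    for _ in range(max_iter):
        scatter = np.zeros((p, p))
        for i in range(n):
            m = np.isnan(Xc[i])
            o = ~m
            x = Xc[i].copy()
            if m.any():
                B = cov[np.ix_(m, o)] @ np.linalg.inv(cov[np.ix_(o, o)])
                x[m] = B @ x[o]                    # conditional mean of missing block
                scatter[np.ix_(m, m)] += cov[np.ix_(m, m)] - B @ cov[np.ix_(o, m)]
            scatter += np.outer(x, x)
        cov_new, prec = graphical_lasso(scatter / n, alpha=alpha)  # penalized M-step
        if np.max(np.abs(cov_new - cov)) < tol:
            return cov_new, prec
        cov = cov_new
    return cov, prec
```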

16.
This paper examines a number of methods of handling missing outcomes in regressive logistic regression modelling of familial binary data, and compares them with an EM algorithm approach via a simulation study. The results indicate that a strategy based on imputation of missing values leads to biased estimates, and that a strategy of excluding incomplete families has a substantial effect on the variability of the parameter estimates. Recommendations are made which depend, amongst other factors, on the amount of missing data and on the availability of software.

17.
The EM algorithm is often used for finding the maximum likelihood estimates in generalized linear models with incomplete data. In this article, the author presents a robust method within the framework of maximum likelihood estimation for fitting generalized linear models when covariates are nonignorably missing. His robust approach is useful for downweighting any influential observations when estimating the model parameters. To avoid computational problems involving irreducibly high-dimensional integrals, he adopts a Metropolis-Hastings algorithm based on a Markov chain sampling method. He carries out simulations to investigate the behaviour of the robust estimates in the presence of outliers and missing covariates; furthermore, he compares these estimates to the classical maximum likelihood estimates. Finally, he illustrates his approach using data on the occurrence of delirium in patients operated on for abdominal aortic aneurysm.
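For readers unfamiliar with the sampling step, here is a generic random-walk Metropolis-Hastings sketch of the kind that could be used to draw from an intractable conditional distribution; it is a textbook sampler, not the author's algorithm, and `log_target` and `step` are placeholders.

```python
import numpy as np

def random_walk_mh(log_target, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings for an unnormalized log density."""
    rng = np.random.default_rng(rng)
    x = np.atleast_1d(np.asarray(x0, float))
    lp = log_target(x)
    out = np.empty((n_samples, x.size))
    for t in range(n_samples):
        prop = x + step * rng.standard_normal(x.size)   # symmetric proposal
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:          # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
        out[t] = x
    return out
```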

18.
We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance of the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.
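The paper uses variational Bayes rather than EM, but the Gaussian scale-mixture idea is easy to illustrate with the classical EM weights for a regression model with Student-t errors (degrees of freedom held fixed); this sketch only shows the scale-mixture downweighting, not the VB algorithm of the paper.

```python
import numpy as np

def t_regression_em(X, y, nu=4.0, max_iter=200, tol=1e-8):
    """EM (iteratively reweighted least squares) for regression with Student-t
    errors, via the Gaussian scale-mixture representation of the t distribution.

    Each residual gets weight (nu + 1) / (nu + r^2 / sigma^2), so outlying
    observations are automatically downweighted.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(max_iter):
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r**2 / sigma2)                      # E-step weights
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))  # weighted LS
        sigma2 = np.mean(w * (y - X @ beta_new) ** 2)              # M-step scale
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, sigma2
        beta = beta_new
    return beta, sigma2
```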

19.
Ibrahim (1990) used the EM algorithm to obtain maximum likelihood estimates of the regression parameters in generalized linear models with partially missing covariates. The technique was termed EM by the method of weights. In this paper, we generalize this technique to Cox regression analysis with missing values in the covariates. We specify a full model, letting the unobserved covariate values be random, and then maximize the observed likelihood. The asymptotic covariance matrix is estimated by the inverse information matrix. The missing data are allowed to be missing at random, but the non-ignorable non-response situation may, in principle, also be considered. Simulation studies indicate that the proposed method is more efficient than the method suggested by Paik & Tsai (1997). We apply the procedure to a clinical trial example with six covariates, three of which have missing values.

20.
Based on progressively type-II censored data, the maximum likelihood estimators (MLEs) for the Lomax parameters are derived using the expectation–maximization (EM) algorithm. Moreover, the expected Fisher information matrix based on the missing information principle is computed. Using extensive simulation and three criteria, namely bias, root mean squared error and Pitman closeness measures, we compare the performance of the MLEs obtained via the EM algorithm and the Newton–Raphson (NR) method. It is concluded that the EM algorithm outperforms the NR method in all cases considered. Two real data examples are used to illustrate the proposed estimators.
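As a baseline unrelated to the censoring scheme, the complete-sample Lomax MLE can be obtained by profiling out the shape parameter and numerically optimising the remaining one-dimensional likelihood; the sketch below assumes an uncensored sample (the MLE may not exist for some samples) and is not the EM or NR procedure of the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lomax_mle(x):
    """Complete-sample MLE for the Lomax density f(x) = (a/lam)*(1 + x/lam)**-(a+1).

    For fixed lam the shape MLE is a = n / sum(log(1 + x/lam)), so the
    log-likelihood is profiled and optimised over lam alone (on the log scale).
    """
    x = np.asarray(x, float)
    n = x.size

    def neg_profile_loglik(log_lam):
        lam = np.exp(log_lam)
        s = np.sum(np.log1p(x / lam))
        a = n / s
        return -(n * np.log(a) - n * np.log(lam) - (a + 1.0) * s)

    centre = np.log(x.mean())
    res = minimize_scalar(neg_profile_loglik, bounds=(centre - 10, centre + 10),
                          method="bounded")
    lam = np.exp(res.x)
    return n / np.sum(np.log1p(x / lam)), lam
```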
