期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parameter estimation of nonlinear mixed-effects models using first-order conditional linearization and the EM algorithm

Liyong Fu Yuancai Lei Ram P. Sharma 《Journal of applied statistics》2013,40(2):252-265

Nonlinear mixed-effects (NLME) models are flexible enough to handle repeated-measures data from various disciplines. In this article, we propose both maximum-likelihood and restricted maximum-likelihood estimations of NLME models using first-order conditional expansion (FOCE) and the expectation–maximization (EM) algorithm. The FOCE-EM algorithm implemented in the ForStat procedure SNLME is compared with the Lindstrom and Bates (LB) algorithm implemented in both the SAS macro NLINMIX and the S-Plus/R function nlme in terms of computational efficiency and statistical properties. Two realworld data sets an orange tree data set and a Chinese fir (Cunninghamia lanceolata) data set, and a simulated data set were used for evaluation. FOCE-EM converged for all mixed models derived from the base model in the two realworld cases, while LB did not, especially for the models in which random effects are simultaneously considered in several parameters to account for between-subject variation. However, both algorithms had identical estimated parameters and fit statistics for the converged models. We therefore recommend using FOCE-EM in NLME models, particularly when convergence is a concern in model selection. 相似文献

2.

Fast EM-type implementations for mixed effects models

X.-L. Meng & D. van Dyk 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1998,60(3):559-578

The mixed effects model, in its various forms, is a common model in applied statistics. A useful strategy for fitting this model implements EM-type algorithms by treating the random effects as missing data. Such implementations, however, can be painfully slow when the variances of the random effects are small relative to the residual variance. In this paper, we apply the 'working parameter' approach to derive alternative EM-type implementations for fitting mixed effects models, which we show empirically can be hundreds of times faster than the common EM-type implementations. In our limited simulations, they also compare well with the routines in S-PLUS® and Stata® in terms of both speed and reliability. The central idea of the working parameter approach is to search for efficient data augmentation schemes for implementing the EM algorithm by minimizing the augmented information over the working parameter, and in the mixed effects setting this leads to a transfer of the mixed effects variances into the regression slope parameters. We also describe a variation for computing the restricted maximum likelihood estimate and an adaptive algorithm that takes advantage of both the standard and the alternative EM-type implementations. 相似文献

3.

The em algorithm for the quasi-likelihood regression model

Myunghee Cho Paik 《统计学通讯:理论与方法》2013,42(6):1403-1430

The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make first and second moment assumptions only; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure of the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed. 相似文献

4.

A stochastic approximation algorithm for maximum-likelihood estimation with incomplete data

Ming Gao Gu Shaolin Li 《Revue canadienne de statistique》1998,26(4):567-582

We propose a new stochastic approximation (SA) algorithm for maximum-likelihood estimation (MLE) in the incomplete-data setting. This algorithm is most useful for problems when the EM algorithm is not possible due to an intractable E-step or M-step. Compared to other algorithm that have been proposed for intractable EM problems, such as the MCEM algorithm of Wei and Tanner (1990), our proposed algorithm appears more generally applicable and efficient. The approach we adopt is inspired by the Robbins-Monro (1951) stochastic approximation procedure, and we show that the proposed algorithm can be used to solve some of the long-standing problems in computing an MLE with incomplete data. We prove that in general O(n) simulation steps are required in computing the MLE with the SA algorithm and O(n log n) simulation steps are required in computing the MLE using the MCEM and/or the MCNR algorithm, where n is the sample size of the observations. Examples include computing the MLE in the nonlinear error-in-variable model and nonlinear regression model with random effects. 相似文献

5.

On-line expectation–maximization algorithm for latent data models

Olivier Cappé Eric Moulines 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(3):593-613

Summary. We propose a generic on-line (also sometimes called adaptive or recursive) version of the expectation–maximization (EM) algorithm applicable to latent variable models of independent observations. Compared with the algorithm of Titterington, this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete-data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback–Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e. that of the maximum likelihood estimator. In addition, the approach proposed is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model. 相似文献

6.

On the estimation of the extreme value and normal distribution parameters based on progressive type-II hybrid-censored data

《Journal of Statistical Computation and Simulation》2012,82(3):569-596

A progressive hybrid censoring scheme is a mixture of type-I and type-II progressive censoring schemes. In this paper, we mainly consider the analysis of progressive type-II hybrid-censored data when the lifetime distribution of the individual item is the normal and extreme value distributions. Since the maximum likelihood estimators (MLEs) of these parameters cannot be obtained in the closed form, we propose to use the expectation and maximization (EM) algorithm to compute the MLEs. Also, the Newton–Raphson method is used to estimate the model parameters. The asymptotic variance–covariance matrix of the MLEs under EM framework is obtained by Fisher information matrix using the missing information and asymptotic confidence intervals for the parameters are then constructed. This study will end up with comparing the two methods of estimation and the asymptotic confidence intervals of coverage probabilities corresponding to the missing information principle and the observed information matrix through a simulation study, illustrated examples and real data analysis. 相似文献

7.

非参数异方差模型中条件回归函数的EM算法——基于农村食品消费与纯收入的实证研究

王继霞申培萍《统计与信息论坛》2014,(1):9-12

对非参数异方差模型中回归函数的EM算法进行研究,并基于EM算法得到了条件回归函数的估计。此外,通过对农村居民食品消费支出与纯收入关系的实证分析,说明了基于EM算法的估计方法比最小二乘估计方法的拟合效果更好,并对恩格尔系数进行了拟合,分析了其变化走势。相似文献

8.

Inference and optimal censoring schemes for progressively censored Birnbaum–Saunders distribution

Biswabrata Pradhan Debasis Kundu 《Journal of statistical planning and inference》2013

The aim of this paper is twofold. First we discuss the maximum likelihood estimators of the unknown parameters of a two-parameter Birnbaum–Saunders distribution when the data are progressively Type-II censored. The maximum likelihood estimators are obtained using the EM algorithm by exploiting the property that the Birnbaum–Saunders distribution can be expressed as an equal mixture of an inverse Gaussian distribution and its reciprocal. From the proposed EM algorithm, the observed information matrix can be obtained quite easily, which can be used to construct the asymptotic confidence intervals. We perform the analysis of two real and one simulated data sets for illustrative purposes, and the performances are quite satisfactory. We further propose the use of different criteria to compare two different sampling schemes, and then find the optimal sampling scheme for a given criterion. It is observed that finding the optimal censoring scheme is a discrete optimization problem, and it is quite a computer intensive process. We examine one sub-optimal censoring scheme by restricting the choice of censoring schemes to one-step censoring schemes as suggested by Balakrishnan (2007), which can be obtained quite easily. We compare the performances of the sub-optimal censoring schemes with the optimal ones, and observe that the loss of information is quite insignificant. 相似文献

9.

Simple Fitting Algorithms for Incomplete Categorical Data

Geert Molenberghs & Els Goetghebeur 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(2):401-414

A popular approach to estimation based on incomplete data is the EM algorithm. For categorical data, this paper presents a simple expression of the observed data log-likelihood and its derivatives in terms of the complete data for a broad class of models and missing data patterns. We show that using the observed data likelihood directly is easy and has some advantages. One can gain considerable computational speed over the EM algorithm and a straightforward variance estimator is obtained for the parameter estimates. The general formulation treats a wide range of missing data problems in a uniform way. Two examples are worked out in full. 相似文献

10.

The one-step-late PXEM algorithm

Van Dyk David A. Tang Ruoxi 《Statistics and Computing》2003,13(2):137-152

The EM algorithm is a popular method for computing maximum likelihood estimates or posterior modes in models that can be formulated in terms of missing data or latent structure. Although easy implementation and stable convergence help to explain the popularity of the algorithm, its convergence is sometimes notoriously slow. In recent years, however, various adaptations have significantly improved the speed of EM while maintaining its stability and simplicity. One especially successful method for maximum likelihood is known as the parameter expanded EM or PXEM algorithm. Unfortunately, PXEM does not generally have a closed form M-step when computing posterior modes, even when the corresponding EM algorithm is in closed form. In this paper we confront this problem by adapting the one-step-late EM algorithm to PXEM to establish a fast closed form algorithm that improves on the one-step-late EM algorithm by insuring monotone convergence. We use this algorithm to fit a probit regression model and a variety of dynamic linear models, showing computational savings of as much as 99.9%, with the biggest savings occurring when the EM algorithm is the slowest to converge. 相似文献

11.

Trajectory Modeling of Longitudinal Binary Data: Application of the EM Algorithm for Mixture Models

Man-Kee M. Chu John J. Koval 《统计学通讯:模拟与计算》2013,42(3):495-519

A developmental trajectory describes the course of behavior over time. Identifying multiple trajectories within an overall developmental process permits a focus on subgroups of particular interest. We introduce a framework for identifying trajectories by using the Expectation-Maximization (EM) algorithm to fit semiparametric mixtures of logistic distributions to longitudinal binary data. For performance comparison, we consider full maximization algorithms (PROC TRAJ in SAS), standard EM, and two other EM-based algorithms for speeding up convergence. Simulation shows that EM methods produce more accurate parameter estimates. The EM methodology is illustrated with a longitudinal dataset involving adolescents smoking behaviors. 相似文献

12.

EM algorithms for beta kernel distributions

《Journal of Statistical Computation and Simulation》2012,82(2):451-467

The EM algorithm is employed to compute maximum-likelihood estimates for beta kernel distributions. Estimation is considered under two censoring schemes: the progressive Type-I censoring and progressive Type-II right censoring schemes. As an application, the EM algorithm is executed to obtain maximum-likelihood estimates for the beta Weibull distribution under the two censoring schemes. A simulation study and two real data sets are used to show the efficiency of the EM algorithm. 相似文献

13.

Identification of target clusters by using the restricted normal mixture model

Seung-Gu Kim Yung-Seop Lee 《Journal of applied statistics》2013,40(5):941-960

This paper addresses the problem of identifying groups that satisfy the specific conditions for the means of feature variables. In this study, we refer to the identified groups as “target clusters” (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by using the maximum-likelihood method. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters, which would be difficult to achieve with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising in the identification. 相似文献

14.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not `testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have `passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed. 相似文献

15.

Inference methods for saturated models in longitudinal clinical trials with incomplete binary data

Song JX 《Pharmaceutical statistics》2006,5(4):295-304

In the longitudinal studies with binary response, it is often of interest to estimate the percentage of positive responses at each time point and the percentage of having at least one positive response by each time point. When missing data exist, the conventional method based on observed percentages could result in erroneous estimates. This study demonstrates two methods of using expectation-maximization (EM) and data augmentation (DA) algorithms in the estimation of the marginal and cumulative probabilities for incomplete longitudinal binary response data. Both methods provide unbiased estimates when the missingness mechanism is missing at random (MAR) assumption. Sensitivity analyses have been performed for cases when the MAR assumption is in question. 相似文献

16.

INCOMPLETE DATA IN GENERALIZED LINEAR MODELS WITH CONTINUOUS COVARIATES

Joseph G. Brahim Sanford Weisberg 《Australian & New Zealand Journal of Statistics》1992,34(3):461-470

This paper proposes a method for estimating the parameters in a generalized linear model with missing covariates. The missing covariates are assumed to come from a continuous distribution, and are assumed to be missing at random. In particular, Gaussian quadrature methods are used on the E-step of the EM algorithm, leading to an approximate EM algorithm. The parameters are then estimated using the weighted EM procedure given in Ibrahim (1990). This approximate EM procedure leads to approximate maximum likelihood estimates, whose standard errors and asymptotic properties are given. The proposed procedure is illustrated on a data set. 相似文献

17.

On the choice of the number of blocks with the incremental EM algorithm for the fitting of normal mixtures

S. K. Ng G. J. McLachlan 《Statistics and Computing》2003,13(1):45-55

The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as being incomplete. As each E-step visits each data point on a given iteration, the EM algorithm requires considerable computation time in its application to large data sets. Two versions, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed recently by Neal R.M. and Hinton G.E. in Jordan M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355–368 to reduce the computational cost of applying the EM algorithm. With the IEM algorithm, the available n observations are divided into B (B n) blocks and the E-step is implemented for only a block of observations at a time before the next M-step is performed. With the sparse version of the EM algorithm for the fitting of mixture models, only those posterior probabilities of component membership of the mixture that are above a specified threshold are updated; the remaining component-posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performances of the IEM algorithm with various number of blocks and the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks with the IEM algorithm. For the IEM algorithm in the extreme case of one observation per block, we provide efficient updating formulas, which avoid the direct calculation of the inverses and determinants of the component-covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm and the partial E-step of the IEM algorithm. This SPIEM algorithm can further reduce the computation time of the IEM algorithm. 相似文献

18.

Addressing misclassification for binary data: probit and t-link regressions

《Journal of Statistical Computation and Simulation》2012,82(10):2187-2213

Generalized linear models are addressed to describe the dependence of data on explanatory variables when the binary outcome is subject to misclassification. Both probit and t-link regressions for misclassified binary data under Bayesian methodology are proposed. The computational difficulties have been avoided by using data augmentation. The idea of using a data augmentation framework (with two types of latent variables) is exploited to derive efficient Gibbs sampling and expectation–maximization algorithms. Besides, this formulation has allowed to obtain the probit model as a particular case of the t-link model. Simulation examples are presented to illustrate the model performance when comparing with standard methods that do not consider misclassification. In order to show the potential of the proposed approaches, a real data problem arising when studying hearing loss caused by exposure to occupational noise is analysed. 相似文献

19.

A Bayesian Approach in Estimating Transition Probabilities of a Discrete-time Markov Chain for Ignorable Intermittent Missing Data

Junsheng Ma Xiaoying Yu Elaine Symanski Rachelle Doody 《统计学通讯:模拟与计算》2016,45(7):2598-2616

This article focuses on data analyses under the scenario of missing at random within discrete-time Markov chain models. The naive method, nonlinear (NL) method, and Expectation-Maximization (EM) algorithm are discussed. We extend the NL method into a Bayesian framework, using an adjusted rejection algorithm to sample the posterior distribution, and estimating the transition probabilities with a Monte Carlo algorithm. We compare the Bayesian nonlinear (BNL) method with the naive method and the EM algorithm with various missing rates, and comprehensively evaluate estimators in terms of biases, variances, mean square errors, and coverage probabilities (CPs). Our simulation results show that the EM algorithm usually offers smallest variances but with poorest CP, while the BNL method has smaller variances and better/similar CP as compared to the naive method. When the missing rate is low (about 9%, MAR), the three methods are comparable. Whereas when the missing rate is high (about 25%, MAR), overall, the BNL method performs slightly but consistently better than the naive method regarding variances and CP. Data from a longitudinal study of stress level among caregivers of individuals with Alzheimer’s disease is used to illustrate these methods. 相似文献

20.

Cure rate survival models with missing covariates: a simulation study

Renata Santana Fonseca Heleno Bolfarine 《Journal of Statistical Computation and Simulation》2013,83(1):97-113

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing data at random situation so that the missing covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare estimates efficiency with and without missing data. As expected, the estimation approach taking into consideration the missing covariates presents much better efficiency in terms of mean square errors than the complete case situation. Effects of increasing cured fraction and censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One involved the length of time to obtain a BS degree in Statistics, and another about the time to breast cancer recurrence. 相似文献