Similar Documents (20 results)
1.
The development of models and methods for cure rate estimation has recently burgeoned into an important subfield of survival analysis. Much of the literature focuses on the standard mixture model. Recently, process-based models have been suggested. We focus on several models based on first passage times for Wiener processes. Whitmore and others have studied these models in a variety of contexts. Lee and Whitmore (Stat Sci 21(4):501–513, 2006) give a comprehensive review of a variety of first hitting time models and briefly discuss their potential as cure rate models. In this paper, we study the Wiener process with negative drift as a possible cure rate model, but the resulting defective inverse Gaussian model is found to provide a poor fit in some cases. Several modifications are then suggested which improve on the defective inverse Gaussian model. These modifications include: the inverse Gaussian cure rate mixture model; a mixture of two inverse Gaussian models; incorporation of heterogeneity in the drift parameter; and the addition of a second absorbing barrier to the Wiener process, representing an immunity threshold. This class of process-based models is a useful alternative to the standard model and provides an improved fit compared to the standard model when applied to many of the datasets that we have studied. Implementation of this class of models is facilitated using expectation-maximization (EM) algorithms and variants thereof, including the gradient EM algorithm. Parameter estimates for each of these EM algorithms are given, and the proposed models are applied to both real and simulated data, where they perform well.
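As a quick illustration of why a Wiener process yields a cure fraction: for a process with drift mu < 0 started at 0 and an absorbing barrier b > 0, the probability of ever reaching the barrier is exp(-2|mu|b/sigma^2), so the remaining fraction never experiences the event. The sketch below checks this by simulation; the names and parametrization are illustrative, not the paper's notation.

```python
import numpy as np

def cure_fraction(mu, sigma, b):
    """P(barrier b is never reached) for a Wiener process with drift mu < 0."""
    return 1.0 - np.exp(-2.0 * abs(mu) * b / sigma**2)

def simulated_hit_fraction(mu, sigma, b, n_paths=5000, t_max=60.0, dt=0.02, seed=1):
    """Euler-discretized paths started at 0; fraction that reach b before t_max."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    hit = np.zeros(n_paths, dtype=bool)
    for _ in range(int(t_max / dt)):
        x = x + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        hit |= x >= b  # latch: we only care whether the barrier was ever crossed
    return hit.mean()

if __name__ == "__main__":
    mu, sigma, b = -0.5, 1.0, 1.0
    print(cure_fraction(mu, sigma, b))            # 1 - e^{-1} ≈ 0.632
    print(1.0 - simulated_hit_fraction(mu, sigma, b))  # close; discretization misses some crossings
```

The discretization slightly underestimates the hitting probability (crossings between grid points are missed), which is why the simulated cure fraction sits a little above the closed form.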

2.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive closed form expressions for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

3.
The EM algorithm is a popular method for computing maximum likelihood estimates or posterior modes in models that can be formulated in terms of missing data or latent structure. Although easy implementation and stable convergence help to explain the popularity of the algorithm, its convergence is sometimes notoriously slow. In recent years, however, various adaptations have significantly improved the speed of EM while maintaining its stability and simplicity. One especially successful method for maximum likelihood is known as the parameter expanded EM or PXEM algorithm. Unfortunately, PXEM does not generally have a closed form M-step when computing posterior modes, even when the corresponding EM algorithm is in closed form. In this paper we confront this problem by adapting the one-step-late EM algorithm to PXEM to establish a fast closed form algorithm that improves on the one-step-late EM algorithm by ensuring monotone convergence. We use this algorithm to fit a probit regression model and a variety of dynamic linear models, showing computational savings of as much as 99.9%, with the biggest savings occurring when the EM algorithm is the slowest to converge.

4.
Progressive multi-state models provide a convenient framework for characterizing chronic disease processes where the states represent the degree of damage resulting from the disease. Incomplete data often arise in studies of such processes, and standard methods of analysis can lead to biased parameter estimates when observation of data is response-dependent. This paper describes a joint analysis useful for fitting progressive multi-state models to data arising in longitudinal studies in such settings. Likelihood based methods are described and parameters are shown to be identifiable. An EM algorithm is described for parameter estimation, and variance estimation is carried out using Louis' method. Simulation studies demonstrate that the proposed method works well in practice under a variety of settings. An application to data from a smoking prevention study illustrates the utility of the method.

5.
Parametric incomplete data models defined by ordinary differential equations (ODEs) are widely used in biostatistics to describe biological processes accurately. Their parameters are estimated on approximate models, whose regression functions are evaluated by a numerical integration method. Accurate and efficient estimation of these parameters is a critical issue. This paper proposes parameter estimation methods involving either a stochastic approximation EM algorithm (SAEM) in the maximum likelihood estimation, or a Gibbs sampler in the Bayesian approach. Both algorithms involve the simulation of non-observed data with conditional distributions using Hastings–Metropolis (H–M) algorithms. A modified H–M algorithm, including an original local linearization scheme to solve the ODEs, is proposed to reduce the computational time significantly. The convergence of all these algorithms on the approximate model is proved. The errors induced by the numerical solving method on the conditional distribution, the likelihood and the posterior distribution are bounded. The Bayesian and maximum likelihood estimation methods are illustrated on a simulated pharmacokinetic nonlinear mixed-effects model defined by an ODE. Simulation results illustrate the ability of these algorithms to provide accurate estimates.

6.
Incomplete growth curve data often result from missing or mistimed observations in a repeated measures design. Virtually all methods of analysis rely on the dispersion matrix estimates. A Monte Carlo simulation was used to compare three methods of estimation of dispersion matrices for incomplete growth curve data. The three methods were: 1) maximum likelihood estimation with a smoothing algorithm, which finds the closest positive semidefinite estimate of the pairwise estimated dispersion matrix; 2) a mixed effects model using the EM (expectation-maximization) algorithm; and 3) a mixed effects model with the scoring algorithm. The simulation included 5 dispersion structures, 20 or 40 subjects with 4 or 8 observations per subject, and 10 or 30% missing data. In all the simulations, the smoothing algorithm was the poorest estimator of the dispersion matrix. In most cases, there were no significant differences between the scoring and EM algorithms. The EM algorithm tended to be better than the scoring algorithm when the variances of the random effects were close to zero, especially for the simulations with 4 observations per subject and two random effects.

7.
Recently, Sarhan and Balakrishnan [2007. A new class of bivariate distribution and its mixture. Journal of Multivariate Analysis 98, 1508–1527] introduced a new bivariate distribution using generalized exponential and exponential distributions, and discussed several interesting properties of this new distribution. Unfortunately, they did not discuss any estimation procedure for the unknown parameters. In this paper, using an idea similar to that of Sarhan and Balakrishnan, we propose a singular bivariate distribution which has an extra shape parameter. It is observed that the marginal distributions of the proposed bivariate distribution are more flexible than the corresponding marginal distributions of the Marshall–Olkin bivariate exponential distribution, Sarhan–Balakrishnan's bivariate distribution or the bivariate generalized exponential distribution. Different properties of this new distribution are discussed. We provide the maximum likelihood estimators of the unknown parameters using the EM algorithm. We report some simulation results and perform two data analyses for illustrative purposes. Finally, we propose some generalizations of this bivariate model.

8.
The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as being incomplete. As each E-step visits every data point on a given iteration, the EM algorithm requires considerable computation time when applied to large data sets. Two versions, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed by Neal and Hinton (in Jordan, M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355–368) to reduce the computational cost of applying the EM algorithm. With the IEM algorithm, the available n observations are divided into B (B ≤ n) blocks and the E-step is implemented for only one block of observations at a time before the next M-step is performed. With the sparse version of the EM algorithm for the fitting of mixture models, only those posterior probabilities of component membership of the mixture that are above a specified threshold are updated; the remaining component-posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performance of the IEM algorithm with various numbers of blocks and the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks with the IEM algorithm. For the IEM algorithm in the extreme case of one observation per block, we provide efficient updating formulas which avoid the direct calculation of the inverses and determinants of the component-covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm and the partial E-step of the IEM algorithm. This SPIEM algorithm can further reduce the computation time of the IEM algorithm.
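The incremental E-step idea can be sketched for a two-component univariate Gaussian mixture (a toy implementation of the Neal-Hinton scheme, not the paper's code): each block's sufficient statistics are recomputed in turn and swapped into the running totals, and an M-step follows every block visit.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def iem_two_gaussians(x, n_blocks=5, n_sweeps=25):
    """Incremental EM for a 2-component univariate Gaussian mixture (illustrative)."""
    pi = 0.5
    mean = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    blocks = np.array_split(np.arange(x.size), n_blocks)
    stats = np.zeros((n_blocks, 2, 3))  # per block/component: [sum r, sum r*x, sum r*x^2]
    total = np.zeros((2, 3))
    for _ in range(n_sweeps):
        for b, idx in enumerate(blocks):
            xb = x[idx]
            w = np.vstack([pi * normal_pdf(xb, mean[0], var[0]),
                           (1 - pi) * normal_pdf(xb, mean[1], var[1])])
            r = w / w.sum(axis=0)        # partial E-step: this block only
            new = np.stack([np.array([r[k].sum(), (r[k] * xb).sum(), (r[k] * xb**2).sum()])
                            for k in (0, 1)])
            total += new - stats[b]      # swap the block's old statistics for new ones
            stats[b] = new
            n0, n1 = total[0, 0], total[1, 0]
            if min(n0, n1) > 1e-8:       # M-step from the current running totals
                pi = n0 / (n0 + n1)
                mean = np.array([total[0, 1] / n0, total[1, 1] / n1])
                var = np.maximum(np.array([total[0, 2] / n0 - mean[0]**2,
                                           total[1, 2] / n1 - mean[1]**2]), 1e-6)
    return pi, mean, var
```

For the swap-in/swap-out of block statistics to behave well, the data should be shuffled before the blocks are formed, so every block resembles the full sample.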

9.
Clustered binary data are common in medical research and can be fitted with a logistic regression model with random effects, which belongs to the wider class of generalized linear mixed models. Likelihood-based estimation of the model parameters often involves intractable integration, which has led to several estimation methods designed to overcome this difficulty. The penalized quasi-likelihood (PQL) method is very popular and computationally efficient in most cases. The expectation-maximization (EM) algorithm yields maximum likelihood estimates but requires computing a possibly intractable integral in the E-step. Variants of the EM algorithm for evaluating the E-step are introduced: the Monte Carlo EM (MCEM) method approximates the expectation using Monte Carlo samples, while the modified EM (MEM) method approximates the expectation using Laplace's method. All these methods involve several layers of approximation, so the corresponding parameter estimates contain inevitable errors (large or small) induced by the approximation. Understanding and quantifying this discrepancy theoretically is difficult due to the complexity of the approximations in each method, even when the focus is restricted to clustered binary data. As an alternative competing computational method, we also consider a non-parametric maximum likelihood (NPML) method. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via a simulation study, which will be useful for researchers when choosing an estimation method for their analysis.
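The Monte Carlo E-step can be seen in miniature on a much simpler problem than clustered binary data: estimating the mean of N(mu, 1) observations right-censored at a known point c. The closed-form conditional expectation is replaced by an average of rejection-sampled draws; everything below (the toy model and the names) is illustrative, not the paper's setting.

```python
import numpy as np

def truncated_normal_mean(mu, c, n_draws, rng):
    """MC estimate of E[X | X > c] for X ~ N(mu, 1), by rejection sampling."""
    accepted = np.empty(0)
    while accepted.size < n_draws:
        z = rng.normal(mu, 1.0, size=n_draws)
        accepted = np.concatenate([accepted, z[z > c]])
    return accepted[:n_draws].mean()

def mcem_censored_normal(y, censored, c, n_iter=40, n_draws=2000, seed=0):
    """MCEM for the mean of N(mu, 1) data right-censored at c (toy example).
    y holds the observed values, with censored entries recorded as c."""
    rng = np.random.default_rng(seed)
    mu = y.mean()
    n = y.size
    for _ in range(n_iter):
        # Monte Carlo E-step: expected value of each censored observation
        ex = truncated_normal_mean(mu, c, n_draws, rng)
        # M-step: sample mean of the completed data
        mu = (y[~censored].sum() + censored.sum() * ex) / n
    return mu
```

Increasing `n_draws` along the iterations (rather than keeping it fixed) is the usual way to tame the Monte Carlo noise near convergence.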

10.
Monitoring cross-sectionally and serially interdependent processes has become a new issue in statistical process control (SPC). In the up-to-date SPC literature, Kalman filtering has been reported as a way to monitor univariate autocorrelated processes. This paper applies a Kalman filter, or state-space, method for SPC to the monitoring of multivariate time series. We use Aoki's approach to estimate the parameter matrices of a state-space model. Multivariate Hotelling T² control charts are employed to monitor the residuals of the state-space model. Examples of this approach are illustrated.
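Once one-step-ahead residuals are in hand, charting them with Hotelling's T² reduces to a quadratic form in the inverse residual covariance. A generic sketch follows (plug-in sample mean and covariance; Aoki's state-space estimation itself is not shown):

```python
import numpy as np

def hotelling_t2(residuals):
    """T^2 statistic for each p-dimensional residual vector, using the
    plug-in sample mean and covariance of the residual series."""
    mu = residuals.mean(axis=0)
    cov = np.cov(residuals, rowvar=False)
    d = residuals - mu
    # quadratic form d_i' cov^{-1} d_i for every row i
    return np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.standard_normal((500, 2))  # stand-in for in-control residuals
    t2 = hotelling_t2(r)
    # points above an upper chi-square(p) quantile would be flagged out of control
    print((t2 > 9.21).mean())          # 9.21 ≈ 0.99 quantile of chi2 with 2 df
```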

11.
Longitudinal data analysis in epidemiological settings is complicated by large multiplicities of short time series and the occurrence of missing observations. To handle such difficulties Rosner & Muñoz (1988) developed a weighted non-linear least squares algorithm for estimating parameters for first-order autoregressive (AR1) processes with time-varying covariates. This method proved efficient when compared to complete case procedures. Here that work is extended by (1) introducing a different estimation procedure based on the EM algorithm, and (2) formulating estimation techniques for second-order autoregressive models. The second development is important because some of the intended areas of application (adult pulmonary function decline, childhood blood pressure) have autocorrelation functions which decay more slowly than the geometric rate imposed by an AR1 model. Simulation studies are used to compare the three methodologies (non-linear, EM based and complete case) with respect to bias, efficiency and coverage both in the presence and in the absence of time-varying covariates. Differing degrees and mechanisms of missingness are examined. Preliminary results indicate the non-linear approach to be the method of choice: it has high efficiency and is easily implemented. An illustrative example concerning pulmonary function decline in the Netherlands is analyzed using this method.

12.
The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make first and second moment assumptions only; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure of the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.

13.
We present a model for data in the form of matched pairs of counts. Our work is motivated by a problem in fission-track analysis, where the determination of a crystal's age is based on the ratio of counts of spontaneous and induced tracks. It is often reasonable to assume that the counts follow a Poisson distribution, but typically they are overdispersed and there exists a positive correlation between the numbers of spontaneous and induced tracks in the same crystal. We propose a model that allows for both overdispersion and correlation by assuming that the mean densities follow a bivariate Wishart distribution. Our model is quite general, having the usual negative-binomial and Poisson models as special cases. We propose a maximum-likelihood estimation method based on a stochastic implementation of the EM algorithm, and we derive the asymptotic standard errors of the parameter estimates. We illustrate the method with a data set of fission-track counts in matched areas of zircon crystals.

14.
The EM algorithm and its extensions are very popular tools for maximum likelihood estimation in incomplete data settings. However, one of the limitations of these methods is their slow convergence. The PX-EM (parameter-expanded EM) algorithm was proposed by Liu, Rubin and Wu to make EM much faster. On the other hand, stochastic versions of EM are powerful alternatives to EM when the E-step is intractable in closed form. In this paper we propose PX-SAEM, a parameter-expanded version of the so-called SAEM (stochastic approximation version of EM). PX-SAEM is shown to accelerate SAEM and improve convergence toward the maximum likelihood estimate in a parametric framework. Numerical examples illustrate the behavior of PX-SAEM in linear and nonlinear mixed effects models.
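The stochastic-approximation idea itself fits in a few lines. Below is a toy SAEM for the means of a two-component N(mu_k, 1) mixture with known equal weights (an illustration of the simulation/SA/M steps, not the paper's PX-SAEM): labels are simulated from their conditional distribution, and the complete-data sufficient statistics are smoothed with a Robbins-Monro step size.

```python
import numpy as np

def saem_mixture_means(x, n_iter=150, burn_in=20, seed=0):
    """Toy SAEM: means of an equal-weight 2-component N(mu_k, 1) mixture."""
    rng = np.random.default_rng(seed)
    mu = np.array([x.min(), x.max()], dtype=float)
    s = np.zeros((2, 2))  # per component: [sum of label indicators, sum of x]
    for k in range(1, n_iter + 1):
        # Simulation step: draw labels from their conditional distribution;
        # for equal weights and unit variances this posterior is a sigmoid.
        p1 = 1.0 / (1.0 + np.exp(-(mu[1] - mu[0]) * (x - (mu[0] + mu[1]) / 2)))
        z = rng.random(x.size) < p1
        stat = np.array([[(~z).sum(), x[~z].sum()],
                         [z.sum(), x[z].sum()]], dtype=float)
        # SA step: smooth the complete-data sufficient statistics
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s = s + gamma * (stat - s)
        # M-step on the smoothed statistics
        mu = s[:, 1] / np.maximum(s[:, 0], 1e-9)
    return np.sort(mu)
```

During the burn-in the step size is 1 (plain stochastic EM); the decreasing schedule afterwards is what averages out the simulation noise.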

15.
Estimators derived from the expectation-maximization (EM) algorithm are not robust since they are based on the maximization of the likelihood function. We propose an iterative proximal-point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. In each iteration, the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and misspecification of the model. Convergence properties of our algorithm are studied. The convergence of the introduced algorithm is discussed on a two-component Weibull mixture, entailing a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences are provided to confirm the validity of our work and the robustness of the resulting estimators against outliers in comparison to the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

16.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models assume a normal distribution for the regression error; such an assumption is unsuitable for data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

17.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating the variance parameters associated with this type of model. Typically, an iterative algorithm is required for the estimation of the variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter-expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton-Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example, a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

18.
We propose a new three-parameter ageing distribution called the Weibull-Poisson (WP) distribution, which generalizes the exponential-Poisson (EP) distribution introduced by Kus (2007). This new distribution has a more general form of failure rate (hazard rate) function. With appropriate choices of parameter values, it is able to model three ageing classes of life distributions: decreasing failure rate (DFR), increasing failure rate (IFR), and modified upside-down-bathtub (MUBT)-shaped failure rate. It thus provides an alternative to many existing life distributions. Various properties of this distribution are discussed, and the estimation of the parameters is considered via the expectation-maximization (EM) algorithm. Also, the asymptotic variance-covariance matrices of these estimates are obtained. Furthermore, some expressions for the Rényi and Shannon entropies are given. Simulation studies are performed and experimental results are illustrated based on a real data set.
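Compound distributions of this kind can be sampled directly from their construction: draw N from a zero-truncated Poisson and take the minimum of N Weibull lifetimes. The sketch below is a generic minimum-of-Weibulls sampler under that construction; the parameter names are mine, and the paper's exact parametrization of the WP density should be consulted for anything beyond simulation.

```python
import numpy as np

def r_zt_poisson(lam, size, rng):
    """Zero-truncated Poisson draws by rejection (fine for moderate lam > 0)."""
    out = np.zeros(size, dtype=int)
    todo = np.arange(size)
    while todo.size:
        n = rng.poisson(lam, todo.size)
        ok = n > 0
        out[todo[ok]] = n[ok]
        todo = todo[~ok]  # redraw only the zeros
    return out

def r_weibull_poisson(shape, scale, lam, size, rng):
    """T = min(X_1, ..., X_N) with X_i ~ Weibull(shape, scale), N ~ ZT-Poisson(lam)."""
    counts = r_zt_poisson(lam, size, rng)
    return np.array([scale * rng.weibull(shape, k).min() for k in counts])
```

Larger `lam` means more competing Weibull lifetimes in each minimum, so lifetimes shorten stochastically as `lam` grows.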

19.
In most applications, the parameters of a mixture of linear regression models are estimated by maximum likelihood using the expectation maximization (EM) algorithm. In this article, we compare three algorithms for computing maximum likelihood estimates of the parameters of these models: the EM algorithm, the classification EM algorithm and the stochastic EM algorithm. The comparison of the three procedures was done through a simulation study of the performance (computational effort, statistical properties of estimators and goodness of fit) of these approaches on simulated data sets.

Simulation results show that the choice of the approach depends essentially on the configuration of the true regression lines and the initialization of the algorithms.
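The three algorithms differ only in how the E-step output feeds the M-step: EM keeps the full responsibility matrix, classification EM hard-assigns each point to its most probable component, and stochastic EM draws a label from the responsibilities. A schematic of just that step (illustrative and framework-free, not tied to any of the compared implementations):

```python
import numpy as np

def e_step_weights(resp, rule, rng=None):
    """Turn an (n, K) responsibility matrix into M-step weights.
    rule: 'EM' (soft), 'CEM' (hard argmax) or 'SEM' (sampled labels)."""
    if rule == "EM":
        return resp
    n, k = resp.shape
    out = np.zeros_like(resp)
    if rule == "CEM":
        out[np.arange(n), resp.argmax(axis=1)] = 1.0   # classification step
    elif rule == "SEM":
        labels = np.array([rng.choice(k, p=row) for row in resp])
        out[np.arange(n), labels] = 1.0                # stochastic step
    else:
        raise ValueError(f"unknown rule: {rule}")
    return out
```

With these weights in hand, the M-step is identical across the three variants, which is why the comparison reduces to the behaviour of this single step.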

20.
Linear random effects models for longitudinal data, discussed by Laird and Ware (1982), Jennrich and Schluchter (1986), Lange and Laird (1989), and others, are extended in a straightforward manner to nonlinear random effects models. This results in a simple computational approach which accommodates patterned covariance matrices and data insufficient for fitting each subject separately. The technique is demonstrated on an interesting medical data set, and a short, simple SAS PROC IML program based on the EM algorithm is presented.
