Similar Documents
20 similar documents found.
1.
We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, with the missing values on the manifest variables included in the complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm, based on the EM algorithm, that computes the FIML estimates efficiently. A significant improvement in computational speed is realized by not treating the missing values on the manifest variables as part of the complete data. When many data values are missing, it is not clear whether the FIML procedure can achieve good estimation accuracy. To investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.
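The core idea in abstract 1, running EM with missing values filled in by their conditional moments, can be illustrated on a much simpler model than factor analysis. The sketch below is not the authors' algorithm; it is a generic missing-data EM, with all names and constants chosen for the example. It estimates the mean and covariance of a bivariate normal when many values of the second coordinate are missing at random: the E-step fills in conditional first and second moments of the missing entries, and the M-step recomputes the sample moments of the completed data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
true_mu = np.array([1.0, 2.0])
true_S = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_S, n)
miss = rng.random(n) < 0.6           # 60% of the second coordinate missing at random
x1 = X[:, 0]
x2 = np.where(miss, np.nan, X[:, 1])

mu, S = np.zeros(2), np.eye(2)       # crude starting values
for _ in range(100):
    # E-step: conditional moments of the missing x2 given the observed x1
    beta = S[0, 1] / S[0, 0]
    cvar = S[1, 1] - S[0, 1] ** 2 / S[0, 0]
    e2 = np.where(miss, mu[1] + beta * (x1 - mu[0]), x2)
    e22 = np.where(miss, e2 ** 2 + cvar, x2 ** 2)
    # M-step: sample moments of the completed data
    mu = np.array([x1.mean(), e2.mean()])
    c11 = np.mean(x1 ** 2) - mu[0] ** 2
    c12 = np.mean(x1 * e2) - mu[0] * mu[1]
    c22 = np.mean(e22) - mu[1] ** 2
    S = np.array([[c11, c12], [c12, c22]])
```

Note that the E-step imputes second moments (`e2 ** 2 + cvar`), not just means; plugging in conditional means alone would bias the variance estimate downward.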

2.
The family of power series cure rate models provides a flexible modeling framework for survival data from populations with a cure fraction. In this work, we present a simplified estimation procedure for the maximum likelihood (ML) approach. ML estimates are obtained via the expectation–maximization (EM) algorithm, where the expectation step involves computing the expected number of concurrent causes for each individual. A key advantage is that the maximization step decomposes into separate maximizations of two lower-dimensional functions of the regression and survival distribution parameters, respectively. Two simulation studies are performed: the first to investigate the accuracy of the estimation procedure for different numbers of covariates, and the second to compare our proposal with direct maximization of the observed log-likelihood function. Finally, we illustrate parameter estimation on a dataset of survival times for patients with malignant melanoma.

3.
In the expectation–maximization (EM) algorithm for maximum likelihood estimation from incomplete data, Markov chain Monte Carlo (MCMC) methods have long been used in change-point inference when the expectation step is intractable. However, conventional MCMC algorithms tend to get trapped in local modes when simulating from the posterior distribution of the change points. To overcome this problem, we propose a stochastic approximation Monte Carlo version of EM (SAMCEM), which combines adaptive Markov chain Monte Carlo with EM-based maximum likelihood estimation. SAMCEM is compared with the stochastic approximation version of EM and the reversible jump MCMC version of EM on simulated and real datasets. The numerical results indicate that SAMCEM outperforms the other two methods, producing much more accurate parameter estimates and recovering change-point positions and parameter estimates simultaneously.

4.
Estimators derived from the expectation‐maximization (EM) algorithm are not robust since they are based on maximization of the likelihood function. We propose an iterative proximal‐point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. In each iteration, the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and model misspecification. Convergence properties of our algorithm are studied. Convergence of the proposed algorithm is discussed for a two‐component Weibull mixture, entailing a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences confirm the validity of our work and the robustness of the resulting estimators against outliers in comparison with the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

5.
In most applications, the parameters of a mixture of linear regression models are estimated by maximum likelihood using the expectation–maximization (EM) algorithm. In this article, we compare three algorithms for computing maximum likelihood estimates of the parameters of these models: the EM algorithm, the classification EM algorithm and the stochastic EM algorithm. The three procedures were compared through a simulation study of their performance (computational effort, statistical properties of the estimators and goodness of fit) on simulated data sets.

Simulation results show that the choice of the approach depends essentially on the configuration of the true regression lines and the initialization of the algorithms.
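As a minimal illustration of the plain EM variant compared in abstract 5, the sketch below fits a two-component mixture of simple linear regressions by alternating weighted least squares (M-step) with posterior membership probabilities (E-step). The simulated data, the median-split initialization, and the function name are assumptions made for the example, not taken from the article.

```python
import numpy as np

def em_mix_linreg(x, y, z0, n_iter=100):
    """EM for a two-component mixture of simple linear regressions:
    y | component k ~ N(a_k + b_k * x, s2_k), with mixing weights pi_k."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    R = np.column_stack([1.0 - z0, z0]).astype(float)  # initial responsibilities
    a, b, s2, pi = np.zeros(2), np.zeros(2), np.ones(2), np.full(2, 0.5)
    for _ in range(n_iter):
        # M-step: weighted least squares and weighted variance per component
        for k in range(2):
            w = R[:, k]
            coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            a[k], b[k] = coef
            s2[k] = (w * (y - X @ coef) ** 2).sum() / w.sum()
            pi[k] = w.mean()
        # E-step: posterior membership probabilities
        # (the 1/sqrt(2*pi) constant cancels in the normalization)
        dens = np.empty((n, 2))
        for k in range(2):
            r = y - (a[k] + b[k] * x)
            dens[:, k] = pi[k] * np.exp(-0.5 * r ** 2 / s2[k]) / np.sqrt(s2[k])
        R = dens / dens.sum(axis=1, keepdims=True)
    return a, b, pi

# two vertically separated regression lines with equal mixing weights
rng = np.random.default_rng(0)
n = 600
comp = rng.random(n) < 0.5
x = rng.uniform(0, 5, n)
y = np.where(comp, 10.0 + 0.5 * x, 2.0 * x) + rng.normal(0, 0.4, n)
a, b, pi = em_mix_linreg(x, y, z0=(y > np.median(y)))
```

The classification EM and stochastic EM variants differ only in the E-step: the former hard-assigns each point to its most probable component, the latter samples a component from the membership probabilities.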

6.
Mixture distribution models are more useful than single distributions for modeling heterogeneous data sets. The aim of this paper is to propose, for the first time, a mixture of Weibull–Poisson (WP) distributions for modeling heterogeneous data sets, creating a powerful alternative mixture distribution for such modeling. Many features of the proposed mixture of WP distributions are examined. The expectation–maximization (EM) algorithm is used to obtain the maximum-likelihood estimates of the parameters, and a simulation study is conducted to evaluate the performance of the proposed EM scheme. Applications to two real heterogeneous data sets demonstrate the flexibility and potential of the new mixture distribution.

7.
We generalize the Gaussian mixture transition distribution (GMTD) model introduced by Le and co-workers to the mixture autoregressive (MAR) model for the modelling of non-linear time series. The models consist of a mixture of K stationary or non-stationary AR components. The advantages of the MAR model over the GMTD model include a fuller range of shape-changing predictive distributions and the ability to handle cycles and conditional heteroscedasticity in the time series. The stationarity conditions and autocorrelation function are derived. Estimation is easily done via a simple EM algorithm, and the model selection problem is addressed. The shape-changing feature of the conditional distributions makes these models capable of modelling time series with multimodal conditional distributions and with heteroscedasticity. The models are applied to two real data sets and compared with other competing models. The MAR models appear to capture features of the data better than the competing models do.

8.
In this article, we consider a competing cause scenario and assume the wider family of Conway–Maxwell–Poisson (COM–Poisson) distributions to model the number of competing causes. Assuming interval-censored data, the main contribution is in developing the steps of the expectation–maximization (EM) algorithm to determine the maximum likelihood estimates (MLEs) of the model parameters. A profile likelihood approach within the EM framework is proposed to estimate the COM–Poisson shape parameter. An extensive simulation study is conducted to evaluate the performance of the proposed EM algorithm. Model selection within the wider class of COM–Poisson distributions is carried out using the likelihood ratio test and information-based criteria. A study to demonstrate the effect of model mis-specification is also carried out. Finally, the proposed estimation method is applied to data on smoking cessation, and a detailed analysis of the obtained results is presented.

9.
A novel application of the expectation maximization (EM) algorithm is proposed for modeling right-censored multiple regression. Parameter estimates, variability assessment, and model selection are summarized in a multiple regression setting under a normal model. The performance of this method is assessed through a simulation study. New formulas for measuring model utility and diagnostics are derived based on the EM algorithm. They include a reconstructed coefficient of determination and influence diagnostics based on a one-step deletion method. A real data set, provided by the North Dakota Department of Veterans Affairs, is modeled using the proposed methodology. The empirical findings should be of benefit to government policy-makers.

10.
Clustered binary data are common in medical research and can be fitted by a logistic regression model with random effects, which belongs to the wider class of generalized linear mixed models. Likelihood-based estimation of the model parameters often involves intractable integration, which has motivated several estimation methods for overcoming this difficulty. The penalized quasi-likelihood (PQL) method is very popular and computationally efficient in most cases. The expectation–maximization (EM) algorithm yields maximum-likelihood estimates but requires computing a possibly intractable integral in the E-step. Variants of the EM algorithm that approximate the E-step have therefore been introduced: the Monte Carlo EM (MCEM) method approximates the expectation using Monte Carlo samples, while the modified EM (MEM) method approximates it using Laplace's method. All of these methods involve several approximation steps, so the corresponding parameter estimates contain errors (large or small) induced by the approximations. Quantifying this discrepancy theoretically is difficult because of the complexity of the approximations in each method, even when the focus is restricted to clustered binary data. As an alternative, we also consider a non-parametric maximum-likelihood (NPML) method. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via a simulation study, which should be useful for researchers choosing an estimation method for their analysis.
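The MCEM idea described above, replacing an intractable E-step expectation by a Monte Carlo average, can be shown on a deliberately simple case: EM for an exponential rate with right-censored observations, where the conditional expectation of a censored lifetime is approximated by averaging draws from the memoryless tail. This is a generic sketch, not the clustered-binary-data setting of the abstract, and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_rate, c = 2000, 0.5, 3.0
t = rng.exponential(1 / true_rate, n)
obs = np.minimum(t, c)               # observations right-censored at time c
censored = t > c

lam = 1.0                            # initial guess for the rate
for _ in range(50):
    # Monte Carlo E-step: approximate E[t | t > c] under the current rate
    # by averaging m draws from the memoryless tail c + Exp(lam)
    m = 500
    tail = c + rng.exponential(1 / lam, size=(m, censored.sum()))
    e_t = obs.copy()
    e_t[censored] = tail.mean(axis=0)
    # M-step: complete-data MLE of an exponential rate
    lam = n / e_t.sum()
```

In this toy case the E-step expectation is available in closed form (c + 1/lam), so the Monte Carlo approximation can be checked directly; in the generalized linear mixed model setting of the abstract, no such closed form exists, which is what makes MCEM attractive there.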

11.
In this paper, we propose a hidden Markov model for the analysis of time series of bivariate circular observations, by assuming that the data are sampled from bivariate circular densities whose parameters are driven by the evolution of a latent Markov chain. The model segments the data by accounting for redundancies due to correlations along time and across variables. A computationally feasible expectation maximization (EM) algorithm is provided for the maximum likelihood estimation of the model from incomplete data, by treating the missing values and the states of the latent chain as two different sources of incomplete information. Importance-sampling methods facilitate the computation of bootstrap standard errors of the estimates. The methodology is illustrated on a bivariate time series of wind and wave directions and compared with popular segmentation models for bivariate circular data, which ignore correlations across variables and/or along time.

12.
Based on progressively type-II censored data, the maximum-likelihood estimators (MLEs) for the Lomax parameters are derived using the expectation–maximization (EM) algorithm. Moreover, the expected Fisher information matrix based on the missing value principle is computed. Using extensive simulation and three criteria, namely bias, root mean squared error and Pitman closeness measures, we compare the performance of the MLEs via the EM algorithm and the Newton–Raphson (NR) method. It is concluded that the EM algorithm outperforms the NR method in all cases. Two real data examples are used to illustrate the proposed estimators.

13.
This paper addresses the problem of identifying groups that satisfy specific conditions on the means of feature variables. In this study, we refer to the identified groups as “target clusters” (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by the maximum-likelihood method. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters, which would be difficult to achieve with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising for identifying such clusters.

14.
In this article, we propose mixtures of skew Laplace normal (SLN) distributions to model both skewness and heavy-tailedness in heterogeneous data sets, as an alternative to mixtures of skew Student-t-normal (STN) distributions. We give the expectation–maximization (EM) algorithm to obtain the maximum likelihood (ML) estimators for the parameters of interest. We also analyze the mixture regression model based on the SLN distribution and provide the ML estimators of the parameters using the EM algorithm. The performance of the proposed mixture model is illustrated by a simulation study and two real data examples.

15.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm, which maximizes the observed-data log-likelihood function. This paper proposes an alternative approach for fitting finite mixture models. Our method, called iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability that a data point belongs to a particular mixture component given its observed value; it then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method. Finally, it updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with other algorithms for fitting mixtures of normal and mixtures of t densities.
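A minimal sketch of the classification idea behind IMCC, simplified here to a univariate two-component normal mixture. This is the generic stochastic-classification scheme, not the authors' exact procedure: compute membership probabilities, draw a hard label for each point from those probabilities, then refit each component from its classified data.

```python
import numpy as np

rng = np.random.default_rng(2)
# sample from an equal-weight mixture of N(0, 1) and N(5, 1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])

mu = np.array([x.min(), x.max()])    # crude initial means
sd = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(100):
    # 1) membership probabilities under the current parameters
    #    (the common 1/sqrt(2*pi) factor cancels in the normalization)
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / sd
    p = dens / dens.sum(axis=1, keepdims=True)
    # 2) Monte Carlo classification: draw a hard label for each point
    z = (rng.random(len(x)) < p[:, 1]).astype(int)
    # 3) refit each component from its classified data
    for k in (0, 1):
        xk = x[z == k]
        mu[k] = xk.mean()
        sd[k] = xk.std() + 1e-9
        pi[k] = len(xk) / len(x)
```

Because the labels are sampled rather than averaged over, the parameter sequence fluctuates around the solution instead of converging deterministically; a common refinement is to average the iterates over the final iterations.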

16.
Many assumptions, including assumptions regarding treatment effects, are made at the design stage of a clinical trial for power and sample size calculations. It is desirable to check these assumptions during the trial by using blinded data. Methods for sample size re‐estimation based on blinded data analyses have been proposed for normal and binary endpoints. However, it has been debated whether a reliable estimate of the treatment effect can be obtained in a typical clinical trial situation. In this paper, we consider the case of a survival endpoint and investigate the feasibility of estimating the treatment effect in an ongoing trial without unblinding. We incorporate information from a surrogate endpoint and investigate three estimation procedures, including a classification method and two expectation–maximization (EM) algorithms. Simulations and a clinical trial example are used to assess the performance of the procedures. Our studies show that the EM algorithms depend strongly on the initial estimates of the model parameters. Despite utilization of a surrogate endpoint, all three methods have large variations in the treatment effect estimates and hence fail to provide a precise conclusion about the treatment effect. Copyright © 2012 John Wiley & Sons, Ltd.

17.
We propose data generating structures which can be represented as a mixture of autoregressive-autoregressive conditionally heteroscedastic models. The switching between the states is governed by a hidden Markov chain. We investigate semi-parametric estimators for estimating the functions based on the quasi-maximum likelihood approach and provide sufficient conditions for geometric ergodicity of the process. We also present an expectation–maximization algorithm for calculating the estimates numerically.

18.
We present a maximum likelihood estimation procedure for the multivariate frailty model. The estimation is based on a Monte Carlo EM algorithm. The expectation step is approximated by averaging over random samples drawn from the posterior distribution of the frailties using rejection sampling. The maximization step reduces to a standard partial likelihood maximization. We also propose a simple rule, based on the relative change in the parameter estimates, to decide the sample size in each iteration and a stopping time for the algorithm. An important new concept is achieving absolute convergence of the algorithm through sample size determination and an efficient sampling technique. The method is illustrated using a rat carcinogenesis dataset and data on vase lifetimes of cut roses. The estimation results are compared with approximate inference based on penalized partial likelihood using these two examples. Unlike penalized partial likelihood estimation, the proposed full maximum likelihood estimation method accounts for all the uncertainty when estimating standard errors for the parameters.

19.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The developed approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering with model selection criteria applied afterward, and (ii) it does not require accurate initialization, unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, confirming its benefit for practical applications.

20.
The mixture transition distribution (MTD) model was introduced by Raftery to address the need for parsimony in the modeling of high-order Markov chains in discrete time. The particularity of this model comes from the fact that the effect of each lag upon the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, the efficiency of the MTD parameter estimation methods proposed to date remains problematic on account of the large number of constraints on the parameters. In this article, an iterative procedure, the expectation–maximization (EM) algorithm, is developed in conjunction with the principle of maximum likelihood estimation (MLE) to estimate the MTD parameters. Applications of MTD modeling show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, EM estimates of the parameters of high-order MTD models fitted to DNA sequences outperform the corresponding fully parametrized Markov chain in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号