Similar Literature
20 similar records found (search time: 0 ms)
1.
The mixture transition distribution (MTD) model was introduced by Raftery to address the need for parsimony in modeling high-order Markov chains in discrete time. The particularity of this model is that the effect of each lag upon the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, the efficiency of the MTD parameter estimation methods proposed to date remains problematic on account of the large number of constraints on the parameters. In this article, an iterative procedure, the expectation–maximization (EM) algorithm, is developed for maximum likelihood estimation (MLE) of the MTD parameters. Applications of MTD modeling show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, high-order MTD models estimated by EM on DNA sequences outperform the corresponding fully parametrized Markov chains in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.
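The lag decomposition that gives the MTD model its parsimony also makes the EM update easy to sketch. The following toy implementation is ours, not the seq++ code: it assumes the simplest MTD variant, with a single transition matrix Q shared across all lags, and treats the lag "responsible" for each transition as the latent variable; all names are illustrative.

```python
import numpy as np

def mtd_em(seq, n_states, order, n_iter=50, seed=0):
    """EM for a toy MTD model with a single shared transition matrix:
    P(x_t | past) = sum_g lam[g] * Q[x_{t-(g+1)}, x_t].
    The latent variable is which lag 'explains' each transition."""
    rng = np.random.default_rng(seed)
    lam = np.full(order, 1.0 / order)               # lag weights
    Q = rng.dirichlet(np.ones(n_states), size=n_states)
    T = len(seq)
    for _ in range(n_iter):
        # E-step: responsibility of each lag for each position
        R = np.zeros((T - order, order))
        for i, t in enumerate(range(order, T)):
            p = np.array([lam[g] * Q[seq[t - g - 1], seq[t]]
                          for g in range(order)])
            R[i] = p / p.sum()
        # M-step: reweight lags and recount transitions
        lam = R.mean(axis=0)
        C = np.full((n_states, n_states), 1e-12)    # avoid empty rows
        for i, t in enumerate(range(order, T)):
            for g in range(order):
                C[seq[t - g - 1], seq[t]] += R[i, g]
        Q = C / C.sum(axis=1, keepdims=True)
    return lam, Q
```

Each E-step apportions every transition among the lags in proportion to lam[g] * Q[x_{t-g}, x_t]; the M-step renormalizes those responsibilities into new lag weights and transition counts, so the constraints (lam and each row of Q summing to one) are satisfied automatically.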

2.
Abstract

In this article, we revisit the problem of fitting a mixture model under the assumption that the mixture components are symmetric and log-concave. To this end, we first study the nonparametric maximum likelihood estimation (MLE) of a monotone log-concave probability density. To fit the mixture model, we propose a semiparametric EM (SEM) algorithm, which can be adapted to other semiparametric mixture models. In our numerical experiments, we compare our algorithm to that of Balabdaoui and Doss [Inference for a two-component mixture of symmetric distributions under log-concavity. Bernoulli 24 (2):1053–71, 2018] and to other mixture models on both simulated and real-world datasets.

3.
This paper introduces a new approach, based on dependent univariate GLMs, for fitting multivariate mixture models. This approach is a multivariate generalization of the method for univariate mixtures presented by Hinde (1982). Its accuracy and efficiency are compared with direct maximization of the log-likelihood. Using a simulation study, we also compare the efficiency of Monte Carlo and Gaussian quadrature methods for approximating the mixture distribution. The new approach with Gaussian quadrature outperforms the alternative methods considered. The work is motivated by the multivariate mixture models which have been proposed for modelling changes of employment states at an individual level. Similar formulations are of interest for modelling movement between other social and economic states and multivariate mixture models also occur in biostatistics and epidemiology.

4.
Karlis and Santourian [Model-based clustering with non-elliptically contoured distributions, Stat. Comput. 19 (2009), pp. 73–83] proposed a model-based clustering algorithm, based on the expectation–maximization (EM) algorithm, to fit mixtures of multivariate normal-inverse Gaussian (NIG) distributions. However, the EM algorithm for the multivariate NIG mixture requires a set of initial values to begin the iterative process, and the number of components has to be given a priori. In this paper, we present a learning-based EM algorithm that aims to overcome these weaknesses of Karlis and Santourian's algorithm. The proposed learning-based EM algorithm was inspired by Yang et al. [A robust EM clustering algorithm for Gaussian mixture models, Pattern Recognit. 45 (2012), pp. 3950–3961], whose self-clustering process it emulates. Numerical experiments showed promising results compared to Karlis and Santourian's EM algorithm. Moreover, the methodology is applicable to the analysis of extrasolar planets. Our analysis provides an understanding of the clustering results in the ln P–ln M and ln P–e spaces, where M is the planetary mass, P is the orbital period and e is the orbital eccentricity. The identified groups reflect two phenomena: (1) the characteristics of the two clusters in ln P–ln M space might be related to tidal and disc interactions (see Jiang, Ip and Yeh, On the fate of close-in extrasolar planets, Astrophys. J. 582 (2003), pp. 449–454); and (2) there are two clusters in ln P–e space.

5.
The label-switching problem is one of the fundamental problems in Bayesian mixture analysis. Using each Markov chain Monte Carlo sample as an initial value for the expectation-maximization (EM) algorithm, we propose to label the samples according to the modes they converge to, on the assumption that samples converging to the same mode share the same labels. If a relatively noninformative prior is used or the sample size is large, the posterior will be close to the likelihood, so the posterior modes can be located approximately by running the EM algorithm on the mixture likelihood, without assuming that the posterior is available in closed form. To speed up this labeling method, we also propose to first cluster the samples by K-means with a large number of clusters K; assuming that the samples within each cluster share the same labels, we then only need to find one converged mode per cluster. Using a Monte Carlo simulation study and a real dataset, we demonstrate the success of our new method in dealing with the label-switching problem.
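The core relabeling step, giving two posterior draws the same labels when they match the same mode, can be illustrated with a brute-force permutation search. This is a hypothetical stand-in for the full procedure, acting only on component means and assuming the reference mode has already been located (e.g. by EM):

```python
import numpy as np
from itertools import permutations

def relabel_to_reference(mu_samples, mu_ref):
    """Relabel each draw of component means with the permutation that
    best aligns it (in squared distance) with a reference mode."""
    perms = [list(p) for p in permutations(range(len(mu_ref)))]
    relabeled = []
    for mu in mu_samples:
        best = min(perms, key=lambda p: np.sum((mu[p] - mu_ref) ** 2))
        relabeled.append(mu[best])
    return np.array(relabeled)
```

Enumerating all K! permutations is only sensible for small K; note that the K-means pre-clustering described in the abstract reduces the number of EM runs needed to locate modes, not the cost of this matching step.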

6.
It is well known that the log-likelihood function for samples coming from normal mixture distributions may present spurious maxima and singularities. For this reason, we reformulate some of Hathaway's results and propose two constrained estimation procedures for multivariate normal mixture modelling under the likelihood approach. Their performances are illustrated through numerical simulations based on the EM algorithm. A comparison between multivariate normal mixtures and the hot-deck approach to missing-data imputation is also considered. S. Ingrassia carried out this research as part of the project Metodi Statistici e Reti Neuronali per l'Analisi di Dati Complessi (PRIN 2000, resp. G. Lunetta).

7.
Mixture model-based clustering is widely used in many applications. In certain real-time applications the rapid increase of data size with time makes classical clustering algorithms too slow. An online clustering algorithm based on mixture models is presented in the context of a real-time flaw-diagnosis application for pressurized containers which uses data from acoustic emission signals. The proposed algorithm is a stochastic gradient algorithm derived from the classification version of the EM algorithm (CEM). It provides a model-based generalization of the well-known online k-means algorithm, able to handle non-spherical clusters. Using synthetic and real data sets, the proposed algorithm is compared with the batch CEM algorithm and the online EM algorithm. The three approaches generate comparable solutions in terms of the resulting partition when clusters are relatively well separated, but online algorithms become faster as the size of the available observations increases.
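A single step of such an online classification EM scheme can be sketched as follows. This is a schematic of the general idea, hard assignment followed by a stochastic-gradient-style update with step size 1/n_k, under our own simplifying choices (full per-cluster covariances, count-based step size); it is not the flaw-diagnosis implementation:

```python
import numpy as np

def online_cem_step(x, means, covs, weights, counts):
    """One online CEM update: hard-assign x to its most probable
    component, then move that component toward x with step size
    1/n_k (a stochastic-gradient-style update)."""
    logp = []
    for k in range(len(means)):
        diff = x - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        logp.append(np.log(weights[k])
                    - 0.5 * (logdet + diff @ np.linalg.solve(covs[k], diff)))
    k = int(np.argmax(logp))          # classification (C) step
    counts[k] += 1
    eta = 1.0 / counts[k]             # decreasing step size
    diff = x - means[k]
    means[k] = means[k] + eta * diff
    covs[k] = (1 - eta) * covs[k] + eta * np.outer(diff, diff)
    weights[:] = counts / counts.sum()
    return k
```

Because each component carries its own covariance, elongated (non-spherical) clusters are handled, which is exactly what distinguishes this scheme from plain online k-means.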

8.
The lasso is a popular technique for simultaneous estimation and variable selection in many research areas. When the regression coefficients have independent Laplace priors, the marginal posterior mode of the regression coefficients is equivalent to the estimates given by the non-Bayesian lasso. Because of the flexibility of its statistical inferences, the Bayesian approach has attracted a growing body of research in recent years. Current approaches either perform a fully Bayesian analysis using a Markov chain Monte Carlo (MCMC) algorithm or use Monte Carlo expectation maximization (MCEM) methods with an MCMC algorithm in each E-step. However, MCMC-based Bayesian methods carry a heavy computational burden and converge slowly. Tan et al. [An efficient MCEM algorithm for fitting generalized linear mixed models for correlated binary data. J Stat Comput Simul. 2007;77:929–943] proposed a non-iterative sampling approach, the inverse Bayes formula (IBF) sampler, for computing posteriors of a hierarchical model within the MCEM structure. Motivated by their paper, we develop this IBF sampler within the MCEM structure to obtain the marginal posterior mode of the regression coefficients for the Bayesian lasso, by adjusting the importance sampling weights when the full conditional distribution is not explicit. Simulation experiments show that our EM-based method greatly reduces the computational time, and that it behaves comparably with other Bayesian lasso methods in both prediction accuracy and variable-selection accuracy, and even better when the sample size is relatively large.

9.
In this paper, we propose an adaptive algorithm that iteratively updates both the weights and the component parameters of a mixture importance sampling density so as to optimise the performance of importance sampling, as measured by an entropy criterion. The method, called M-PMC, is shown to be applicable to a wide class of importance sampling densities, which includes in particular mixtures of multivariate Student t distributions. The performance of the proposed scheme is studied on both artificial and real examples, highlighting in particular the benefit of a novel Rao-Blackwellisation device which can be easily incorporated in the updating scheme. This work was supported by the Agence Nationale de la Recherche (ANR) through a 2006–2008 project. The last two authors are grateful to the participants of the BIRS meeting on "Bioinformatics, Genetics and Stochastic Computation: Bridging the Gap", Banff, for their comments on an earlier version of this paper. The last author also acknowledges a helpful discussion with Geoff McLachlan. The authors wish to thank both referees for their encouraging comments.

10.
In the expectation–maximization (EM) algorithm for maximum likelihood estimation from incomplete data, Markov chain Monte Carlo (MCMC) methods have long been used in change-point inference when the expectation step is intractable. However, conventional MCMC algorithms tend to get trapped in local modes when simulating from the posterior distribution of the change points. To overcome this problem, in this paper we propose a stochastic approximation Monte Carlo version of EM (SAMCEM), which combines adaptive Markov chain Monte Carlo with EM-based maximum likelihood estimation. SAMCEM is compared with the stochastic approximation version of EM and the reversible jump Markov chain Monte Carlo version of EM on simulated and real datasets. The numerical results indicate that SAMCEM outperforms the other two methods, producing much more accurate parameter estimates and recovering change-point positions and parameter estimates simultaneously.

11.
Grouped data are frequently used in several fields of study. In this work, we use the expectation-maximization (EM) algorithm to fit the skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing a one-dimensional integral for each group or class. Our simulation study and real data analyses reveal that the EM algorithm not only always converges but also runs in just a few seconds even when the number of components is large, in contrast to the computationally expensive Bayesian paradigm. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model for grouped data are demonstrated through the simulation and three real data illustrations. For implementing the EM algorithm, we use the ForestFit package developed for the R environment, available at https://cran.r-project.org/web/packages/ForestFit/index.html.
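The key computation for grouped data, each component's probability mass per class, can be sketched with a plain normal mixture (the skew-normal case in ForestFit changes only the integrand). The code below is our own illustrative simplification, in which within-bin conditional expectations are crudely replaced by bin midpoints:

```python
import numpy as np
from math import erf, sqrt

def ncdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def grouped_em(edges, counts, mu, sigma, pi, n_iter=200):
    """EM for a normal mixture observed only as bin counts.
    Responsibilities come from each component's probability mass per
    bin; bin midpoints stand in for the within-bin conditional
    expectations in the M-step."""
    mids = (edges[:-1] + edges[1:]) / 2.0
    K = len(mu)
    for _ in range(n_iter):
        # E-step: mass of each component in each bin
        P = np.array([[pi[k] * (ncdf((edges[j + 1] - mu[k]) / sigma[k])
                                - ncdf((edges[j] - mu[k]) / sigma[k]))
                       for k in range(K)] for j in range(len(mids))])
        R = P / P.sum(axis=1, keepdims=True)        # responsibilities
        # M-step: weighted by the observed bin counts
        Nk = (counts[:, None] * R).sum(axis=0)
        pi = Nk / counts.sum()
        mu = (counts[:, None] * R * mids[:, None]).sum(axis=0) / Nk
        var = (counts[:, None] * R
               * (mids[:, None] - mu) ** 2).sum(axis=0) / Nk
        sigma = np.sqrt(var)
    return pi, mu, sigma
```

Only one CDF difference per component per bin is needed each iteration, which is why the cost stays low even with many components; the midpoint shortcut introduces a small discretization bias that shrinks with the bin width.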

12.
In this article we investigate the relationship between the EM algorithm and the Gibbs sampler. We show that the approximate rate of convergence of the Gibbs sampler under a Gaussian approximation is equal to that of the corresponding EM-type algorithm. This helps in implementing either algorithm, as improvement strategies for one can be transported directly to the other. In particular, by running the EM algorithm we know approximately how many iterations are needed for convergence of the Gibbs sampler. We also show that, under certain conditions, the EM algorithm used for finding maximum likelihood estimates can be slower to converge than the corresponding Gibbs sampler for Bayesian inference. We illustrate our results on a number of realistic examples, all based on generalized linear mixed models.

13.
Summary. A major difficulty in meta-analysis is publication bias. Studies with positive outcomes are more likely to be published than studies reporting negative or inconclusive results. Correcting for this bias is not possible without making untestable assumptions. In this paper, a sensitivity analysis is discussed for the meta-analysis of 2×2 tables using exact conditional distributions. A Markov chain Monte Carlo EM algorithm is used to calculate maximum likelihood estimates. A rule for increasing the accuracy of estimation and automating the choice of the number of iterations is suggested.

14.
Weak consistency and asymptotic normality are shown for a stochastic EM algorithm for censored data from a mixture of distributions under lognormal assumptions. The asymptotic properties hold for all parameters of the distributions, including the mixing parameter. For parameter estimation to be meaningful, the censored mixture distribution must be identifiable; general conditions under which this is the case are given. The stochastic EM algorithm addressed in this paper is used for the estimation of wood-fibre length distributions based on optically measured data from cylindrical wood samples (increment cores).

15.
In this article, we propose an efficient and robust estimator for the semiparametric mixture model consisting of unknown location-shifted symmetric distributions. Our estimator is derived by minimizing the profile Hellinger distance (MPHD) between the model and a nonparametric density estimate, and we propose a simple and efficient algorithm to compute it. A Monte Carlo simulation study is conducted to examine the finite-sample performance of the proposed procedure and to compare it with existing methods. In our empirical studies, the newly proposed procedure is very competitive with existing methods for normal components and much better for non-normal components. More importantly, the proposed procedure is robust when the data are contaminated with outlying observations. A real data application is also provided to illustrate the proposed estimation procedure.

16.
Summary. Likelihood inference for discretely observed Markov jump processes with finite state space is investigated. The existence and uniqueness of the maximum likelihood estimator of the intensity matrix are investigated. This topic is closely related to the imbedding problem for Markov chains. It is demonstrated that the maximum likelihood estimator can be found either by the EM algorithm or by a Markov chain Monte Carlo procedure. When the maximum likelihood estimator does not exist, an estimator can be obtained by using a penalized likelihood function or by the Markov chain Monte Carlo procedure with a suitable prior. The methodology and its implementation are illustrated by examples and simulation studies.

17.
ABSTRACT

In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed to obtain maximum likelihood estimates of the model parameters and mixing proportions. Specifically, the missing data are assumed to be missing not at random (MNAR), i.e. non-ignorable nonresponse (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve efficiency, a stochastic step based on data augmentation is incorporated into the E-step, whereas the M-step is solved by conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The considered model is a general framework that captures the important characteristics of count data, such as zero inflation/deflation, heterogeneity, and missingness, providing more insight into the features of the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed in place of the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.

18.
Two new implementations of the EM algorithm are proposed for maximum likelihood fitting of generalized linear mixed models. Both methods use random (independent and identically distributed) sampling to construct Monte Carlo approximations at the E-step. One approach involves generating random samples from the exact conditional distribution of the random effects (given the data) by rejection sampling, using the marginal distribution as a candidate. The second method uses a multivariate t importance sampling approximation. In many applications the two methods are complementary. Rejection sampling is more efficient when sample sizes are small, whereas importance sampling is better with larger sample sizes. Monte Carlo approximation using random samples allows the Monte Carlo error at each iteration to be assessed by using standard central limit theory combined with Taylor series methods. Specifically, we construct a sandwich variance estimate for the maximizer at each approximate E-step. This suggests a rule for automatically increasing the Monte Carlo sample size after iterations in which the true EM step is swamped by Monte Carlo error. In contrast, techniques for assessing Monte Carlo error have not been developed for use with alternative implementations of Monte Carlo EM algorithms utilizing Markov chain Monte Carlo E-step approximations. Three different data sets, including the infamous salamander data of McCullagh and Nelder, are used to illustrate the techniques and to compare them with the alternatives. The results show that the methods proposed can be considerably more efficient than those based on Markov chain Monte Carlo algorithms. However, the methods proposed may break down when the intractable integrals in the likelihood function are of high dimension.
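The first scheme, rejection sampling from the exact conditional of the random effects using the marginal as candidate, reduces to a few lines in a toy scalar model. The standard-normal random effect and the envelope constant M below are our own simplifying assumptions, not the authors' setup:

```python
import numpy as np

def sample_conditional_re(loglik_given_u, n, rng, M):
    """Rejection sampling from f(u|y) ∝ f(y|u) f(u): draw candidates
    from the marginal f(u) (standard normal here) and accept with
    probability f(y|u) / M, where M >= sup_u f(y|u)."""
    draws = []
    while len(draws) < n:
        u = rng.standard_normal()
        if rng.uniform() < np.exp(loglik_given_u(u)) / M:
            draws.append(u)
    return np.array(draws)
```

Because f(u|y) ∝ f(y|u) f(u), accepting marginal draws with probability f(y|u)/M yields exact draws from the conditional; the acceptance rate falls as the data make the conditional concentrate, which is consistent with the abstract's remark that rejection sampling suits small sample sizes.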

19.
A faster alternative to the EM algorithm in finite mixture distributions is described, which alternates EM iterations with Gauss-Newton iterations using the observed information matrix. At the expense of modest additional analytical effort in obtaining the observed information, the hybrid algorithm reduces the computing time required and provides asymptotic standard errors at convergence. The algorithm is illustrated on the two-component normal mixture.

20.
We propose a simulation-based Bayesian approach to the analysis of long memory stochastic volatility models, both stationary and nonstationary. The main tool used to reduce the likelihood function to a tractable form is an approximate state-space representation of the model. A dataset of stock market returns is analyzed with the proposed method. The approach taken here allows a quantitative assessment of the empirical evidence in favor of the stationarity, or nonstationarity, of the instantaneous volatility of the data.
