Similar Articles
20 similar articles found.
1.
ABSTRACT

In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed to obtain maximum likelihood estimates of the model parameters and mixing proportions. Specifically, the missing data are assumed to be missing not at random (MNAR), also called nonignorable missing (NINR), and the missingness mechanism is modeled through probit regression. To improve efficiency, a stochastic step based on data augmentation is incorporated into the E-step, while the M-step is solved by conditional maximization. A variant of the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The proposed model is a general framework that captures important characteristics of count data, such as zero inflation/deflation, heterogeneity, and missingness; it gives more insight into the features of the data and allows dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from standard distributions, the computational burden is reduced. Because missing responses and latent variables are imputed in place of their conditional expectations, our approach can also serve as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
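To make the component model concrete, here is a minimal Python sketch of the probability mass function of a finite mixture of hurdle Poisson components. It assumes the usual hurdle parametrization (a point mass `pi0` at zero and a zero-truncated Poisson(`lam`) for the positive counts); the function names and parameter values are hypothetical, and the probit MNAR mechanism and the stochastic EM from the abstract are not shown.

```python
import math


def hurdle_poisson_pmf(k, pi0, lam):
    """P(Y = k) for one hurdle Poisson component.

    pi0 is the probability of a zero; positive counts follow a
    Poisson(lam) distribution truncated to k >= 1.
    """
    if k < 0:
        return 0.0
    if k == 0:
        return pi0
    # Poisson log-pmf, computed on the log scale for numerical stability
    log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
    # Renormalize over k >= 1: divide by 1 - exp(-lam), computed via expm1
    return (1.0 - pi0) * math.exp(log_pois) / -math.expm1(-lam)


def hurdle_mixture_pmf(k, weights, pi0s, lams):
    """Finite mixture of hurdle Poisson components with mixing proportions `weights`."""
    return sum(w * hurdle_poisson_pmf(k, p, lam)
               for w, p, lam in zip(weights, pi0s, lams))


# Hypothetical components: the first inflates zeros (0.5 > exp(-1)),
# the second deflates them (0.02 < exp(-3))
weights, pi0s, lams = [0.3, 0.7], [0.5, 0.02], [1.0, 3.0]
print(sum(hurdle_mixture_pmf(k, weights, pi0s, lams) for k in range(200)))  # close to 1
print(hurdle_mixture_pmf(0, weights, pi0s, lams))  # equals 0.3*0.5 + 0.7*0.02
```

Because the zero probability `pi0` is a free parameter of each component, a component can place more or less mass at zero than a Poisson with the same `lam` would, which is how the hurdle form accommodates both zero inflation and zero deflation.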

2.
We consider the use of an EM algorithm for fitting finite mixture models when the mixture component sizes are known. This situation arises in a number of settings where individual membership is unknown but aggregate membership is known. When the mixture component sizes (i.e., the aggregate component memberships) are known, it is common practice to treat only the mixing probabilities as known. This approach does not, however, fully account for the fact that the number of observations within each mixture component is known, which may result in incorrect parameter estimates. By fully exploiting the available information, the proposed EM algorithm is robust to the choice of starting values and exhibits numerically stable convergence.

3.
Grouped data are frequently used in many fields of study. In this work, we use the expectation-maximization (EM) algorithm to fit the skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm always converges and runs in just a few seconds even when the number of components is large, in contrast to the computationally expensive Bayesian approach. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model for grouped data are demonstrated through simulation and three real data illustrations. To implement the EM algorithm, we use the R package ForestFit, available at https://cran.r-project.org/web/packages/ForestFit/index.html.

4.
The three-parameter asymmetric Laplace distribution (ALD) has received increasing attention in quantile regression because of an important link between its location and asymmetry parameters: the location parameter is the quantile whose level is given by the asymmetry parameter. Based on the representation of the ALD as a normal variance-mean mixture with an exponential mixing distribution, this article develops EM and generalized EM algorithms for computing regression quantiles of linear and nonlinear regression models, respectively. We show that the proposed EM algorithm and the MM (majorization-minimization) algorithm for quantile regression are computationally identical, since their updating formulas coincide. This provides a good example connecting the EM and MM algorithms. Simulation studies show that the EM algorithm successfully recovers the true parameters in quantile regression.
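The abstract states that the EM update from the ALD mixture representation coincides with the MM update for quantile regression. As a rough illustration of that shared update, here is a minimal NumPy sketch of the MM (iteratively reweighted least squares) iteration for linear quantile regression. It is derived from the quadratic majorization stated in the docstring, not from the paper's EM derivation; the perturbation `eps`, the least-squares start, and the function name are our assumptions.

```python
import numpy as np


def quantile_regression_mm(X, y, tau, eps=1e-6, max_iter=1000, tol=1e-10):
    """Linear quantile regression by an MM / iteratively reweighted least squares scheme.

    Uses rho_tau(r) = |r|/2 + (tau - 1/2) r and the quadratic majorizer
    |r| <= r^2 / (2 |r_k|) + |r_k| / 2 around the current residual r_k.
    Minimizing the surrogate gives
        beta = (X^T W X)^{-1} (X^T W y + (2 tau - 1) X^T 1),  W = diag(1 / |r_k|).
    eps keeps the weights finite when a residual is (near) zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares starting value
    shift = (2.0 * tau - 1.0) * X.sum(axis=0)      # (2 tau - 1) X^T 1
    for _ in range(max_iter):
        r = y - X @ beta
        w = 1.0 / (np.abs(r) + eps)
        XtW = X.T * w                               # X^T W without building diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y + shift)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Intercept-only model at tau = 0.5: the estimate approaches the sample
# median (3) and is not pulled toward the outlying value 100
print(quantile_regression_mm(np.ones((5, 1)), [1, 2, 3, 4, 100], tau=0.5))
```

Each iteration is just a weighted least-squares solve, which is why the EM and MM views lead to the same computation: the weights play the role of the latent exponential mixing variables.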

5.
The family of power series cure rate models provides a flexible modeling framework for survival data from populations with a cure fraction. In this work, we present a simplified procedure for maximum likelihood (ML) estimation. ML estimates are obtained via the expectation-maximization (EM) algorithm, where the expectation step computes the expected number of concurrent causes for each individual. A major advantage is that the maximization step decomposes into separate maximizations of two lower-dimensional functions, one of the regression parameters and one of the survival distribution parameters. Two simulation studies are performed: the first investigates the accuracy of the estimation procedure for different numbers of covariates, and the second compares our proposal with direct maximization of the observed log-likelihood function. Finally, we illustrate the technique on a dataset of survival times of patients with malignant melanoma.
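The E-step quantity is easiest to see in the Poisson member of the power series family (the promotion time cure model, with cure fraction exp(-theta)). The sketch below assumes only that special case; other members of the family give different formulas. The function names are hypothetical, and the closed form is checked against a brute-force sum over the posterior distribution of the number of causes N.

```python
import math


def expected_causes(theta, surv, failed):
    """Closed-form E[N | data] for one individual in the Poisson cure model.

    N ~ Poisson(theta) latent causes, each with survival function S; surv = S(t).
    Censored at t: N | T > t ~ Poisson(theta * S(t)).
    Failed at t:   N - 1 | T = t ~ Poisson(theta * S(t)).
    """
    return (1.0 if failed else 0.0) + theta * surv


def expected_causes_bruteforce(theta, surv, failed, n_max=200):
    """Same quantity by summing over the posterior of N (checks the closed form).

    Requires 0 < surv <= 1.
    """
    log_w = []
    for n in range(n_max + 1):
        base = n * math.log(theta) - math.lgamma(n + 1)   # log Poisson weight, up to a constant
        if failed:
            # density of T at t given N = n is n f(t) S(t)^(n-1); f(t) cancels
            log_w.append(base + math.log(n) + (n - 1) * math.log(surv) if n > 0 else -math.inf)
        else:
            # P(T > t | N = n) = S(t)^n
            log_w.append(base + n * math.log(surv))
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    return sum(n * wn for n, wn in enumerate(w)) / sum(w)


print(expected_causes(2.0, 0.25, False), expected_causes_bruteforce(2.0, 0.25, False))
print(expected_causes(2.0, 0.25, True), expected_causes_bruteforce(2.0, 0.25, True))
```

In this special case the E-step needs only theta and the current S(t) for each individual, which is consistent with the abstract's point that the M-step can then split into separate regression and survival-distribution parts.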

6.
Estimators derived from the expectation-maximization (EM) algorithm are not robust, since they are based on maximizing the likelihood function. We propose an iterative proximal-point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. In each iteration, the algorithm estimates the mixing proportions and the component parameters in two separate steps. The resulting estimators are generally robust against outliers and model misspecification. The convergence properties of the algorithm are studied. Convergence is discussed for a two-component Weibull mixture, which entails a condition on the initialization of the EM algorithm for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences confirm the validity of our work and the robustness of the resulting estimators against outliers in comparison with the EM algorithm. An application to a dataset of galaxy velocities is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

7.
Based on progressively type-II censored data, the maximum likelihood estimators (MLEs) of the Lomax parameters are derived using the expectation-maximization (EM) algorithm. Moreover, the expected Fisher information matrix is computed based on the missing value principle. Using extensive simulations and three criteria (bias, root mean squared error, and Pitman closeness), we compare the performance of the MLEs obtained via the EM algorithm and via the Newton-Raphson (NR) method. We conclude that the EM algorithm outperforms the NR method in all cases. Two real data examples illustrate the proposed estimators.

8.
Abstract.  An expectation-maximization (EM) algorithm is proposed to estimate fibre length distributions in standing trees. The available data come from cylindrical wood samples (increment cores). The sample contains uncut fibres as well as fibres cut once or twice. Besides fibres, it also contains other cells, the so-called 'fines'. The lengths are measured by an automatic fibre analyser, which cannot distinguish fines from fibres and cannot tell whether a cell has been cut. The data thus come from a censored version of a mixture of the fine and fibre length distributions in the tree. The parameters of the length distributions are estimated by a stochastic version of the EM algorithm, and an estimate of the corresponding covariance matrix is derived. The method is applied to data from northern Sweden, and a simulation study is also presented. The method works well for the sample sizes commonly obtained from increment cores.

9.
The mixture transition distribution (MTD) model was introduced by Raftery to address the need for parsimony in modeling high-order Markov chains in discrete time. Its distinctive feature is that the effect of each lag on the present is considered separately and additively, which drastically reduces the number of parameters required. However, the efficiency of the MTD parameter estimation methods proposed to date remains problematic because of the large number of constraints on the parameters. In this article, an expectation-maximization (EM) algorithm is developed to compute maximum likelihood estimates (MLEs) of the MTD parameters. Applications show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, for high-order MTD models fitted to DNA sequences, the EM estimates outperform the corresponding fully parametrized Markov chain in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.
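As an illustration of how EM sidesteps the MTD constraints, here is a minimal NumPy sketch for the single-transition-matrix MTD model, treating the lag responsible for each transition as missing data. The M-step automatically keeps the lag weights summing to one and each row of the transition matrix summing to one. This is our own formulation under these assumptions (conditioning on the first `order` observations, random positive starting matrix), not the seq++ implementation; `mtd_em` and its arguments are hypothetical names.

```python
import numpy as np


def mtd_em(x, order, n_states, n_iter=100, seed=0):
    """EM for a single-matrix MTD model
        P(X_t = j | past) = sum_g lam[g] * Q[x_{t-g}, j],  g = 1..order,
    treating the lag that generated X_t as the missing data.
    The first `order` observations are conditioned on.
    """
    x = np.asarray(x, dtype=int)
    rng = np.random.default_rng(seed)
    lam = np.full(order, 1.0 / order)
    Q = rng.random((n_states, n_states)) + 1.0          # random strictly positive start
    Q /= Q.sum(axis=1, keepdims=True)
    t = np.arange(order, len(x))
    cur = x[t]                                          # X_t
    past = np.stack([x[t - g] for g in range(1, order + 1)], axis=1)  # column g-1 holds X_{t-g}
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior probability that lag g generated X_t
        comp = lam[None, :] * Q[past, cur[:, None]]     # shape (T, order)
        total = comp.sum(axis=1)
        loglik.append(np.log(total).sum())              # log-likelihood at the current parameters
        resp = comp / total[:, None]
        # M-step: lag weights and transition matrix from expected counts
        lam = resp.mean(axis=0)
        counts = np.zeros((n_states, n_states))
        for g in range(order):
            np.add.at(counts, (past[:, g], cur), resp[:, g])
        rows = counts.sum(axis=1, keepdims=True)
        # a state never seen as a predecessor keeps a uniform row
        Q = np.where(rows > 0, counts / np.where(rows > 0, rows, 1.0), 1.0 / n_states)
    return lam, Q, np.array(loglik)


rng = np.random.default_rng(1)
seq = rng.integers(0, 3, size=500)      # hypothetical sequence over 3 states
lam, Q, ll = mtd_em(seq, order=2, n_states=3)
print(lam, Q.sum(axis=1))
```

Since this is an exact EM, the recorded log-likelihood should never decrease, which is a convenient sanity check on any implementation.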

10.
Iterative reweighting (IR) is a popular method for computing M-estimates of location and scatter in multivariate robust estimation. When the objective function comes from a scale mixture of normal distributions, the iterative reweighting algorithm can be identified as an EM algorithm. The purpose of this paper is to show that, in the special case of the multivariate t-distribution, the convergence rate can be substantially improved by modifying the EM algorithm.
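Here is a minimal NumPy sketch of the EM (iterative reweighting) update for the location and scatter of a multivariate t with known degrees of freedom `nu`. The `modified=True` branch divides the weighted scatter by the sum of the weights instead of n, one modification discussed in the literature for speeding up this EM; we assume, without having checked, that it matches the modification studied in this paper. The function name and arguments are hypothetical.

```python
import numpy as np


def t_location_scatter(X, nu, modified=False, max_iter=2000, tol=1e-10):
    """EM for location mu and scatter Sigma of a multivariate t with known nu.

    E-step weights: w_i = (nu + p) / (nu + d_i^2), with d_i^2 the squared
    Mahalanobis distance. Standard EM divides the weighted scatter by n;
    the modified variant divides by sum(w). Any fixed point of either
    variant satisfies sum(w) = n, so the two share their fixed points.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    for it in range(1, max_iter + 1):
        diff = X - mu
        # squared Mahalanobis distances diff_i^T Sigma^{-1} diff_i
        d2 = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
        w = (nu + p) / (nu + d2)                  # E-step: expected latent precision weights
        mu_new = w @ X / w.sum()                  # weighted mean
        diff = X - mu_new
        denom = w.sum() if modified else n
        sigma_new = (w[:, None] * diff).T @ diff / denom
        change = max(np.max(np.abs(mu_new - mu)), np.max(np.abs(sigma_new - sigma)))
        mu, sigma = mu_new, sigma_new
        if change < tol:
            return mu, sigma, it
    return mu, sigma, max_iter


rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(300, 2))          # hypothetical heavy-tailed data
mu1, S1, it1 = t_location_scatter(X, nu=3.0)
mu2, S2, it2 = t_location_scatter(X, nu=3.0, modified=True)
print(it1, it2, np.abs(mu1 - mu2).max())
```

Both variants are iterative reweighting: observations far from the current center (large d_i^2) get small weights, which is what makes the t-based estimates robust.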

11.
In this paper, we consider the four-parameter bivariate generalized exponential distribution proposed by Kundu and Gupta [Bivariate generalized exponential distribution, J. Multivariate Anal. 100 (2009), pp. 581–593] and propose an expectation-maximization algorithm to find the maximum likelihood estimators of the four parameters under random left censoring. A numerical experiment is carried out to study the properties of the iteratively obtained estimators.

12.
Weak consistency and asymptotic normality are shown for a stochastic EM algorithm for censored data from a mixture of distributions under lognormal assumptions. The asymptotic properties hold for all parameters of the distributions, including the mixing parameter. For parameter estimation to be meaningful, the censored mixture distribution must be identifiable; general conditions under which this holds are given. The stochastic EM algorithm addressed in this paper is used to estimate wood fibre length distributions from optically measured data on cylindrical wood samples (increment cores).

13.
The established general results on the convergence of the EM algorithm require the sequence of EM parameter estimates to lie in the interior of the parameter space over which the likelihood is maximized. This paper presents convergence properties of the EM sequences of likelihood values and parameter estimates in a constrained parameter space that is contained in the interior of the unconstrained parameter space, where the EM parameter estimates may converge to the boundary of the constrained space. Examples of the behavior of the EM algorithm in such parameter spaces are presented.

14.
The expectation-maximization (EM) algorithm is a very popular technique for maximum likelihood estimation in incomplete data models. When the expectation step cannot be performed in closed form, a stochastic approximation version of EM (SAEM) can be used. The authors have shown, under very general conditions, that the attractive stationary points of the SAEM algorithm correspond to the global and local maxima of the observed likelihood. To avoid convergence towards a local maximum, a simulated annealing version of SAEM is proposed. An illustrative application to the convolution model for estimating filter coefficients is given.
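To show what the stochastic approximation step looks like, here is a minimal SAEM sketch on a hypothetical all-Gaussian toy model: latent z_i ~ N(mu, 1) and observed y_i = z_i + e_i with e_i ~ N(0, 1). The maximum likelihood estimate of mu is then the sample mean of y, so the result can be checked. The step-size schedule (1 during a burn-in, then 1/k) is a common choice that we assume here; the simulated annealing variant proposed in the abstract is not shown.

```python
import numpy as np


def saem_toy(y, n_iter=1000, burn_in=50, seed=0):
    """SAEM for mu in the toy model z_i ~ N(mu, 1), y_i = z_i + e_i, e_i ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    mu, s = 0.0, 0.0   # s tracks E[mean(z) | y], the complete-data sufficient statistic
    for k in range(1, n_iter + 1):
        # Simulation step: draw z_i from p(z_i | y_i; mu) = N((mu + y_i) / 2, 1/2)
        z = rng.normal((mu + y) / 2.0, np.sqrt(0.5))
        # Stochastic approximation step with Robbins-Monro step sizes
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s += gamma * (z.mean() - s)
        # Maximization step: the complete-data MLE of mu is the mean of z
        mu = s
    return mu


rng = np.random.default_rng(42)
y = rng.normal(3.0, np.sqrt(2.0), size=200)   # marginally y_i ~ N(mu, 2)
print(saem_toy(y), y.mean())                  # the two should be close
```

The simulation step replaces the intractable expectation by a single draw per iteration; the decreasing step sizes then average out the simulation noise in the sufficient statistic.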

15.
Maximum likelihood (ML) estimation of spatial econometric models is a long-standing problem with applications in several areas of economic importance. The problem is particularly challenging in the presence of missing data, since there is an implied dependence between all units, whether or not they are observed. Of the several approaches adopted for ML estimation in this context, that of LeSage and Pace [Models for spatially dependent missing data. J Real Estate Financ Econ. 2004;29(2):233–254] stands out as one of the most commonly used for spatial econometric models because it scales with the number of units. Here, we review their algorithm and consider several similar alternatives that are also suitable for large datasets. We compare the methods in an extensive empirical study and conclude that, while the approximate approaches are suitable for large sampling ratios, for small sampling ratios the only reliable algorithms are those that yield exact ML or restricted ML estimates.

16.
The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as incomplete. Because each E-step visits every data point on a given iteration, the EM algorithm requires considerable computation time when applied to large data sets. Two variants, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed recently by Neal R.M. and Hinton G.E. in Jordan M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355–368, to reduce the computational cost of the EM algorithm. In the IEM algorithm, the n available observations are divided into B (B ≤ n) blocks, and the E-step is performed on only one block of observations at a time before the next M-step. In the sparse version of the EM algorithm for fitting mixture models, only the posterior probabilities of component membership that are above a specified threshold are updated; the remaining posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performance of the IEM algorithm with various numbers of blocks and the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks in the IEM algorithm. For the extreme case of one observation per block, we provide efficient updating formulas that avoid directly calculating the inverses and determinants of the component covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm with the partial E-step of the IEM algorithm. The SPIEM algorithm can further reduce the computation time of the IEM algorithm.
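Here is a minimal NumPy sketch of the incremental EM idea for a univariate Gaussian mixture. Each block keeps its own sufficient statistics, a partial E-step refreshes one block, and an M-step follows immediately. The univariate setting, the function names, and the initial values are assumptions for illustration; the paper's block-count rule, the one-observation-per-block updating formulas, and the sparse variant are not reproduced.

```python
import numpy as np


def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def block_stats(xb, pi, mu, var):
    """E-step on one block: sums of r, r*x and r*x^2 per component (shape (3, K))."""
    dens = pi * normal_pdf(xb[:, None], mu, var)       # (n_b, K)
    r = dens / dens.sum(axis=1, keepdims=True)          # posterior component probabilities
    return np.stack([r.sum(axis=0), r.T @ xb, r.T @ xb ** 2])


def m_step(S, n):
    pi = S[0] / n
    mu = S[1] / S[0]
    var = S[2] / S[0] - mu ** 2
    return pi, mu, var


def incremental_em(x, n_blocks, pi, mu, var, n_sweeps=20):
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    stats = [block_stats(x[b], pi, mu, var) for b in blocks]  # one full E-step to start
    total = sum(stats)
    for _ in range(n_sweeps):
        for j, b in enumerate(blocks):
            new = block_stats(x[b], pi, mu, var)        # partial E-step on block j only
            total = total - stats[j] + new              # swap in the refreshed statistics
            stats[j] = new
            pi, mu, var = m_step(total, len(x))         # M-step after every block
    return pi, mu, var


rng = np.random.default_rng(0)
x = rng.permutation(np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)]))
pi, mu, var = incremental_em(x, n_blocks=10, pi=np.array([0.5, 0.5]),
                             mu=np.array([x.min(), x.max()]), var=np.array([1.0, 1.0]))
print(pi, mu, var)
```

With `n_blocks=1` this reduces to standard EM; larger values trade more frequent M-steps for cheaper E-steps, which is the trade-off the block-count rule in the paper addresses.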

17.
In many experiments, several measurements of the same variable are taken over time, over a geographic region, or over some other index set. It is often of interest to know whether the parameters of the variable's distribution have changed over the index set. Frequently, the data consist of a sequence of correlated random variables, and there may also be several experimental units under observation, each providing a sequence of data. The problem of determining the boundaries between layers in geological sedimentary beds is used to introduce the model and to illustrate the proposed methodology. It is assumed that, conditional on the change point, the data from each sequence arise from an autoregressive process that undergoes a change in one or more of its parameters. Unconditionally, the model then becomes a mixture of nonstationary autoregressive processes. Maximum likelihood methods are used, and simulation results evaluating the performance of these estimators under practical conditions are given.

18.
Abstract.  The expectation-maximization (EM) algorithm is a popular approach for obtaining maximum likelihood estimates in incomplete data problems because of its simplicity and stability (e.g. the monotonic increase of the likelihood). However, in many applications the stability of EM comes at the expense of slow, linear convergence. We have developed a new class of iterative schemes, called squared iterative methods (SQUAREM), to accelerate EM without compromising simplicity or stability. SQUAREM generally achieves superlinear convergence in problems with a large fraction of missing information. Globally convergent schemes are easily obtained by viewing SQUAREM as a continuation of EM. SQUAREM is especially attractive in high-dimensional problems and in problems where model-specific analytic insights are not available. SQUAREM can readily be used as an 'off-the-shelf' accelerator of any EM-type algorithm, since it only requires the EM parameter update. We present four examples to demonstrate the effectiveness of SQUAREM. A general-purpose implementation (written in R) is available.
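Here is a minimal scalar sketch of one SQUAREM scheme: the step length often called S3, alpha = -|r| / |v|, with the safeguard alpha <= -1 and a stabilizing EM step after each extrapolation. It is applied to the classical genetic linkage EM example of Dempster, Laird and Rubin (1977), whose MLE is the positive root of 197 theta^2 - 15 theta - 68 = 0. This is our own illustration, not the authors' R implementation, and the function names are hypothetical.

```python
import math


def em_linkage(theta, y=(125, 18, 20, 34)):
    """One EM update for the linkage example (cell probabilities 1/2 + t/4, (1-t)/4, (1-t)/4, t/4)."""
    y1, y2, y3, y4 = y
    x2 = y1 * theta / (2.0 + theta)              # E-step: expected count in the split cell t/4
    return (x2 + y4) / (x2 + y2 + y3 + y4)       # M-step


def plain_em(F, theta, tol=1e-10, max_iter=1000):
    for it in range(1, max_iter + 1):
        new = F(theta)
        if abs(new - theta) < tol:
            return new, it
        theta = new
    return theta, max_iter


def squarem(F, theta, tol=1e-10, max_iter=100):
    """Scalar SQUAREM with step length alpha = -|r| / |v| (S3)."""
    for it in range(1, max_iter + 1):
        t1 = F(theta)
        t2 = F(t1)
        r = t1 - theta                 # first difference
        v = (t2 - t1) - r              # second difference
        if v == 0.0:
            return t2, it
        alpha = min(-abs(r) / abs(v), -1.0)   # alpha = -1 reproduces t2 (two plain EM steps)
        theta_new = F(theta - 2.0 * alpha * r + alpha ** 2 * v)  # extrapolate, then one EM step
        if abs(theta_new - theta) < tol:
            return theta_new, it
        theta = theta_new
    return theta, max_iter


mle = (15.0 + math.sqrt(53809.0)) / 394.0     # positive root of 197 t^2 - 15 t - 68 = 0
print(plain_em(em_linkage, 0.5), squarem(em_linkage, 0.5), mle)
```

Only the EM map `F` is needed, which mirrors the abstract's point that SQUAREM works as an off-the-shelf accelerator. In this sketch each SQUAREM iteration costs three EM evaluations, so iteration counts should be compared with that in mind.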

19.
Abstract

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. Most applications of variable selection in FMR models assume normally distributed regression errors. Such an assumption is unsuitable for data containing one or more groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models based on the t distribution. With an appropriate choice of tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the model parameters, we develop an EM algorithm for the numerical computations and a method for selecting the tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies, and its application is illustrated by analyzing a real data set.

20.
Selecting the important variables is one of the central model selection problems in statistical applications. In this article, we address variable selection in finite mixtures of generalized semiparametric models. To reduce the computational burden, we introduce a class of penalized variable selection procedures for these models. The nonparametric component is estimated via multivariate kernel regression. The new method is shown to be consistent for variable selection, and its performance is assessed via simulation.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417