Similar Documents
20 similar documents found.
1.
We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, with the missing values on the manifest variables included in the complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm, based on the EM algorithm, that computes the FIML estimates efficiently. A significant improvement in computational speed is realized by not treating the missing values on the manifest variables as part of the complete data. When many data values are missing, it is not clear whether the FIML procedure can achieve good estimation accuracy. To investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.
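The core idea in abstract 1, running EM with missing values filled in by their conditional moments, can be illustrated on a much simpler model than factor analysis. The sketch below is not the authors' algorithm; it is a generic missing-data EM, with all names and constants chosen for the example. It estimates the mean and covariance of a bivariate normal when many values of the second coordinate are missing at random: the E-step fills in conditional first and second moments of the missing entries, and the M-step recomputes the sample moments of the completed data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
true_mu = np.array([1.0, 2.0])
true_S = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_S, n)
miss = rng.random(n) < 0.6           # 60% of the second coordinate missing at random
x1 = X[:, 0]
x2 = np.where(miss, np.nan, X[:, 1])

mu, S = np.zeros(2), np.eye(2)       # crude starting values
for _ in range(100):
    # E-step: conditional moments of the missing x2 given the observed x1
    beta = S[0, 1] / S[0, 0]
    cvar = S[1, 1] - S[0, 1] ** 2 / S[0, 0]
    e2 = np.where(miss, mu[1] + beta * (x1 - mu[0]), x2)
    e22 = np.where(miss, e2 ** 2 + cvar, x2 ** 2)
    # M-step: sample moments of the completed data
    mu = np.array([x1.mean(), e2.mean()])
    c11 = np.mean(x1 ** 2) - mu[0] ** 2
    c12 = np.mean(x1 * e2) - mu[0] * mu[1]
    c22 = np.mean(e22) - mu[1] ** 2
    S = np.array([[c11, c12], [c12, c22]])
```

Note that the E-step imputes second moments (`e2 ** 2 + cvar`), not just means; plugging in conditional means alone would bias the variance estimate downward.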

2.
The family of power series cure rate models provides a flexible modeling framework for survival data from populations with a cure fraction. In this work, we present a simplified estimation procedure for the maximum likelihood (ML) approach. ML estimates are obtained via the expectation–maximization (EM) algorithm, where the expectation step involves computing the expected number of concurrent causes for each individual. A key advantage is that the maximization step decomposes into separate maximizations of two lower-dimensional functions of the regression and survival distribution parameters, respectively. Two simulation studies are performed: the first to investigate the accuracy of the estimation procedure for different numbers of covariates, and the second to compare our proposal with direct maximization of the observed log-likelihood function. Finally, we illustrate parameter estimation on a dataset of survival times for patients with malignant melanoma.

3.
In the expectation–maximization (EM) algorithm for maximum likelihood estimation from incomplete data, Markov chain Monte Carlo (MCMC) methods have long been used in change-point inference when the expectation step is intractable. However, conventional MCMC algorithms tend to get trapped in local modes when simulating from the posterior distribution of the change points. To overcome this problem, we propose a stochastic approximation Monte Carlo version of EM (SAMCEM), which combines adaptive Markov chain Monte Carlo with EM-based maximum likelihood estimation. SAMCEM is compared with the stochastic approximation version of EM and the reversible jump MCMC version of EM on simulated and real datasets. The numerical results indicate that SAMCEM outperforms the other two methods, producing much more accurate parameter estimates and recovering change-point positions and parameter estimates simultaneously.

4.
Estimators derived from the expectation‐maximization (EM) algorithm are not robust since they are based on maximization of the likelihood function. We propose an iterative proximal‐point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. In each iteration, the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and model misspecification. Convergence properties of our algorithm are studied. Convergence of the proposed algorithm is discussed for a two‐component Weibull mixture, entailing a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences confirm the validity of our work and the robustness of the resulting estimators against outliers in comparison with the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

5.
In most applications, the parameters of a mixture of linear regression models are estimated by maximum likelihood using the expectation–maximization (EM) algorithm. In this article, we compare three algorithms for computing maximum likelihood estimates of the parameters of these models: the EM algorithm, the classification EM algorithm and the stochastic EM algorithm. The three procedures were compared through a simulation study of their performance (computational effort, statistical properties of the estimators and goodness of fit) on simulated data sets.

Simulation results show that the choice of the approach depends essentially on the configuration of the true regression lines and the initialization of the algorithms.
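As a minimal illustration of the plain EM variant compared in abstract 5, the sketch below fits a two-component mixture of simple linear regressions by alternating weighted least squares (M-step) with posterior membership probabilities (E-step). The simulated data, the median-split initialization, and the function name are assumptions made for the example, not taken from the article.

```python
import numpy as np

def em_mix_linreg(x, y, z0, n_iter=100):
    """EM for a two-component mixture of simple linear regressions:
    y | component k ~ N(a_k + b_k * x, s2_k), with mixing weights pi_k."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    R = np.column_stack([1.0 - z0, z0]).astype(float)  # initial responsibilities
    a, b, s2, pi = np.zeros(2), np.zeros(2), np.ones(2), np.full(2, 0.5)
    for _ in range(n_iter):
        # M-step: weighted least squares and weighted variance per component
        for k in range(2):
            w = R[:, k]
            coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            a[k], b[k] = coef
            s2[k] = (w * (y - X @ coef) ** 2).sum() / w.sum()
            pi[k] = w.mean()
        # E-step: posterior membership probabilities
        # (the 1/sqrt(2*pi) constant cancels in the normalization)
        dens = np.empty((n, 2))
        for k in range(2):
            r = y - (a[k] + b[k] * x)
            dens[:, k] = pi[k] * np.exp(-0.5 * r ** 2 / s2[k]) / np.sqrt(s2[k])
        R = dens / dens.sum(axis=1, keepdims=True)
    return a, b, pi

# two vertically separated regression lines with equal mixing weights
rng = np.random.default_rng(0)
n = 600
comp = rng.random(n) < 0.5
x = rng.uniform(0, 5, n)
y = np.where(comp, 10.0 + 0.5 * x, 2.0 * x) + rng.normal(0, 0.4, n)
a, b, pi = em_mix_linreg(x, y, z0=(y > np.median(y)))
```

The classification EM and stochastic EM variants differ only in the E-step: the former hard-assigns each point to its most probable component, the latter samples a component from the membership probabilities.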

6.
Mixture distribution models are more useful than single distributions for modeling heterogeneous data sets. The aim of this paper is to propose, for the first time, a mixture of Weibull–Poisson (WP) distributions for modeling heterogeneous data sets, creating a powerful alternative mixture distribution for such modeling. Many features of the proposed mixture of WP distributions are examined. The expectation–maximization (EM) algorithm is used to obtain the maximum-likelihood estimates of the parameters, and a simulation study is conducted to evaluate the performance of the proposed EM scheme. Applications to two real heterogeneous data sets demonstrate the flexibility and potential of the new mixture distribution.

7.
We generalize the Gaussian mixture transition distribution (GMTD) model introduced by Le and co-workers to the mixture autoregressive (MAR) model for the modelling of non-linear time series. The models consist of a mixture of K stationary or non-stationary AR components. The advantages of the MAR model over the GMTD model include a fuller range of shape-changing predictive distributions and the ability to handle cycles and conditional heteroscedasticity in the time series. The stationarity conditions and autocorrelation function are derived. Estimation is easily done via a simple EM algorithm, and the model selection problem is addressed. The shape-changing feature of the conditional distributions makes these models capable of modelling time series with multimodal conditional distributions and with heteroscedasticity. The models are applied to two real data sets and compared with other competing models. The MAR models appear to capture features of the data better than the competing models do.

8.
In this article, we consider a competing cause scenario and assume the wider family of Conway–Maxwell–Poisson (COM–Poisson) distributions to model the number of competing causes. Assuming interval-censored data, the main contribution is in developing the steps of the expectation–maximization (EM) algorithm to determine the maximum likelihood estimates (MLEs) of the model parameters. A profile likelihood approach within the EM framework is proposed to estimate the COM–Poisson shape parameter. An extensive simulation study is conducted to evaluate the performance of the proposed EM algorithm. Model selection within the wider class of COM–Poisson distributions is carried out using the likelihood ratio test and information-based criteria. A study to demonstrate the effect of model mis-specification is also carried out. Finally, the proposed estimation method is applied to data on smoking cessation, and a detailed analysis of the obtained results is presented.

9.
A novel application of the expectation maximization (EM) algorithm is proposed for modeling right-censored multiple regression. Parameter estimates, variability assessment, and model selection are summarized in a multiple regression setting under a normal model. The performance of this method is assessed through a simulation study. New formulas for measuring model utility and diagnostics are derived based on the EM algorithm. They include a reconstructed coefficient of determination and influence diagnostics based on a one-step deletion method. A real data set, provided by the North Dakota Department of Veterans Affairs, is modeled using the proposed methodology. The empirical findings should be of benefit to government policy-makers.

10.
Clustered binary data are common in medical research and can be fitted by a logistic regression model with random effects, which belongs to the wider class of generalized linear mixed models. Likelihood-based estimation of the model parameters often involves intractable integration, which has motivated several estimation methods for overcoming this difficulty. The penalized quasi-likelihood (PQL) method is very popular and computationally efficient in most cases. The expectation–maximization (EM) algorithm yields maximum-likelihood estimates but requires computing a possibly intractable integral in the E-step. Variants of the EM algorithm that approximate the E-step have therefore been introduced: the Monte Carlo EM (MCEM) method approximates the expectation using Monte Carlo samples, while the modified EM (MEM) method approximates it using Laplace's method. All of these methods involve several approximation steps, so the corresponding parameter estimates contain errors (large or small) induced by the approximations. Quantifying this discrepancy theoretically is difficult because of the complexity of the approximations in each method, even when the focus is restricted to clustered binary data. As an alternative, we also consider a non-parametric maximum-likelihood (NPML) method. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via a simulation study, which should be useful for researchers choosing an estimation method for their analysis.
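The MCEM idea described above, replacing an intractable E-step expectation by a Monte Carlo average, can be shown on a deliberately simple case: EM for an exponential rate with right-censored observations, where the conditional expectation of a censored lifetime is approximated by averaging draws from the memoryless tail. This is a generic sketch, not the clustered-binary-data setting of the abstract, and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_rate, c = 2000, 0.5, 3.0
t = rng.exponential(1 / true_rate, n)
obs = np.minimum(t, c)               # observations right-censored at time c
censored = t > c

lam = 1.0                            # initial guess for the rate
for _ in range(50):
    # Monte Carlo E-step: approximate E[t | t > c] under the current rate
    # by averaging m draws from the memoryless tail c + Exp(lam)
    m = 500
    tail = c + rng.exponential(1 / lam, size=(m, censored.sum()))
    e_t = obs.copy()
    e_t[censored] = tail.mean(axis=0)
    # M-step: complete-data MLE of an exponential rate
    lam = n / e_t.sum()
```

In this toy case the E-step expectation is available in closed form (c + 1/lam), so the Monte Carlo approximation can be checked directly; in the generalized linear mixed model setting of the abstract, no such closed form exists, which is what makes MCEM attractive there.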

11.
In this paper, we propose a hidden Markov model for the analysis of time series of bivariate circular observations, by assuming that the data are sampled from bivariate circular densities whose parameters are driven by the evolution of a latent Markov chain. The model segments the data by accounting for redundancies due to correlations along time and across variables. A computationally feasible expectation maximization (EM) algorithm is provided for the maximum likelihood estimation of the model from incomplete data, by treating the missing values and the states of the latent chain as two different sources of incomplete information. Importance-sampling methods facilitate the computation of bootstrap standard errors of the estimates. The methodology is illustrated on a bivariate time series of wind and wave directions and compared with popular segmentation models for bivariate circular data, which ignore correlations across variables and/or along time.

12.
Based on progressively type-II censored data, the maximum-likelihood estimators (MLEs) for the Lomax parameters are derived using the expectation–maximization (EM) algorithm. Moreover, the expected Fisher information matrix based on the missing value principle is computed. Using extensive simulation and three criteria, namely bias, root mean squared error and Pitman closeness measures, we compare the performance of the MLEs via the EM algorithm and the Newton–Raphson (NR) method. It is concluded that the EM algorithm outperforms the NR method in all cases. Two real data examples are used to illustrate the proposed estimators.

13.
This paper addresses the problem of identifying groups that satisfy specific conditions on the means of feature variables. In this study, we refer to the identified groups as “target clusters” (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by the maximum-likelihood method. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters, which would be difficult to achieve with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising for identifying such clusters.

14.
In this article, we propose mixtures of skew Laplace normal (SLN) distributions to model both skewness and heavy-tailedness in heterogeneous data sets, as an alternative to mixtures of skew Student-t-normal (STN) distributions. We give the expectation–maximization (EM) algorithm to obtain the maximum likelihood (ML) estimators for the parameters of interest. We also analyze the mixture regression model based on the SLN distribution and provide the ML estimators of the parameters using the EM algorithm. The performance of the proposed mixture model is illustrated by a simulation study and two real data examples.

15.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm, which maximizes the observed-data log-likelihood function. This paper proposes an alternative approach for fitting finite mixture models. Our method, called iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability that a data point belongs to a particular mixture component given its observed value; it then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method. Finally, it updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with other algorithms for fitting mixtures of normal and mixtures of t densities.
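A minimal sketch of the classification idea behind IMCC, simplified here to a univariate two-component normal mixture. This is the generic stochastic-classification scheme, not the authors' exact procedure: compute membership probabilities, draw a hard label for each point from those probabilities, then refit each component from its classified data.

```python
import numpy as np

rng = np.random.default_rng(2)
# sample from an equal-weight mixture of N(0, 1) and N(5, 1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])

mu = np.array([x.min(), x.max()])    # crude initial means
sd = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(100):
    # 1) membership probabilities under the current parameters
    #    (the common 1/sqrt(2*pi) factor cancels in the normalization)
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / sd
    p = dens / dens.sum(axis=1, keepdims=True)
    # 2) Monte Carlo classification: draw a hard label for each point
    z = (rng.random(len(x)) < p[:, 1]).astype(int)
    # 3) refit each component from its classified data
    for k in (0, 1):
        xk = x[z == k]
        mu[k] = xk.mean()
        sd[k] = xk.std() + 1e-9
        pi[k] = len(xk) / len(x)
```

Because the labels are sampled rather than averaged over, the parameter sequence fluctuates around the solution instead of converging deterministically; a common refinement is to average the iterates over the final iterations.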

16.
Many assumptions, including assumptions regarding treatment effects, are made at the design stage of a clinical trial for power and sample size calculations. It is desirable to check these assumptions during the trial by using blinded data. Methods for sample size re‐estimation based on blinded data analyses have been proposed for normal and binary endpoints. However, it has been debated whether a reliable estimate of the treatment effect can be obtained in a typical clinical trial situation. In this paper, we consider the case of a survival endpoint and investigate the feasibility of estimating the treatment effect in an ongoing trial without unblinding. We incorporate information from a surrogate endpoint and investigate three estimation procedures, including a classification method and two expectation–maximization (EM) algorithms. Simulations and a clinical trial example are used to assess the performance of the procedures. Our studies show that the EM algorithms depend strongly on the initial estimates of the model parameters. Despite utilization of a surrogate endpoint, all three methods have large variations in the treatment effect estimates and hence fail to provide a precise conclusion about the treatment effect. Copyright © 2012 John Wiley & Sons, Ltd.

17.
We propose data generating structures which can be represented as a mixture of autoregressive-autoregressive conditionally heteroscedastic models. The switching between the states is governed by a hidden Markov chain. We investigate semi-parametric estimators for estimating the functions based on the quasi-maximum likelihood approach and provide sufficient conditions for geometric ergodicity of the process. We also present an expectation–maximization algorithm for calculating the estimates numerically.

18.
We present a maximum likelihood estimation procedure for the multivariate frailty model. The estimation is based on a Monte Carlo EM algorithm. The expectation step is approximated by averaging over random samples drawn from the posterior distribution of the frailties using rejection sampling. The maximization step reduces to a standard partial likelihood maximization. We also propose a simple rule, based on the relative change in the parameter estimates, to decide the sample size in each iteration and a stopping time for the algorithm. An important new concept is achieving absolute convergence of the algorithm through sample size determination and an efficient sampling technique. The method is illustrated using a rat carcinogenesis dataset and data on vase lifetimes of cut roses. The estimation results are compared with approximate inference based on penalized partial likelihood using these two examples. Unlike penalized partial likelihood estimation, the proposed full maximum likelihood estimation method accounts for all the uncertainty when estimating standard errors for the parameters.

19.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The developed approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering with model selection criteria applied afterward, and (ii) it does not require accurate initialization, unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, confirming its benefit for practical applications.

20.
The mixture transition distribution (MTD) model was introduced by Raftery to address the need for parsimony in the modeling of high-order Markov chains in discrete time. The particularity of this model comes from the fact that the effect of each lag upon the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, the efficiency of the MTD parameter estimation methods proposed to date remains problematic on account of the large number of constraints on the parameters. In this article, an iterative procedure, the expectation–maximization (EM) algorithm, is developed in conjunction with the principle of maximum likelihood estimation (MLE) to estimate the MTD parameters. Applications of MTD modeling show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, EM estimates of the parameters of high-order MTD models fitted to DNA sequences outperform the corresponding fully parametrized Markov chain in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号