Similar Articles
20 similar articles found.
1.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm, where the observed-data log-likelihood function is maximized. This paper proposes an alternative approach for fitting finite mixture models. Our method, called iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability that the data point belongs to a particular mixing component given its observed value; it then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method; finally, it updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with some other algorithms for fitting mixtures of normal and t densities.
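
As a rough illustration of the procedure described above, the following minimal Python sketch runs the three IMCC steps (membership probabilities, Monte Carlo classification, parameter update) for a two-component normal mixture; the initialisation, iteration count and safeguards are our own simplifications, not the authors' exact algorithm.

```python
# Minimal sketch of iterative Monte Carlo classification (IMCC) for a
# two-component normal mixture. Starting values and stopping rule are
# illustrative assumptions, not the authors' specification.
import numpy as np
from scipy.stats import norm

def imcc_two_normals(x, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.array([x.min(), x.max()], dtype=float)   # crude starting values
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # Step 1: membership probabilities given the observed values
        dens = np.vstack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)])
        w = dens / dens.sum(axis=0)
        # Step 2: Monte Carlo classification of each point into a component
        z = (rng.random(x.size) < w[1]).astype(int)
        # Step 3: update each component from its classified data
        for k in range(2):
            xk = x[z == k]
            if xk.size > 1:
                mu[k], sigma[k] = xk.mean(), xk.std(ddof=1)
            pi[k] = max((z == k).mean(), 1e-3)
        pi = pi / pi.sum()
    return pi, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200)])
print(imcc_two_normals(x))   # mixing weights, means, standard deviations
```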

2.
We generalize the Gaussian mixture transition distribution (GMTD) model introduced by Le and co-workers to the mixture autoregressive (MAR) model for the modelling of non-linear time series. The models consist of a mixture of K stationary or non-stationary AR components. The advantages of the MAR model over the GMTD model include a wider range of shape-changing predictive distributions and the ability to handle cycles and conditional heteroscedasticity in the time series. The stationarity conditions and autocorrelation function are derived. Estimation is easily done via a simple EM algorithm, and the model selection problem is addressed. The shape-changing feature of the conditional distributions makes these models capable of modelling time series with multimodal conditional distributions and with heteroscedasticity. The models are applied to two real data sets and compared with other competing models. The MAR models appear to capture features of the data better than the other competing models do.
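
For reference, the conditional distribution of a K-component MAR model is usually written in a form like the following (generic notation; the paper's own parameterisation may differ):

$$
F(y_t \mid \mathcal{F}_{t-1}) \;=\; \sum_{k=1}^{K} \alpha_k\, \Phi\!\left(\frac{y_t - \phi_{k0} - \phi_{k1} y_{t-1} - \cdots - \phi_{k p_k} y_{t-p_k}}{\sigma_k}\right),
$$

where Φ is the standard normal distribution function and the mixing weights α_k are non-negative and sum to one; the component-specific scales σ_k are what allow conditional heteroscedasticity and multimodal predictive distributions.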

3.
4.
Mixtures of linear regression models provide a popular treatment for modelling nonlinear regression relationships. Traditional estimation of mixtures of regression models is based on a Gaussian error assumption. It is well known that this assumption is sensitive to outliers and extreme values. To overcome this issue, a new class of finite mixtures of quantile regressions (FMQR) is proposed in this article. Compared with existing Gaussian mixture regression models, the proposed FMQR model can provide a complete specification of the conditional distribution of the response variable for each component. From the likelihood point of view, the FMQR model is equivalent to a finite mixture of regression models with errors following the asymmetric Laplace distribution (ALD), which can be regarded as an extension of the traditional mixture of regression models with normal error terms. An EM algorithm is proposed to obtain the parameter estimates of the FMQR model by exploiting a hierarchical representation of the ALD. Finally, the iterated weighted least-squares estimation for each mixture component of the FMQR model is derived. Simulation studies are conducted to illustrate the finite-sample performance of the estimation procedure. Analysis of an aphid data set is used to illustrate our methodologies.
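
For concreteness, the asymmetric Laplace density underlying the likelihood interpretation above is commonly written as follows, and each observation's FMQR likelihood contribution then takes a mixture form (notation here is generic, not necessarily the article's):

$$
f_{\mathrm{ALD}}(y \mid \mu, \sigma, \tau) \;=\; \frac{\tau(1-\tau)}{\sigma}\,\exp\!\left\{-\rho_\tau\!\left(\frac{y-\mu}{\sigma}\right)\right\},\qquad \rho_\tau(u) = u\,\{\tau - I(u<0)\},
$$

$$
f(y_i \mid x_i) \;=\; \sum_{k=1}^{K}\pi_k\, f_{\mathrm{ALD}}\!\left(y_i \mid x_i^{\top}\beta_k,\, \sigma_k,\, \tau\right),
$$

so that maximizing the mixture likelihood at a fixed quantile level τ performs quantile regression within each component.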

5.

We propose a semiparametric version of the EM algorithm under the semiparametric mixture model introduced by Anderson (1979, Biometrika, 66, 17–26). It is shown that the sequence of proposed EM iterates, irrespective of the starting value, converges to the maximum semiparametric likelihood estimator of the vector of parameters in the semiparametric mixture model. The proposed EM algorithm preserves the appealing monotone convergence property of the standard EM algorithm and can be implemented by employing the standard logistic regression program. We present one example to demonstrate the performance of the proposed EM algorithm.
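
Anderson's (1979) semiparametric mixture model is commonly stated as an exponential tilt between two component densities; a generic version (our notation, for orientation only) is

$$
g(x) \;=\; \pi\, f_1(x) + (1-\pi)\, f_0(x), \qquad f_1(x) \;=\; \exp(\alpha + \beta^{\top}x)\, f_0(x),
$$

with the baseline density f_0 left unspecified, which is why the EM iterates can be computed with standard logistic regression software.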

6.
In this article, the finite mixture model of Weibull distributions is studied, the identifiability of the model with m components is proven, and the parameter estimators for the case of two components obtained by several algorithms are compared. The parameter estimators are obtained by maximum likelihood using different algorithms: expectation-maximization (EM), Fisher scoring, backfitting, optimization of a k-nearest-neighbor approach, and a random walk algorithm using Monte Carlo simulation. The Akaike information criterion and the log-likelihood value are used to compare models. In general, the proposed random walk algorithm shows better performance in mean square error and bias. Finally, the results are applied to electronic component lifetime data.
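
As a point of comparison for the algorithms listed above, a bare-bones EM fit of a two-component Weibull mixture can be sketched as below; this is our own simplified implementation with a numerical weighted M-step, not any of the article's specific algorithms.

```python
# Minimal EM sketch for a two-component Weibull mixture (illustrative only).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def weighted_weibull_mle(x, w, start=(1.0, 1.0)):
    # maximise the weighted Weibull log-likelihood over (shape, scale)
    def nll(theta):
        shape, scale = np.exp(theta)          # keep parameters positive
        return -np.sum(w * weibull_min.logpdf(x, shape, scale=scale))
    res = minimize(nll, np.log(start), method="Nelder-Mead")
    return np.exp(res.x)

def em_weibull_mixture(x, n_iter=50):
    pi = 0.5
    params = [(1.0, np.quantile(x, 0.3)), (2.0, np.quantile(x, 0.7))]
    for _ in range(n_iter):
        d1 = pi * weibull_min.pdf(x, params[0][0], scale=params[0][1])
        d2 = (1 - pi) * weibull_min.pdf(x, params[1][0], scale=params[1][1])
        w1 = d1 / (d1 + d2)                   # E-step: responsibilities
        params[0] = tuple(weighted_weibull_mle(x, w1, params[0]))
        params[1] = tuple(weighted_weibull_mle(x, 1 - w1, params[1]))
        pi = w1.mean()                        # M-step: mixing proportion
    return pi, params

rng = np.random.default_rng(0)
x = np.concatenate([weibull_min.rvs(0.8, scale=1.0, size=300, random_state=rng),
                    weibull_min.rvs(3.0, scale=5.0, size=200, random_state=rng)])
print(em_weibull_mixture(x))
```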

7.
The latent class model is one of the important latent variable methods for jointly modeling longitudinal and survival data. The latent class joint model can handle an underlying heterogeneous population, discover subpopulation structure, and incorporate correlated, non-normally distributed outcomes. The maximum likelihood estimates of the parameters in the latent class joint model are generally obtained by the EM algorithm. Finding good starting values is one of the major issues in implementing the EM algorithm successfully. In this article, initial value formulas are provided, a simulation study is conducted to show that the proposed starting values perform very well, and two illustrative examples are presented.

8.
We extend the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models are formulated. The first model assumes a shared random effects term between the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxes the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two different but correlated random effects variables. Under the conditional independence and missing-at-random assumptions, a direct optimization of the marginal likelihood and an EM algorithm are proposed to fit the proposed models. The proposed models are fitted to dental caries counts of children under the age of six in the city of Detroit.
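
A generic sketch of the first (shared random effect) formulation for a clustered count Y_ij, written in our notation rather than the paper's, is

$$
\Pr(Y_{ij}=y \mid b_i) \;=\; p_{ij}\, I(y=0) \;+\; (1-p_{ij})\, f(y \mid \mu_{ij}),
$$

$$
\operatorname{logit}(p_{ij}) \;=\; x_{ij}^{\top}\gamma + \lambda b_i, \qquad \log \mu_{ij} \;=\; z_{ij}^{\top}\beta + b_i, \qquad b_i \sim N(0,\sigma_b^2),
$$

where f is a Poisson or negative binomial probability mass function and the scalar λ ties the perfect-zero probability and the imperfect-state mean to the same cluster-level random effect; the second formulation replaces this shared term with two correlated random effects.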

9.
Suppose data are collected in a three-mode fashion (individuals × items × attributes), and it is sought to cluster the individuals into groups on the basis of linear relations between scores on the attributes for each item and auxiliary measurements made on the same items. A mixture model is proposed and the EM algorithm is used to fit it to the data by simultaneously estimating the group parameters and allocating individuals to groups. The method is illustrated by a simulation study and a real example in which consumers are clustered on the basis of product scores that are related to a sensory laboratory measurement.

10.
We design a probability distribution for ordinal data by modeling the process generating the data, which is assumed to rely only on order comparisons between categories. By contrast, most competitors either discard the order information or add non-existent distance information. The data generating process is assumed, from optimality arguments, to be a stochastic binary search algorithm in a sorted table. The resulting distribution is natively governed by two meaningful parameters (position and precision) and has very appealing properties: decrease around the mode, shape tuning from uniformity to a Dirac, and identifiability. Moreover, it is easily estimated by an EM algorithm, since the path in the stochastic binary search algorithm can be considered as missing values. Then, using the classical latent class assumption, the previous univariate ordinal model is straightforwardly extended to model-based clustering for multivariate ordinal data. Parameters of this mixture model are estimated by an AECM algorithm. Both simulated and real data sets illustrate the great potential of this model through its ability to parsimoniously identify particularly relevant clusters that were unsuspected by some traditional competitors.
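
The toy generator below gives one possible reading of "stochastic binary search in a sorted table" with a position parameter mu and a precision parameter pi; it is only an illustration of the idea, not the authors' exact data-generating process.

```python
# Toy ordinal-data generator in the spirit of a stochastic binary search:
# at each split, the branch containing the position mu is chosen with
# probability pi; otherwise a branch is chosen at random, size-weighted.
# Illustrative reading of the abstract only.
import numpy as np

def draw_ordinal(m, mu, pi, rng):
    lo, hi = 1, m                      # current interval of candidate categories
    while lo < hi:
        y = rng.integers(lo, hi + 1)   # breakpoint drawn from the current interval
        parts = [(lo, y - 1), (y, y), (y + 1, hi)]
        parts = [(a, b) for a, b in parts if a <= b]
        if rng.random() < pi:          # "precise" comparison: move towards mu
            lo, hi = min(parts, key=lambda ab: min(abs(mu - ab[0]), abs(mu - ab[1])))
        else:                          # "blind" comparison: random, size-weighted
            sizes = np.array([b - a + 1 for a, b in parts], dtype=float)
            lo, hi = parts[rng.choice(len(parts), p=sizes / sizes.sum())]
    return lo

rng = np.random.default_rng(0)
sample = [draw_ordinal(m=5, mu=2, pi=0.8, rng=rng) for _ in range(10)]
print(sample)   # ordinal categories in 1..5, concentrated around mu = 2
```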

11.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton-Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.
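
The model class underlying all of the algorithms above is the standard linear mixed model, stated here in generic notation:

$$
y \;=\; X\beta + Zu + e, \qquad u \sim N\!\big(0,\, G(\gamma)\big), \qquad e \sim N\!\big(0,\, R(\phi)\big),
$$

with REML estimating the variance parameters (γ, φ) from error contrasts of y; the EM, PX-EM and AI algorithms differ in how they iterate towards that REML solution, and the incomplete-data specification determines what the EM-type algorithms treat as missing.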

12.
Estimators derived from the expectation-maximization (EM) algorithm are not robust, since they are based on maximization of the likelihood function. We propose an iterative proximal-point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. In each iteration the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and misspecification of the model. Convergence properties of our algorithm are studied. The convergence of the introduced algorithm is discussed for a two-component Weibull mixture, entailing a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences are provided to confirm the validity of our work and the robustness of the resulting estimators against outliers, in comparison to the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

13.
The paper compares several versions of the likelihood ratio test for exponential homogeneity against mixtures of two exponentials. They are based on different implementations of the likelihood maximization algorithm. We show that global maximization of the likelihood is not appropriate to obtain a good power of the LR test. A simple starting strategy for the EM algorithm, which under the null hypothesis often fails to find the global maximum, results in a rather powerful test. On the other hand, a multiple starting strategy that comes close to global maximization under both the null and the alternative hypotheses leads to inferior power.
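
In generic notation, the testing problem being studied is

$$
H_0:\; f(x) = \lambda e^{-\lambda x} \qquad \text{versus} \qquad H_1:\; f(x) = p\,\lambda_1 e^{-\lambda_1 x} + (1-p)\,\lambda_2 e^{-\lambda_2 x}, \qquad x > 0,
$$

and the versions of the test differ only in how the likelihood under H_1 is (approximately) maximized before forming the likelihood ratio statistic.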

14.
For data from multivariate t distributions, it is very hard to perform an influence analysis based on the probability density function, since its expression is intractable. In this paper, we present a technique for influence analysis based on the mixture distribution and the EM algorithm. In fact, the multivariate t distribution can be regarded as a particular Gaussian mixture obtained by introducing weights from the gamma distribution. We treat the weights as the missing data and develop the influence analysis for data from multivariate t distributions based on the conditional expectation of the complete-data log-likelihood function in the EM algorithm. Several case-deletion measures are proposed for detecting influential observations from multivariate t distributions. Two numerical examples are given to illustrate our methodology.
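
The Gaussian-mixture representation referred to is the usual gamma scale mixture (generic notation):

$$
X \mid w \;\sim\; N_p\!\left(\mu,\, \Sigma / w\right), \qquad w \;\sim\; \mathrm{Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right) \;\;\Longrightarrow\;\; X \;\sim\; t_p(\mu, \Sigma, \nu),
$$

so treating the latent weights w as missing data gives a tractable complete-data log-likelihood, whose conditional expectation is the quantity on which the case-deletion measures are built.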

15.
A progressive hybrid censoring scheme is a mixture of type-I and type-II progressive censoring schemes. In this paper, we mainly consider the analysis of progressive type-II hybrid-censored data when the lifetime distribution of the individual items is the normal or extreme value distribution. Since the maximum likelihood estimators (MLEs) of the parameters cannot be obtained in closed form, we propose to use the expectation–maximization (EM) algorithm to compute the MLEs. The Newton–Raphson method is also used to estimate the model parameters. The asymptotic variance–covariance matrix of the MLEs under the EM framework is obtained from the Fisher information matrix using the missing information principle, and asymptotic confidence intervals for the parameters are then constructed. The study ends by comparing the two methods of estimation and the coverage probabilities of the asymptotic confidence intervals corresponding to the missing information principle and the observed information matrix, through a simulation study, illustrative examples and a real data analysis.

16.
The paper focuses on some recent developments in nonparametric mixture distributions. It discusses nonparametric maximum likelihood estimation of the mixing distribution and emphasizes gradient-type results, especially global results and the global convergence of algorithms such as the vertex direction or vertex exchange method. However, the NPMLE (or the algorithms constructing it) also provides an estimate of the number of components of the mixing distribution, which might not be desirable for theoretical reasons or might not be allowed by the physical interpretation of the mixture model. When the number of components is fixed in advance, the aforementioned algorithms cannot be used, and globally convergent algorithms do not yet exist. Instead, the EM algorithm is often used to find maximum likelihood estimates. However, in this case multiple maxima often occur. An example from a meta-analysis of vitamin A and childhood mortality is used to illustrate the considerable inferential importance of identifying the correct global likelihood. To improve the behavior of the EM algorithm we suggest a combination of gradient function steps and EM steps to achieve global convergence, leading to the EM algorithm with gradient function update (EMGFU). This algorithm retains the number of components at exactly k and typically converges to the global maximum. The behavior of the algorithm is highlighted with several examples.
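
The gradient (directional derivative) function at the heart of these results can be written, in one common notation, as

$$
D_G(\theta) \;=\; \sum_{i=1}^{n} \frac{f(x_i;\theta)}{\int f(x_i;\theta')\, \mathrm{d}G(\theta')} \;-\; n,
$$

with the NPMLE $\hat G$ characterized by $D_{\hat G}(\theta) \le 0$ for all θ and equality on the support of $\hat G$; the proposed EMGFU interleaves EM steps with steps guided by this gradient function while keeping the number of components fixed at k.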

17.
Grouped data are frequently used in several fields of study. In this work, we use the expectation-maximization (EM) algorithm for fitting the skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm not only always converges but also can be implemented in just a few seconds even when the number of components is large, contrary to the Bayesian paradigm, which is computationally expensive. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model in modelling grouped data are demonstrated through the simulation and three real data illustrations. For implementing the EM algorithm, we use the ForestFit package developed for the R environment, available at https://cran.r-project.org/web/packages/ForestFit/index.html.
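
For orientation, with class boundaries $a_0 < a_1 < \dots < a_J$ and class frequencies $n_j$, the grouped-data likelihood being maximized has the generic form

$$
L(\Theta) \;\propto\; \prod_{j=1}^{J}\left[\sum_{k=1}^{K} \pi_k \int_{a_{j-1}}^{a_j} f_{\mathrm{SN}}(x \mid \xi_k, \omega_k, \lambda_k)\, \mathrm{d}x\right]^{n_j},
$$

which is why each EM iteration involves one one-dimensional integral per class and component (the skew-normal parameterisation $(\xi_k,\omega_k,\lambda_k)$ shown here is the conventional location–scale–slant one, not necessarily the package's internal one).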

18.
A model for survival analysis is studied that is relevant for samples which are subject to multiple types of failure. In comparison with a more standard approach, through the appropriate use of hazard functions and transition probabilities, the model allows for a more accurate study of cause-specific failure with regard to both the timing and type of failure. A semiparametric specification of a mixture model is employed that is able to adjust for concomitant variables and allows for the assessment of their effects on the probabilities of eventual causes of failure through a generalized logistic model, and their effects on the corresponding conditional hazard functions by employing the Cox proportional hazards model. A carefully formulated estimation procedure is presented that uses an EM algorithm based on a profile likelihood construction. The methods discussed, which could also be used for reliability analysis, are applied to a prostate cancer data set.
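
In generic notation, the semiparametric mixture structure described above can be sketched, for J possible causes of failure and covariates z, as

$$
f(t \mid z) \;=\; \sum_{j=1}^{J} \pi_j(z)\, f_j(t \mid z), \qquad \pi_j(z) \;=\; \frac{\exp(z^{\top}\gamma_j)}{\sum_{l=1}^{J}\exp(z^{\top}\gamma_l)}, \qquad h_j(t \mid z) \;=\; h_{0j}(t)\, \exp(z^{\top}\beta_j),
$$

with the baseline hazards $h_{0j}$ left unspecified (the semiparametric part); the generalized logistic model governs the probabilities of the eventual causes and the Cox models govern the corresponding conditional hazards.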

19.
In this paper the class of Bilinear GARCH (BL-GARCH) models is proposed. BL-GARCH models make it possible to capture asymmetries in the conditional variance of financial and economic time series by means of interactions between past shocks and volatilities. The availability of likelihood-based inference is an attractive feature of BL-GARCH models. Under the assumption of conditional normality, the log-likelihood function can be maximized by means of an EM-type algorithm. The main reason for using the EM algorithm is that it yields parameter estimates which naturally guarantee the positive definiteness of the conditional variance, with no need for additional parameter constraints. We also derive a robust LM test statistic which can be used for model identification. Finally, the effectiveness of BL-GARCH models in capturing asymmetric volatility patterns in financial time series is assessed by means of an application to a time series of daily returns on the NASDAQ Composite stock market index.
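
One way to write the kind of shock–volatility interaction the abstract describes is the following BL-GARCH(1,1)-style conditional-variance recursion; the coefficients and lag structure here are illustrative, and the paper's exact specification may differ:

$$
h_t \;=\; \omega \;+\; \alpha\,\varepsilon_{t-1}^{2} \;+\; \beta\, h_{t-1} \;+\; c\,\varepsilon_{t-1}\sqrt{h_{t-1}},
$$

where a negative interaction coefficient c makes the variance respond more strongly to negative shocks, producing the asymmetric volatility patterns mentioned above.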

20.
Generalized linear mixed models are widely used for describing overdispersed and correlated data. Such data arise frequently in studies involving clustered and hierarchical designs. A more flexible class of models has been developed here through the Dirichlet process mixture. An additional advantage of using such mixture models is that the observations can be grouped together on the basis of the overdispersion present in the data. This paper proposes a partial empirical Bayes method for estimating all the model parameters by adopting a version of the EM algorithm. An augmented model that helps to implement an efficient Gibbs sampling scheme, under the non-conjugate Dirichlet process generalized linear model, generates observations from the conditional predictive distribution of unobserved random effects and provides an estimate of the average number of mixing components in the Dirichlet process mixture. A simulation study has been carried out to demonstrate the consistency of the proposed method. The approach is also applied to a study on outdoor bacteria concentration in the air and to data from 14 retrospective lung-cancer studies.
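
A generic statement of the hierarchy described above (our notation) is

$$
y_{ij} \mid b_i \;\sim\; \mathrm{EF}(\mu_{ij}), \qquad g(\mu_{ij}) \;=\; x_{ij}^{\top}\beta + b_i, \qquad b_i \mid G \;\overset{\text{iid}}{\sim}\; G, \qquad G \;\sim\; \mathrm{DP}(\alpha, G_0),
$$

where EF denotes an exponential-family distribution with link function g; because realisations of a Dirichlet process are almost surely discrete, clusters sharing the same random-effect value are grouped together, which is what induces the grouping by overdispersion and the finite number of mixing components tracked by the Gibbs sampler.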
