Similar articles
 20 similar articles found
1.
We consider ways to estimate the mixing proportions in a finite mixture distribution or to estimate the number of components of the mixture distribution without making parametric assumptions about the component distributions. We require a vector of observations on each subject. This vector is mapped into a vector of 0s and 1s and summed. The resulting distribution of sums can be modelled as a mixture of binomials. We then work with the binomial mixture. The efficiency and robustness of this method are compared with the strategy of assuming multivariate normal mixtures when, typically, the true underlying mixture distribution is different. It is shown that in many cases the approach based on simple binomial mixtures is superior.
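The binarize-and-sum idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the 0/1 mapping is chosen or how the binomial mixture is fitted, so the fixed `cutoffs` threshold rule, the plain EM fit, and the function names `binarize_and_sum` and `fit_binomial_mixture` are all hypothetical.

```python
import numpy as np
from math import comb


def binarize_and_sum(X, cutoffs):
    """Map each row of X to 0/1 indicators (x > cutoff) and sum them.

    The threshold rule is an assumption made for illustration.
    """
    return (X > cutoffs).sum(axis=1)


def fit_binomial_mixture(s, m, k, n_iter=200, seed=0):
    """Plain EM for a k-component mixture of Binomial(m, p_j) fitted to sums s."""
    rng = np.random.default_rng(seed)
    weights = np.full(k, 1.0 / k)
    probs = np.sort(rng.uniform(0.1, 0.9, size=k))
    binom_coef = np.array([comb(m, int(v)) for v in s], dtype=float)
    s_col = s[:, None]
    for _ in range(n_iter):
        # E-step: posterior probability that each sum came from each component.
        lik = binom_coef[:, None] * probs[None, :] ** s_col * (1.0 - probs[None, :]) ** (m - s_col)
        resp = weights * lik
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted proportions and weighted success probabilities.
        nk = resp.sum(axis=0)
        weights = nk / len(s)
        probs = np.clip((resp * s_col).sum(axis=0) / (m * nk), 1e-9, 1 - 1e-9)
    return weights, probs
```

The estimated `weights` play the role of the mixing proportions; no parametric form is assumed for the original observation vectors, only for the sums.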

2.
This paper describes a Bayesian approach to mixture modelling and a method based on the predictive distribution to determine the number of components in the mixtures. The implementation uses the Gibbs sampler. The method is illustrated with mixtures of normal and gamma distributions. An analysis is presented for one simulated and one real data example. The Bayesian results are then compared with the likelihood approach for the two examples.

3.
In this work, the multinomial mixture model is studied through a maximum likelihood approach. The convergence of the maximum likelihood estimator to a set with characteristics of interest is shown. A method to select the number of mixture components is developed based on the form of the maximum likelihood estimator. A simulation study is then carried out to verify its behavior. Finally, two applications of multinomial mixtures to real data are presented.

4.
New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution. The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context.

5.
Bivariate count data arise in several different disciplines (epidemiology, marketing, and sports statistics, to name a few), and the bivariate Poisson distribution, being a generalization of the Poisson distribution, plays an important role in modelling such data. In the present paper we present a Bayesian estimation approach for the parameters of the bivariate Poisson model and provide the posterior distributions in closed form. It is shown that the joint posterior distributions are finite mixtures of conditionally independent gamma distributions whose full form can be easily deduced by a recursive updating scheme. Thus, the need to apply computationally demanding MCMC schemes for Bayesian inference in such models is removed, since direct sampling from the posterior becomes available, even in cases where the posterior distribution of functions of the parameters is not available in closed form. In addition, we define a class of prior distributions that possess an interesting conjugacy property which extends the typical notion of conjugacy, in the sense that both prior and posterior belong to the same family of finite mixture models but with different numbers of components. Extensions to certain other models, including multivariate models and models with other marginal distributions, are discussed.

6.
The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading; if symmetric components are assumed we need more than one component to describe an asymmetric group. Existing mixture models, based on multivariate normal distributions and multivariate t distributions, try to fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.

7.
In this work, we modify finite mixtures of factor analysers to provide a method for simultaneous clustering of subjects and multivariate discrete outcomes. The joint clustering is performed through a suitable reparameterization of the outcome (column)-specific parameters. We develop an expectation–maximization-type algorithm for maximum likelihood parameter estimation where the maximization step is divided into orthogonal sub-blocks that refer to row and column-specific parameters, respectively. Model performance is evaluated via a simulation study with varying sample size, number of outcomes and row/column-specific clustering (partitions). We compare the performance of our model with the performance of standard model-based biclustering approaches. The proposed method is also demonstrated on a benchmark data set where a multivariate binary response is considered.

8.
Improving the EM algorithm for mixtures (cited 1 time in total: 0 self-citations, 1 citation by others)
One of the estimating equations of the maximum likelihood estimation method, for finite mixtures of the one-parameter exponential family, is the first moment equation. This can help considerably in reducing the labor and the cost of calculating the maximum likelihood estimates. In this paper it is shown that the EM algorithm can be substantially improved by using this result when applied to mixture models. A short discussion of other methods proposed for the calculation of the maximum likelihood estimates is also reported, showing that the above findings can help in this direction too.
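The first moment equation mentioned in this abstract can be illustrated with a Poisson mixture, one member of the one-parameter exponential family. The sketch below is a standard EM step, not the improved algorithm of the paper (which the abstract does not spell out); the function name `em_step_poisson_mixture` is hypothetical. After every M-step the fitted mixture mean, the weighted sum of the component rates, equals the sample mean.

```python
import numpy as np
from math import lgamma


def em_step_poisson_mixture(x, weights, rates):
    """One standard EM step for a finite mixture of Poisson distributions."""
    log_fact = np.array([lgamma(v + 1.0) for v in x])
    # E-step in log space for numerical stability.
    log_pmf = x[:, None] * np.log(rates)[None, :] - rates[None, :] - log_fact[:, None]
    log_resp = np.log(weights)[None, :] + log_pmf
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: mixing weights and responsibility-weighted means.
    nk = resp.sum(axis=0)
    new_weights = nk / len(x)
    new_rates = (resp * x[:, None]).sum(axis=0) / nk
    return new_weights, new_rates


# Simulated counts from two Poisson components (illustrative data only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.poisson(2.0, 300), rng.poisson(10.0, 200)]).astype(float)
weights, rates = em_step_poisson_mixture(x, np.array([0.5, 0.5]), np.array([1.0, 5.0]))

# First moment equation: the mixture mean matches the sample mean.
print(weights @ rates, x.mean())
```

Because this identity holds at every iteration, one parameter is in effect determined by the others, which is the kind of saving the abstract alludes to.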

9.
Homoscedastic and heteroscedastic Gaussian mixtures differ in the constraints placed on the covariance matrices of the mixture components. A new mixture, called herein a strophoscedastic mixture, is defined by a new constraint. This constraint requires the matrices to be identical under orthogonal transformations, where different transformations are allowed for different matrices. It is shown that the M-step of the EM method for estimating the parameters of strophoscedastic mixtures from sample data is explicitly solvable using singular value decompositions. Consequently, the EM-based maximum likelihood estimation algorithm is as easily implemented for strophoscedastic mixtures as it is for homoscedastic and heteroscedastic mixtures. An example of a “noisy” Archimedean spiral is presented.
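One way to read the three covariance constraints side by side is given below. The notation ($\Sigma_k$ for the covariance of component $k$, $\Sigma$ for a common matrix, $Q_k$ for an orthogonal matrix) is assumed here for illustration and is not taken from the paper.

```latex
\begin{align*}
\text{homoscedastic:}    &\quad \Sigma_k = \Sigma \quad \text{for all } k, \\
\text{heteroscedastic:}  &\quad \Sigma_k \ \text{unconstrained}, \\
\text{strophoscedastic:} &\quad \Sigma_k = Q_k \,\Sigma\, Q_k^{\top},
  \qquad Q_k^{\top} Q_k = I .
\end{align*}
```

Under this reading, the component covariances share the same eigenvalues but may have different orientations.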

10.
Mixture distributions have recently become increasingly popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with an unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with a varying number of components. This approach requires only the knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high dimensional Bayesian mixture models.

11.
The EM algorithm is the standard method for estimating the parameters in finite mixture models. Yang and Pan [25] proposed a generalized classification maximum likelihood procedure, called the fuzzy c-directions (FCD) clustering algorithm, for estimating the parameters in mixtures of von Mises distributions. Two main drawbacks of the EM algorithm are its slow convergence and the dependence of the solution on the initial value used. The choice of initial values is of great importance in the algorithm-based literature as it can heavily influence the speed of convergence of the algorithm and its ability to locate the global maximum. Moreover, the algorithmic frameworks of EM and FCD are closely related, so the drawbacks of FCD are the same as those of the EM algorithm. To resolve these problems, this paper proposes another clustering algorithm, which can self-organize locally optimal cluster numbers without using cluster validity functions. Numerical results clearly indicate that the proposed algorithm outperforms the EM and FCD algorithms. Finally, we apply the proposed algorithm to two real data sets.

12.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The developed approach consists of a penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering, which applies model selection criteria afterward, and (ii) it does not require accurate initialization, unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and confirm its benefit for practical applications.

13.
The article extends the REBMIX algorithm to multivariate data. Random variables may follow normal, lognormal, or Weibull parametric families and should be independent within components. Initial weights and component parameters are not required. Preprocessing of observations follows the histogram, Parzen window, or k-nearest neighbor approach. The number of components, weights, and component parameters are obtained iteratively by using information measures of the distance, such as the total of positive relative deviations and the information criterion. The number of classes or the number of nearest neighbors can be optimized as well. The REBMIX software is available at http://www.fs.uni-lj.si/lavek.

14.
The empirical Bayes (EB) method is commonly used by transportation safety analysts for conducting different types of safety analyses, such as before–after studies and hotspot analyses. To date, most implementations of the EB method have been applied using a negative binomial (NB) model, as it can easily accommodate the overdispersion commonly observed in crash data. Recent studies have shown that a generalized finite mixture of NB models with K mixture components (GFMNB-K) can also be used to model crash data subjected to overdispersion and generally offers better statistical performance than the traditional NB model. So far, no one has shown how the EB method could be used with finite mixtures of NB models. The main objective of this study is therefore to use a GFMNB-K model in the calculation of EB estimates. Specifically, GFMNB-K models with varying weight parameters are developed to analyze crash data from Indiana and Texas. The main finding shows that the rankings produced by the NB and GFMNB-2 models for hotspot identification are often quite different, and this was especially noticeable with the Texas dataset. Finally, a simulation study designed to examine which model formulation can better identify hotspots is recommended for future research.
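For orientation, the conventional single-component NB form of the EB estimate can be sketched as below. This is the commonly used textbook form, assuming the NB variance is parameterized as mean + dispersion × mean²; it is not the GFMNB-K extension developed in the paper, and the function name `eb_estimate` is hypothetical.

```python
def eb_estimate(observed, predicted, dispersion):
    """Conventional NB-based empirical Bayes estimate for one site.

    observed   : crash count recorded at the site
    predicted  : model-predicted mean for similar sites
    dispersion : NB dispersion parameter (variance = mean + dispersion * mean**2),
                 an assumed parameterization
    """
    # Weight placed on the model prediction; it shrinks as overdispersion grows.
    weight = 1.0 / (1.0 + dispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


# A site with 10 observed crashes, a predicted mean of 4 and dispersion 0.5.
print(eb_estimate(10, 4.0, 0.5))
```

The estimate always lies between the model prediction and the observed count, which is what makes it useful for ranking candidate hotspots.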

15.
The REBMIX algorithm is presented and applied to the estimation of finite univariate mixture densities. The algorithm identifies the component parameters, mixing weights, and number of components successively. Significant improvement is achieved by replacing the rigid restraints with loose ones, which enables improved modelling of overlapped components. The algorithm is controlled by the extreme relative deviations, the total of positive relative deviations, and information criteria. It also enables the modelling of multivariate finite mixtures. However, the article considers only univariate normal, lognormal, and Weibull finite mixtures. The REBMIX software is available at http://www.fs.uni-lj.si/lavek.

16.
A necessary and sufficient condition that a continuous, positive random variable follow a gamma distribution is given in terms of any one of its conditional finite moments and an expression involving its failure rate. The results are then used to develop a characterization for a mixture of two gamma distributions. The general results about characterization of a mixture of gamma distributions yield several special cases that have appeared separately in recent literature, including characterization of a single exponential distribution, characterization of a single gamma distribution (in terms of either first or second moments) and a sufficient condition for a mixture of two exponential distributions (in terms of first moments). The condition in this last result is shown to be necessary also. Numerous other cases are possible, using different choices for distribution parameters along with a selection of the mixing parameter, for either individual or mixtures of distributions. Various characterizations can be expressed using higher order moments, too.

17.
We introduce a multivariate heteroscedastic measurement error model for replications under scale mixtures of normal distributions. The model can provide a robust analysis and can be viewed as a generalization of multiple linear regression in both model structure and distributional assumptions. An efficient method based on Markov chain Monte Carlo is developed for parameter estimation. The deviance information criterion and the conditional predictive ordinates are used as model selection criteria. Simulation studies show that inference under the model is robust against both misspecification of distributions and outliers. We work out an illustrative example with a real data set on measurements of plant root decomposition.

18.
Acceleration of the EM Algorithm by using Quasi-Newton Methods (cited 1 time in total: 0 self-citations, 1 citation by others)
The EM algorithm is a popular method for maximum likelihood estimation. Its simplicity in many applications and desirable convergence properties make it very attractive. Its sometimes slow convergence, however, has prompted researchers to propose methods to accelerate it. We review these methods, classifying them into three groups: pure, hybrid and EM-type accelerators. We propose a new pure and a new hybrid accelerator, both based on quasi-Newton methods, and numerically compare these and two other quasi-Newton accelerators. For this we use examples in each of three areas: Poisson mixtures, the estimation of covariance from incomplete data and multivariate normal mixtures. In these comparisons, the new hybrid accelerator was fastest on most of the examples and often dramatically so. In some cases it accelerated the EM algorithm by factors of over 100. The new pure accelerator is very simple to implement and competed well with the other accelerators. It accelerated the EM algorithm in some cases by factors of over 50. To obtain standard errors, we propose to approximate the inverse of the observed information matrix by using auxiliary output from the new hybrid accelerator. A numerical evaluation of these approximations indicates that they may be useful at least for exploratory purposes.

19.
This paper uses various gauges to construct principal variables that satisfy criteria of maximal scatter. The solutions coincide with Hotelling's (1933) principal components in structured ensembles and mixtures, including heavy-tailed distributions not having moments. Thus, normal-theory tests are exact in level and power under nonstandard models allowing for correlated vector observations and for certain mixtures having neither moments nor unimodal marginals.

20.
We introduce two new families of univariate distributions that we call hyperminimal and hypermaximal distributions. These families have interesting applications in the context of reliability theory in that they contain the family of coherent system lifetime distributions. For these families, we obtain distributions, bounds, and moments. We also define the minimal and maximal signatures of a coherent system with exchangeable components, which allow us to represent the system distribution as generalized mixtures (i.e., mixtures with possibly negative weights) of series and parallel systems. These results can also be applied to order statistics (k-out-of-n systems). Finally, we give some applications studying coherent systems with different multivariate exponential joint distributions.
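A small worked case may help fix the idea of a generalized mixture with a negative weight. Consider a parallel system of two exchangeable components with lifetimes $X_1, X_2$; the notation $\bar F_{1:i}$ for the survival function of a series system of $i$ components is assumed here for illustration and is not taken from the paper.

```latex
\begin{align*}
\bar F_T(t) &= P\bigl(\max(X_1, X_2) > t\bigr) \\
            &= P(X_1 > t) + P(X_2 > t) - P(X_1 > t,\, X_2 > t) \\
            &= 2\,\bar F_{1:1}(t) - \bar F_{1:2}(t).
\end{align*}
```

The coefficients $(2, -1)$ sum to one but are not all nonnegative, so the system lifetime is a generalized mixture of series-system lifetimes rather than an ordinary mixture.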
