Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
When finite mixture models are used to fit data, it is sometimes important to estimate the number of mixture components. A nonparametric maximum-likelihood approach may result in too many support points and, in general, does not yield a consistent estimator. A penalized likelihood approach tends to produce a fit with fewer components, but it is not known whether that approach produces a consistent estimate of the number of mixture components. We suggest the use of a penalized minimum-distance method. It is shown that the estimator obtained is consistent for both the mixing distribution and the number of mixture components.

2.
Mixture distributions have recently become increasingly popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with an unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with a varying number of components. This approach requires only the knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high dimensional Bayesian mixture models.

3.
In this work, the multinomial mixture model is studied through a maximum likelihood approach. The convergence of the maximum likelihood estimator to a set with characteristics of interest is shown. A method to select the number of mixture components is developed based on the form of the maximum likelihood estimator. A simulation study is then carried out to verify its behavior. Finally, two applications of multinomial mixtures to real data are presented.

4.
This paper focuses on some recent developments in nonparametric mixture distributions. It discusses nonparametric maximum likelihood estimation of the mixing distribution and emphasizes gradient-type results, especially global results and the global convergence of algorithms such as the vertex direction or vertex exchange method. However, the NPMLE (or the algorithms constructing it) also provides an estimate of the number of components of the mixing distribution, which might not be desirable for theoretical reasons or might not be allowed by the physical interpretation of the mixture model. When the number of components is fixed in advance, the aforementioned algorithms cannot be used, and globally convergent algorithms do not exist to date. Instead, the EM algorithm is often used to find maximum likelihood estimates. However, in this case multiple maxima often occur. An example from a meta-analysis of vitamin A and childhood mortality is used to illustrate the considerable inferential importance of identifying the correct global likelihood. To improve the behavior of the EM algorithm, we suggest a combination of gradient function steps and EM steps to achieve global convergence, leading to the EM algorithm with gradient function update (EMGFU). This algorithm retains the number of components to be exactly k and typically converges to the global maximum. The behavior of the algorithm is illustrated with several examples.

5.
We consider ways to estimate the mixing proportions in a finite mixture distribution or to estimate the number of components of the mixture distribution without making parametric assumptions about the component distributions. We require a vector of observations on each subject. This vector is mapped into a vector of 0s and 1s and summed. The resulting distribution of sums can be modelled as a mixture of binomials. We then work with the binomial mixture. The efficiency and robustness of this method are compared with the strategy of assuming multivariate normal mixtures when, typically, the true underlying mixture distribution is different. It is shown that in many cases the approach based on simple binomial mixtures is superior.
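The mapping step in the abstract above can be sketched in a few lines. This is only an illustration: the abstract does not say how each observation vector is turned into 0s and 1s, so the per-coordinate cutoffs used here, and the names `binarize_and_sum` and `count_distribution`, are assumptions made for the example.

```python
from typing import List, Sequence


def binarize_and_sum(observations: Sequence[Sequence[float]],
                     cutoffs: Sequence[float]) -> List[int]:
    """Map each subject's vector to 0/1 indicators and return the per-subject sums.

    Assumption: a coordinate is coded 1 when it exceeds its cutoff, else 0.
    """
    sums = []
    for row in observations:
        if len(row) != len(cutoffs):
            raise ValueError("each observation vector must have one value per cutoff")
        sums.append(sum(1 if x > c else 0 for x, c in zip(row, cutoffs)))
    return sums


def count_distribution(sums: Sequence[int], m: int) -> List[float]:
    """Empirical distribution of the sums over 0..m (the data a binomial mixture is fitted to)."""
    n = len(sums)
    return [sum(1 for s in sums if s == value) / n for value in range(m + 1)]


# Hypothetical data: four subjects, three measurements each.
observations = [
    [0.2, 1.5, -0.3],
    [2.1, 0.9, 1.7],
    [-1.0, -0.4, 0.1],
    [1.2, 1.1, 0.8],
]
cutoffs = [0.5, 0.5, 0.5]

sums = binarize_and_sum(observations, cutoffs)
distribution = count_distribution(sums, m=len(cutoffs))
```

Each sum lies between 0 and the vector length, so within one component the sums behave like binomial counts, which is what makes a binomial mixture a candidate model for them.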

6.
In some situations, the distribution of the error terms of a multivariate linear regression model may depart from normality. This problem has been addressed, for example, by specifying a different parametric distribution family for the error terms, such as multivariate skewed and/or heavy-tailed distributions. A new solution is proposed, which is obtained by modelling the error term distribution through a finite mixture of multi-dimensional Gaussian components. The multivariate linear regression model is studied under this assumption. Identifiability conditions are proved and maximum likelihood estimation of the model parameters is performed using the EM algorithm. The number of mixture components is chosen through model selection criteria; when this number is equal to one, the proposal reduces to the classical approach. The performance of the proposed approach is evaluated through Monte Carlo experiments and compared with that of other approaches. Finally, the results obtained from the analysis of a real dataset are presented.

7.
We will pursue a Bayesian nonparametric approach in the hierarchical mixture modelling of lifetime data in two situations: density estimation, when the distribution is a mixture of parametric densities with a nonparametric mixing measure, and accelerated failure time (AFT) regression modelling, when the same type of mixture is used for the distribution of the error term. The Dirichlet process is a popular choice for the mixing measure, yielding a Dirichlet process mixture model for the error; as an alternative, we also allow the mixing measure to be equal to a normalized inverse-Gaussian prior, built from normalized inverse-Gaussian finite dimensional distributions, as recently proposed in the literature. Markov chain Monte Carlo techniques will be used to estimate the predictive distribution of the survival time, along with the posterior distribution of the regression parameters. A comparison between the two models will be carried out on the grounds of their predictive power and their ability to identify the number of components in a given mixture density.

8.
When the unobservable Markov chain in a hidden Markov model is stationary, the marginal distribution of the observations is a finite mixture with the number of terms equal to the number of states of the Markov chain. This suggests that the number of states of the unobservable Markov chain can be estimated by determining the number of mixture components in the marginal distribution. This paper presents new methods for estimating the number of states in a hidden Markov model, and coincidentally the unknown number of components in a finite mixture, based on penalized quasi-likelihood and generalized quasi-likelihood ratio methods constructed from the marginal distribution. The procedures advocated are simple to calculate, and results obtained in empirical applications indicate that they are as effective as currently available methods based on the full likelihood. Under fairly general regularity conditions, the methods proposed generate strongly consistent estimates of the unknown number of states or components.
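The first sentence of the abstract above can be made concrete with a small sketch: for a stationary chain, the marginal density of one observation is the mixture of the state-specific densities weighted by the stationary distribution. The transition matrix, the Gaussian emission densities and all function names below are assumptions chosen for illustration; none of them come from the paper.

```python
import math
from typing import List, Sequence


def stationary_distribution(P: Sequence[Sequence[float]],
                            iterations: int = 500) -> List[float]:
    """Approximate the stationary distribution pi (pi = pi P) by power iteration.

    Assumption: the chain is irreducible and aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def normal_pdf(y: float, mean: float, sd: float) -> float:
    """Density of a normal distribution, used here as a hypothetical emission density."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marginal_density(y: float, pi: Sequence[float],
                     means: Sequence[float], sds: Sequence[float]) -> float:
    """Marginal density of one observation: a finite mixture with one term per state."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(pi, means, sds))


# Hypothetical two-state chain; its stationary distribution is (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary_distribution(P)
density_at_zero = marginal_density(0.0, pi, means=[0.0, 3.0], sds=[1.0, 1.0])
```

The mixture has as many terms as the chain has states, which is why counting mixture components in the marginal distribution gives a handle on the number of hidden states.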

9.
This paper describes a Bayesian approach to mixture modelling and a method based on the predictive distribution to determine the number of components in the mixtures. The implementation is done through the use of the Gibbs sampler. The method is illustrated with mixtures of normal and gamma distributions. Analysis is presented for one simulated and one real data example. The Bayesian results are then compared with the likelihood approach for the two examples.

10.
We define the mixture likelihood approach to clustering by discussing the sampling distribution of the likelihood ratio test of the null hypothesis that we have observed a sample of observations of a variable having the bivariate normal distribution versus the alternative that the variable has the bivariate normal mixture with unequal means and a common within-component covariance matrix. The empirical distribution of the likelihood ratio test indicates that convergence to the chi-squared distribution with 2 df is at best very slow, that the sample size should be 5000 or more for the chi-squared result to hold, and that for correlations between 0.1 and 0.9 there is little, if any, dependence of the null distribution on the correlation. Our simulation study suggests a heuristic function based on the gamma.

11.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the heavier-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.

12.
We propose a new algorithm for computing the maximum likelihood estimate of a nonparametric survival function for interval-censored data, by extending the recently-proposed constrained Newton method in a hierarchical fashion. The new algorithm makes use of the fact that a mixture distribution can be recursively written as a mixture of mixtures, and takes a divide-and-conquer approach to break down a large-scale constrained optimization problem into many small-scale ones, which can be solved rapidly. During the course of optimization, the new algorithm, which we call the hierarchical constrained Newton method, can efficiently reallocate the probability mass, both locally and globally, among potential support intervals. Its convergence is theoretically established based on an equilibrium analysis. Numerical study results suggest that the new algorithm is the best choice for data sets of any size and for solutions with any number of support intervals.

13.
The paper reviews finite mixture models for binomial counts with concomitant variables. These models are well known in theory, but they are rarely applied. We use a binomial finite mixture to model the number of credits gained by freshmen during the first year at the School of Economics of the University of Florence. The finite mixture approach allows us to appropriately account for the large number of zeroes and the multimodality of the observed distribution. Moreover, we rely on a concomitant variable specification to investigate the role of student background characteristics and of a compulsory pre-enrollment test in predicting gained credits. In the paper, we deal with model selection, including the choice of the number of components, and we devise numerical and graphical summaries of the model results in order to exploit the information content of the concomitant variable specification. The main finding is that the introduction of the pre-enrollment test gives additional information for student tutoring, even if the predictive power is modest.

14.
The probabilistic uncertainty in record linkage affects statistical analysis such as regression analysis of linked data. This paper considers Bayesian regression analysis with linked data and shows that despite using the usual normal regression analysis, the least squares type estimators of regression coefficients are not always adequate. A method is proposed in which the distribution of the response variable is used. This method is related to finite mixture analysis and leads to more accurate estimates. A simple approach is proposed to increase the tractability and reduce the number of mixture components. A Monte Carlo simulation study is also performed to assess the proposed approach.

15.
Multiperiodic functions, or functions that can be represented as finite additive mixtures of periodic functions, arise in problems related to stellar radiation. There they represent the overall variation in radiation intensity with time. The individual periodic components generally correspond to different sources of radiation and have intrinsic physical meaning provided that they can be 'deconvolved' from the mixture. We suggest a combination of kernel and orthogonal series methods for performing the deconvolution, and we show how to estimate both the sequence of periods and the periodic functions themselves. We pay particular attention to the issue of identifiability, in a nonparametric sense, of the components. This aspect of the problem is shown to exhibit particularly unusual features, and to have connections to number theory. The matter of rates of convergence of estimators also has links there, although we show that the rate-of-convergence problem can be treated from a relatively conventional viewpoint by considering an appropriate prior distribution for the periods.

16.
We consider the use of an EM algorithm for fitting finite mixture models when mixture component size is known. This situation can occur in a number of settings, where individual membership is unknown but aggregate membership is known. When the mixture component size, i.e., the aggregate mixture component membership, is known, it is common practice to treat only the mixing probability as known. This approach does not, however, entirely account for the fact that the number of observations within each mixture component is known, which may result in artificially incorrect estimates of parameters. By fully capitalizing on the available information, the proposed EM algorithm shows robustness to the choice of starting values and exhibits numerically stable convergence properties.

17.
Effective component relabeling in Bayesian analyses of mixture models is critical to the routine use of mixtures in classification with analysis based on Markov chain Monte Carlo methods. The classification-based relabeling approach here is computationally attractive and statistically effective, and scales well with sample size and number of mixture components concordant with enabling routine analyses of increasingly large data sets. Building on the best of existing methods, practical relabeling aims to match data:component classification indicators in MCMC iterates with those of a defined reference mixture distribution. The method performs as well as or better than existing methods in small dimensional problems, while being practically superior in problems with larger data sets as the approach is scalable. We describe examples and computational benchmarks, and provide supporting code with efficient computational implementation of the algorithm that will be of use to others in practical applications of mixture models.

18.
New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution. The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context.

19.
We consider the test of the null hypothesis that the largest mean in a mixture of an unknown number of normal components is less than or equal to a given threshold. This test is motivated by the problem of assessing whether the Soviet Union has been operating in compliance with the Nuclear Test Ban Treaty. In our analysis, the number of normal components is determined using Akaike's Information Criterion while the hypothesis test itself is based on asymptotic results given by Behboodian for a mixture of two normal components. A bootstrap approach is also considered for estimating the standard error of the largest estimated mean. The performance of the tests is examined through the use of simulation.

20.
Finite mixture models arise naturally as models of unobserved population heterogeneity. It is assumed that the population consists of an unknown number k of subpopulations with parameters λ_1, ..., λ_k receiving weights p_1, ..., p_k. Because of the irregularity of the parameter space, the log-likelihood-ratio statistic (LRS) does not have a χ² limit distribution and therefore it is difficult to use the LRS to test for the number of components. These problems are circumvented by using the nonparametric bootstrap such that the mixture algorithm is applied B times to bootstrap samples obtained from the original sample with replacement. The number of components k is obtained as the mode of the bootstrap distribution of k. This approach is presented using the Times newspaper data and investigated in a simulation study for mixtures of Poisson data.
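The bootstrap step described in the abstract above can be sketched as a small resampling loop. This is a rough illustration under stated assumptions, not the paper's procedure: `bootstrap_mode_of_k` is a hypothetical helper, and `toy_estimate_k` is a deliberately crude placeholder that counts well-separated groups; in practice it would be replaced by the mixture algorithm that returns an estimated number of components.

```python
import random
from collections import Counter
from typing import Callable, Sequence, Tuple


def bootstrap_mode_of_k(data: Sequence[float],
                        estimate_k: Callable[[Sequence[float]], int],
                        B: int = 200,
                        seed: int = 0) -> Tuple[int, Counter]:
    """Apply estimate_k to B bootstrap samples and return the mode of the estimates.

    Each bootstrap sample is drawn from the original sample with replacement
    and has the same size as the original sample.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(B):
        resample = rng.choices(data, k=len(data))
        counts[estimate_k(resample)] += 1
    k_hat = counts.most_common(1)[0][0]
    return k_hat, counts


def toy_estimate_k(sample: Sequence[float], gap: float = 5) -> int:
    """Placeholder estimator (an assumption, not a mixture algorithm):
    counts groups of sorted values separated by more than `gap`."""
    ordered = sorted(sample)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > gap)


# Hypothetical count data with two clearly separated groups.
data = [0, 1, 1, 2, 0, 1, 20, 21, 19, 22, 20, 21]
k_hat, counts = bootstrap_mode_of_k(data, toy_estimate_k, B=200, seed=0)
```

`counts` holds the bootstrap distribution of the estimated k, and `k_hat` is its mode. Passing the estimator in as a function keeps the resampling logic separate from whichever mixture-fitting routine is actually used.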
