Similar literature
20 similar articles found
1.
Parsimonious Gaussian mixture models   (total citations: 3; self-citations: 0; citations by others: 3)
Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which are based on the mixtures of factor analyzers model is introduced, and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class of models includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.
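
As a hedged illustration of what "parsimonious" usually means in this setting: the abstract does not spell out the constraints, so the parameterization below is an assumption based on the standard mixtures of factor analyzers model, in which each component covariance has a factor-analytic form.

```latex
% Assumed notation: p observed variables, q < p latent factors, G components.
% \Lambda_g is a p x q loading matrix; \Psi_g is a p x p diagonal noise matrix.
\Sigma_g = \Lambda_g \Lambda_g^{\top} + \Psi_g, \qquad g = 1, \dots, G.

% Three binary constraints (each either imposed or not):
%   \Lambda_g = \Lambda        (loadings shared across components)
%   \Psi_g = \Psi              (noise shared across components)
%   \Psi_g = \psi_g I_p        (isotropic noise)
% give 2^3 = 8 covariance structures.
```

Under this reading, the fully constrained model with shared loadings and shared isotropic noise has the fewest covariance parameters, and the unconstrained one recovers the ordinary mixture of factor analyzers.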

2.
Summary.  An authentic food is one that is what it purports to be. Food processors and consumers need to be assured that, when they pay for a specific product or ingredient, they are receiving exactly what they pay for. Classification methods are an important tool in food authenticity studies where they are used to assign food samples of unknown type to known types. A classification method is developed where the classification rule is estimated by using both the labelled and the unlabelled data, in contrast with many classical methods which use only the labelled data for estimation. This methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in model-based clustering. A missing data formulation of the mixture model is used and the models are fitted by using the EM and classification EM algorithms. The methods are applied to the analysis of spectra of food-stuffs recorded over the visible and near infra-red wavelength range in food authenticity studies. A comparison of the performance of model-based discriminant analysis and the method of classification proposed is given. The classification method proposed is shown to yield very good misclassification rates. The correct classification rate was observed to be as much as 15% higher than the correct classification rate for model-based discriminant analysis.

3.
In this study, a new per-field classification method is proposed for supervised classification of remotely sensed multispectral image data of an agricultural area using Gaussian mixture discriminant analysis (MDA). For the proposed per-field classification method, multivariate Gaussian mixture models constructed for control and test fields can have a fixed or differing number of components, and each component can have a different or common covariance matrix structure. The discrimination function and the decision rule of this method are established according to the average Bhattacharyya distance and the minimum values of the average Bhattacharyya distances, respectively. The proposed per-field classification method is analyzed for different structures of the covariance matrix with fixed and differing numbers of components. Also, we classify the remotely sensed multispectral image data using the per-pixel classification method based on Gaussian MDA.
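
The minimum-distance decision rule described above can be sketched roughly as follows. This is an illustration only, under loud assumptions: covariances are taken to be diagonal to keep the code dependency-free, the "average" distance between two mixtures is assumed to be a weight-averaged sum over all component pairs (the abstract does not define it), and all function names and crop labels are hypothetical.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # variance of the averaged covariance
        dist += 0.125 * (m1 - m2) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist


def average_distance(mix_a, mix_b):
    """Weight-averaged distance over all component pairs (assumed definition).

    Each mixture is a list of (weight, means, variances) tuples.
    """
    return sum(
        wa * wb * bhattacharyya_diag(ma, va, mb, vb)
        for wa, ma, va in mix_a
        for wb, mb, vb in mix_b
    )


def classify_field(test_mix, control_mixes):
    """Assign the test field to the control class with the minimum average distance."""
    return min(
        control_mixes,
        key=lambda label: average_distance(test_mix, control_mixes[label]),
    )


# Hypothetical one-band example with single-component mixtures.
controls = {
    "wheat": [(1.0, [0.0], [1.0])],
    "maize": [(1.0, [5.0], [1.0])],
}
field = [(1.0, [0.1], [1.0])]
label = classify_field(field, controls)
```

With full covariance matrices, the same rule would apply with the general matrix form of the Bhattacharyya distance in place of `bhattacharyya_diag`.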

4.
Model-based clustering of Gaussian copulas for mixed data   (total citations: 1; self-citations: 0; citations by others: 1)
Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Indeed copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. This model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a similar way to the Gaussian mixture. Indeed, each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides tools for visualization based on its parameters. The Bayesian inference is achieved with a Metropolis-within-Gibbs sampler. The numerical experiments, on simulated and real data, illustrate the benefits of the proposed model: flexible and meaningful parameterization combined with visualization features.

5.
We propose a mixture of latent variables model for the model-based clustering, classification, and discriminant analysis of data comprising variables of mixed type. This approach is a generalization of latent variable analysis, and model fitting is carried out within the expectation-maximization framework. Our approach is outlined and a simulation study is conducted to illustrate the effect of sample size and noise on the standard errors and the recovery probabilities for the number of groups. Our modelling methodology is then applied to two real data sets and their clustering and classification performance is discussed. We conclude with discussion and suggestions for future work.

6.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm where the observed data log-likelihood function is maximized. This paper proposes an alternative approach for fitting finite mixture models. Our method, called the iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability of a data point belonging to a particular mixing component given the observed value of that data point. It then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method. Finally, it updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with some other algorithms for fitting mixture normal, and mixture t, densities.
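
The three steps within each iteration can be sketched as below for a one-dimensional normal mixture. This is a minimal illustration written from the abstract alone, not the authors' implementation: the function names, the handling of near-empty components, and the standard-deviation floor are all assumptions.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def imcc_step(data, weights, means, sds, rng):
    """One iteration: membership probabilities, Monte Carlo labels, parameter update."""
    k = len(weights)
    groups = [[] for _ in range(k)]
    for x in data:
        # Step 1: conditional probability of each component given x.
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        probs = [d / total for d in dens] if total > 0 else list(weights)
        # Step 2: draw a single label at random from those probabilities.
        label = rng.choices(range(k), weights=probs)[0]
        groups[label].append(x)
    # Step 3: re-estimate each component from the points assigned to it.
    new_w, new_m, new_s = [], [], []
    for j, g in enumerate(groups):
        new_w.append(len(g) / len(data))
        if len(g) >= 2:
            m = sum(g) / len(g)
            var = sum((x - m) ** 2 for x in g) / len(g)
            new_m.append(m)
            new_s.append(max(math.sqrt(var), 1e-6))  # assumed floor to avoid sd = 0
        else:
            # Assumed fallback: keep old parameters for a near-empty component.
            new_m.append(means[j])
            new_s.append(sds[j])
    return new_w, new_m, new_s


def fit_imcc(data, weights, means, sds, n_iter=30, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iter):
        weights, means, sds = imcc_step(data, weights, means, sds, rng)
    return weights, means, sds
```

Unlike EM, which weights every point by its membership probabilities, each point here contributes to exactly one component per iteration, so the estimates fluctuate from iteration to iteration rather than converging to a fixed point.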

7.
Model-based clustering is a flexible grouping technique based on fitting finite mixture models to data groups. Despite its rapid development in recent years, there is rather limited literature devoted to developing diagnostic tools for the clustering solutions obtained. In this paper, a new method based on fuzzy variation decomposition is proposed for probabilistically assessing the contribution of variables to a detected dataset partition. Correlation between variable contributions reveals the underlying variable interaction structure. A visualization tool illustrates whether two variables work collaboratively or exclusively in the model. Elimination of negative-effect variables in the partition leads to better classification results. The developed technique is employed on real-life datasets with promising results.

8.
Mixtures of factor analyzers is a useful model-based clustering method which can avoid the curse of dimensionality in high-dimensional clustering. However, this approach is sensitive to both diverse non-normalities of marginal variables and outliers, which are commonly observed in multivariate experiments. We propose mixtures of Gaussian copula factor analyzers (MGCFA) for clustering high-dimensional data. This model has two advantages: (1) it allows different marginal distributions, which makes the mixture model more flexible to fit; (2) it can avoid the curse of dimensionality by embedding the factor-analytic structure in the component-correlation matrices of the mixture distribution. An EM algorithm is developed for the fitting of MGCFA. The proposed method is free of the curse of dimensionality and allows any parametric marginal distribution which fits best to the data. It is applied to both synthetic data and microarray gene expression data for clustering and performs better than several existing methods.

9.
Summary.  Structured additive regression models are perhaps the most commonly used class of models in statistical applications. This class includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.

10.
Effectively solving the label switching problem is critical for both Bayesian and Frequentist mixture model analyses. In this article, a new relabeling method is proposed by extending a recently developed modal clustering algorithm. First, the posterior distribution is estimated by a kernel density from permuted MCMC or bootstrap samples of parameters. Second, a modal EM algorithm is used to find the m! symmetric modes of the KDE. Finally, samples that ascend to the same mode are assigned the same label. Simulations and real data applications demonstrate that the new method provides more accurate estimates than many existing relabeling methods.

11.
Summary.  The cure fraction (the proportion of patients who are cured of disease) is of interest to both patients and clinicians and is a useful measure to monitor trends in survival of curable disease. The paper extends the non-mixture and mixture cure fraction models to estimate the proportion cured of disease in population-based cancer studies by incorporating a finite mixture of two Weibull distributions to provide more flexibility in the shape of the estimated relative survival or excess mortality functions. The methods are illustrated by using public use data from England and Wales on survival following diagnosis of cancer of the colon where interest lies in differences between age and deprivation groups. We show that the finite mixture approach leads to improved model fit and estimates of the cure fraction that are closer to the empirical estimates. This is particularly so in the oldest age group where the cure fraction is notably lower. The cure fraction is broadly similar in each deprivation group, but the median survival of the 'uncured' is lower in the more deprived groups. The finite mixture approach overcomes some of the limitations of the more simplistic cure models and has the potential to model the complex excess hazard functions that are seen in real data.
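
To make the two model forms concrete, here is a small sketch of the relative survival functions they imply. It assumes the usual textbook forms of the mixture and non-mixture cure models and a Weibull survival parameterized as exp(-lam * t ** gam); the abstract gives no formulas, so the parameterization, parameter names and numeric values are all illustrative assumptions.

```python
import math


def weibull_survival(t, lam, gam):
    """Weibull survival function, assumed parameterization exp(-lam * t**gam)."""
    return math.exp(-lam * t ** gam)


def uncured_survival(t, p, lam1, gam1, lam2, gam2):
    """Survival of the uncured: a two-component Weibull mixture with mixing weight p."""
    return p * weibull_survival(t, lam1, gam1) + (1.0 - p) * weibull_survival(t, lam2, gam2)


def mixture_cure(t, pi, **weib):
    """Mixture cure model: a fraction pi is cured, the rest follow the uncured survival."""
    return pi + (1.0 - pi) * uncured_survival(t, **weib)


def non_mixture_cure(t, pi, **weib):
    """Non-mixture cure model: pi raised to the uncured distribution function F(t)."""
    return pi ** (1.0 - uncured_survival(t, **weib))


# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(p=0.3, lam1=0.5, gam1=1.2, lam2=0.1, gam2=0.8)
curve = [mixture_cure(t, pi=0.4, **params) for t in (0.0, 1.0, 5.0, 10.0)]
```

Both forms start at 1 at time zero and level off at the cure fraction `pi` for large `t`; they differ in how the curve approaches that plateau.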

12.
13.
Independent factor analysis (IFA) has recently been proposed in the signal processing literature as a way to model a set of observed variables through linear combinations of latent independent variables and a noise term. A peculiarity of the method is that it defines a probability density function for the latent variables by mixtures of Gaussians. The aim of this paper is to cast the method into a more rigorous statistical framework and to propose some developments. In the first part, we present the IFA model in its population version, address identifiability issues and draw some parallels between the IFA model and the ordinary factor analysis (FA) one. Then we show that the IFA model may be reinterpreted as an independent component analysis-based rotation of an ordinary FA solution. We also give evidence that the IFA model represents a special case of mixture of factor analysers. In the second part, we address inferential issues, also deriving the standard errors for the model parameter estimates and providing model selection criteria. Finally, we present some empirical results on real data sets.

14.
Particle filters for mixture models with an unknown number of components   (total citations: 1; self-citations: 1; citations by others: 1)
We consider the analysis of data under mixture models where the number of components in the mixture is unknown. We concentrate on mixture Dirichlet process models, and in particular we consider such models under conjugate priors. This conjugacy enables us to integrate out many of the parameters in the model, and to discretize the posterior distribution. Particle filters are particularly well suited to such discrete problems, and we propose the use of the particle filter of Fearnhead and Clifford for this problem. The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than the particle filter algorithm of Chen and Liu. In many situations it outperforms a Gibbs Sampler. We also show how models without the required amount of conjugacy can be efficiently analyzed by the same particle filter algorithm.

15.
We construct a mixture distribution including infant, exogenous and Gompertzian/non-Gompertzian senescent mortality. Using mortality data from Swedish females 1751–, we show that this outperforms models without these features, and compare its trends in cohort and period mortality over time. We find an almost complete disappearance of exogenous mortality within the last century of period mortality, with cohort mortality approaching the same limits. Both Gompertzian and non-Gompertzian senescent mortality are consistently present, with the estimated balance between them oscillating constantly. While the parameters of the latter appear to be trending over time, the parameters of the former do not.

16.
Summary.  In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. 'damaged' areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the prior proposed envisions a conceptual hidden factor with k-levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of the mixture components goes to ∞ and their connection with Dirichlet process mixtures.

17.
The need to use rigorous, transparent, clearly interpretable, and scientifically justified methodology for preventing and dealing with missing data in clinical trials has been a focus of much attention from regulators, practitioners, and academicians over the past years. New guidelines and recommendations emphasize the importance of minimizing the amount of missing data and carefully selecting primary analysis methods on the basis of assumptions regarding the missingness mechanism suitable for the study at hand, as well as the need to stress-test the results of the primary analysis under different sets of assumptions through a range of sensitivity analyses. Some methods that could be effectively used for dealing with missing data have not yet gained widespread usage, partly because of their underlying complexity and partly because of lack of relatively easy approaches to their implementation. In this paper, we explore several strategies for missing data on the basis of pattern mixture models that embody clear and realistic clinical assumptions. Pattern mixture models provide a statistically reasonable yet transparent framework for translating clinical assumptions into statistical analyses. Implementation details for some specific strategies are provided in an Appendix (available online as Supporting Information), whereas the general principles of the approach discussed in this paper can be used to implement various other analyses with different sets of assumptions regarding missing data. Copyright © 2013 John Wiley & Sons, Ltd.

18.
We compare EM, SEM, and MCMC algorithms to estimate the parameters of the Gaussian mixture model. We focus on problems in estimation arising from the likelihood function having a sharp ridge or saddle points. We use both synthetic and empirical data with those features. The comparison includes Bayesian approaches with different prior specifications and various procedures to deal with label switching. Although the solutions provided by these stochastic algorithms are more often degenerate, we conclude that SEM and MCMC may display faster convergence and improve the ability to locate the global maximum of the likelihood function.

19.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationship and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follows a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but focusing on the problem of clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

20.
In this article, we apply the Bayesian approach to the linear mixed effect models with autoregressive(p) random errors under mixture priors obtained with the Markov chain Monte Carlo (MCMC) method. The mixture structure of a point mass and continuous distribution can help to select the variables in fixed and random effects models from the posterior sample generated using the MCMC method. Bayesian prediction of future observations is also one of the major concerns. To get the best model, we consider the commonly used highest posterior probability model and the median posterior probability model. As a result, both criteria tend to be needed to choose the best model from the entire simulation study. In terms of predictive accuracy, a real example confirms that the proposed method provides accurate results.
