
1.

Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring (Total citations: 1, self-citations: 0, citations by others: 1)

Tao Chen Julian Morris Elaine Martin 《Journal of the Royal Statistical Society. Series C, Applied statistics》2006,55(5):699-715

Summary. The primary goal of multivariate statistical process performance monitoring is to identify deviations from normal operation within a manufacturing process. The basis of the monitoring schemes is historical data that have been collected when the process is running under normal operating conditions. These data are then used to establish confidence bounds to detect the onset of process deviations. In contrast with the traditional approaches that are based on the Gaussian assumption, this paper proposes the application of the infinite Gaussian mixture model (GMM) for the calculation of the confidence bounds, thereby relaxing the previous restrictive assumption. The infinite GMM is a special case of Dirichlet process mixtures and is introduced as the limit of the finite GMM, i.e. when the number of mixture components tends to ∞. On the basis of the estimation of the probability density function, via the infinite GMM, the confidence bounds are calculated by using the bootstrap algorithm. The methodology proposed is demonstrated through its application to a simulated continuous chemical process, and a batch semiconductor manufacturing process.
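The bootstrap step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: a known finite 1-D two-component mixture stands in for the fitted infinite GMM (fitting the infinite GMM itself, typically by MCMC, is out of scope), the monitoring statistic is assumed to be -log p(x), and the control limit is taken as the bootstrap average of the empirical 99th percentile of that statistic on normal operating data. All parameter values are hypothetical.

```python
import math
import random

def gmm_pdf(x, weights, means, sds):
    """Density of a 1-D Gaussian mixture at x (finite stand-in for the infinite GMM)."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )

def bootstrap_bound(scores, alpha=0.99, n_boot=200, seed=0):
    """Average over bootstrap resamples of the empirical alpha-quantile of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    k = min(n - 1, math.ceil(alpha * n) - 1)  # 0-based index of the alpha-quantile
    total = 0.0
    for _ in range(n_boot):
        resample = sorted(rng.choice(scores) for _ in range(n))
        total += resample[k]
    return total / n_boot

# Hypothetical normal-operating-condition (NOC) data from a two-component mixture
WEIGHTS, MEANS, SDS = (0.6, 0.4), (0.0, 4.0), (1.0, 0.5)
rng = random.Random(1)
noc = [rng.gauss(0.0, 1.0) if rng.random() < 0.6 else rng.gauss(4.0, 0.5) for _ in range(500)]

def score(x):
    """Monitoring statistic: negative log density under the (assumed) fitted model."""
    return -math.log(gmm_pdf(x, WEIGHTS, MEANS, SDS))

limit = bootstrap_bound([score(x) for x in noc])
print(score(0.0) < limit, score(10.0) > limit)  # in-control point vs. deviation
```

A new observation is flagged when its score exceeds `limit`; because the bound is set on the density rather than on a Gaussian-based statistic, the multimodal shape of the normal data is respected.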

2.

Gaussian mixture analysis of covariance

《Journal of Statistical Computation and Simulation》2012,82(16):3158-3174

ABSTRACT

In many real-world applications, the traditional theory of analysis of covariance (ANCOVA) leads to inadequate and unreliable results because the response variable observations violate the essential Gaussian assumption, which may be due to heterogeneity of the population, the presence of outliers, or both. In this paper, we develop a Gaussian mixture ANCOVA model for modelling heterogeneous populations with a finite number of subpopulations. We provide the maximum likelihood estimates of the model parameters via an EM algorithm. We also derive the adjusted effects estimators for treatments and covariates. The Fisher information matrix of the model and asymptotic confidence intervals for the parameters are also discussed. We performed a simulation study to assess the performance of the proposed model. A real-world example is also worked out to explain the methodology.
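The EM estimation mentioned above can be illustrated with a simplified case. The sketch below is an assumption-laden toy, not the paper's model: it fits a mixture of simple linear regressions y = a_k + b_k·x + e_k with one covariate and no treatment factor, alternating posterior responsibilities (E-step) with weighted least squares (M-step). The data, starting values and function names are hypothetical.

```python
import math
import random

def normal_pdf(r, sd):
    return math.exp(-0.5 * (r / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def em_mixture_regression(x, y, pis, coefs, sds, n_iter=50):
    """EM for y = a_k + b_k * x + N(0, s_k^2) with mixing weights pi_k."""
    pis, coefs, sds = list(pis), list(coefs), list(sds)
    K, n = len(pis), len(x)
    for _ in range(n_iter):
        # E-step: responsibility of component k for observation i
        resp = []
        for xi, yi in zip(x, y):
            d = [pis[k] * normal_pdf(yi - coefs[k][0] - coefs[k][1] * xi, sds[k]) for k in range(K)]
            tot = sum(d)
            resp.append([dk / tot for dk in d])
        # M-step: weighted least squares and residual variance per component
        for k in range(K):
            w = [r[k] for r in resp]
            sw = sum(w)
            xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
            yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
            sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
            sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
            b = sxy / sxx
            a = yb - b * xb
            coefs[k] = (a, b)
            sds[k] = math.sqrt(sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw)
            pis[k] = sw / n
    return pis, coefs, sds

# Hypothetical heterogeneous population: two subpopulations with different lines
rng = random.Random(0)
x = [rng.uniform(0.0, 5.0) for _ in range(600)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) if i % 2 == 0 else 15.0 + 0.5 * xi + rng.gauss(0.0, 0.5)
     for i, xi in enumerate(x)]
pis, coefs, sds = em_mixture_regression(x, y, (0.5, 0.5), ((0.0, 2.0), (15.0, 0.0)), (1.0, 1.0))
print(pis, coefs, sds)
```

The component-wise slopes play the role of the covariate effects; a treatment factor would enter the M-step as additional columns of a weighted least squares fit.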

3.

Adaptive multiple importance sampling for Gaussian processes

Xiaoyu Xiong Václav Šmídl Maurizio Filippone 《Journal of Statistical Computation and Simulation》2017,87(8):1644-1665

In applications of Gaussian processes (GPs) where quantification of uncertainty is a strict requirement, it is necessary to accurately characterize the posterior distribution over Gaussian process covariance parameters. This is normally done by means of standard Markov chain Monte Carlo (MCMC) algorithms, which require repeated expensive calculations involving the marginal likelihood. Motivated by the desire to avoid the inefficiency of MCMC algorithms that reject a considerable number of expensive proposals, this paper develops an alternative inference framework based on adaptive multiple importance sampling (AMIS). In particular, this paper studies the application of AMIS for GPs in the case of a Gaussian likelihood, and proposes a novel pseudo-marginal-based AMIS algorithm for non-Gaussian likelihoods, where the marginal likelihood is unbiasedly estimated. The results suggest that the proposed framework outperforms MCMC-based inference of covariance parameters in a wide range of scenarios.
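The general AMIS recipe referred to above can be sketched on a toy problem. Assumptions: a 1-D unnormalised log target replaces the GP marginal likelihood and prior, proposals are Gaussians refitted to the current weighted moments, and all past samples are re-weighted against the equal-weight mixture of all past proposals (deterministic-mixture weights). This follows the generic AMIS scheme, not necessarily the paper's exact variant.

```python
import math
import random

def log_normal_pdf(x, m, s):
    return -0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))

def amis(log_target, m0, s0, n_per_iter=500, n_iter=10, seed=0):
    """Toy 1-D AMIS with Gaussian proposals; returns the weighted mean and variance."""
    rng = random.Random(seed)
    proposals = [(m0, s0)]  # every proposal used so far
    samples = []
    for _ in range(n_iter):
        m, s = proposals[-1]
        samples.extend(rng.gauss(m, s) for _ in range(n_per_iter))
        # Deterministic-mixture weights: all samples drawn so far are re-weighted
        # against the equal-weight mixture of all proposals used so far.
        log_w = []
        for x in samples:
            q = sum(math.exp(log_normal_pdf(x, pm, ps)) for pm, ps in proposals) / len(proposals)
            log_w.append(log_target(x) - math.log(q))
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]  # stabilised; self-normalised below
        tot = sum(w)
        mean = sum(wi * x for wi, x in zip(w, samples)) / tot
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, samples)) / tot
        # Adaptation: the next proposal matches the current weighted moments.
        proposals.append((mean, math.sqrt(var)))
    return mean, var

# Hypothetical unnormalised target: N(3, 0.5^2) up to a constant
mean, var = amis(lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2, m0=0.0, s0=3.0)
print(mean, var)
```

Unlike an MCMC sampler, no expensive target evaluation is discarded: every sample keeps contributing through its updated weight, which is the inefficiency the abstract sets out to avoid.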

4.

An extension of parametric ROC analysis for calculating diagnostic accuracy when underlying distributions are mixture of Gaussian

Karimollah Hajian-Tilaki James A. Hanley Vahid Nassiri 《Journal of applied statistics》2011,38(9):2009-2022

The semiparametric LABROC approach of fitting a binormal model to estimate the AUC as a global index of accuracy has been justified (except for bimodal forms), but for estimating a local index of accuracy such as the TPF, it may lead to bias under severe departures of the data from binormality. We extended parametric ROC analysis for quantitative data to the case where one or both members of the pair are mixtures of Gaussians (MG), in particular bimodal forms. We showed analytically that the AUC and TPF are mixtures of the component AUCs and TPFs of the underlying mixture distributions, weighted by the mixing parameters. In a simulation study of six configurations of MG distributions ({bimodal, normal} and {bimodal, bimodal} pairs), the parameters of the MG distributions were estimated using the EM algorithm. The results showed that the estimated AUC from our proposed model was essentially unbiased, and that the bias in the estimated TPF at a clinically relevant range of FPF was roughly 0.01 for a sample size of n=100/100. In practice, with severe departures from binormality, we recommend, as future research, an extension of LABROC and accompanying software that allows each member of the pair of distributions to be a mixture of Gaussians, which is a more flexible parametric form.
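The mixture result above can be made concrete. If the non-diseased scores follow Σ p_i N(μ_i, σ_i²) and the diseased scores follow Σ q_j N(μ_j, σ_j²), then AUC = Σ_i Σ_j p_i q_j Φ((μ_j − μ_i)/√(σ_i² + σ_j²)), a weighted sum of binormal component AUCs, and the TPF at threshold t is Σ_j q_j Φ((μ_j − t)/σ_j). The sketch below computes both from given (weight, mean, sd) triples; the parameter values are hypothetical and the EM fitting step is not shown.

```python
import math

def phi_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mixture_auc(neg, pos):
    """AUC = P(Y_pos > Y_neg) when each class is a Gaussian mixture.

    neg, pos: lists of (weight, mean, sd) triples; weights within a class sum to 1.
    """
    return sum(
        p * q * phi_cdf((mj - mi) / math.sqrt(si ** 2 + sj ** 2))
        for p, mi, si in neg
        for q, mj, sj in pos
    )

def mixture_tpf(pos, threshold):
    """True positive fraction P(Y_pos > threshold) for a Gaussian mixture."""
    return sum(q * phi_cdf((m - threshold) / s) for q, m, s in pos)

# Hypothetical {normal, bimodal} pair
neg = [(1.0, 0.0, 1.0)]
pos = [(0.5, 1.0, 1.0), (0.5, 3.0, 1.0)]
print(mixture_auc(neg, pos), mixture_tpf(pos, 2.0), mixture_tpf(neg, 2.0))  # AUC, TPF, FPF
```

Applying `mixture_tpf` to the non-diseased mixture gives the FPF at the same threshold, so the whole ROC curve can be traced by sweeping the threshold.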

5.

Model-based clustering of Gaussian copulas for mixed data (Total citations: 1, self-citations: 0, citations by others: 1)

Matthieu Marbac Christophe Biernacki Vincent Vandewalle 《Communications in Statistics: Theory and Methods》2017,46(23):11635-11656

Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Indeed copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. This model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a similar way to the Gaussian mixture. Indeed, each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides tools for visualization based on its parameters. Bayesian inference is carried out with a Metropolis-within-Gibbs sampler. The numerical experiments, on simulated and real data, illustrate the benefits of the proposed model: flexible and meaningful parameterization combined with visualization features.

6.

Inverse Gaussian Distribution for Modeling Conditional Durations in Finance

N. Balakrishna T. Rahul 《Communications in Statistics: Simulation and Computation》2013,42(3):476-486

The durations between market activities such as trades and quotes provide useful information on the underlying assets when analyzing financial time series. In this article, we propose a stochastic conditional duration model based on the inverse Gaussian distribution. The non-monotonic nature of the failure rate of the inverse Gaussian distribution makes it suitable for modeling the durations in financial time series. The parameters of the proposed model are estimated by an efficient importance sampling method. A simulation experiment is conducted to check the performance of the estimators. These estimates are used to compute estimated hazard functions, which are compared with the empirical hazard functions. Finally, a real data analysis is provided to illustrate the practical utility of the models.
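The non-monotonic failure rate cited above as the motivation can be checked numerically. The sketch below evaluates the hazard h(t) = f(t)/S(t) of an inverse Gaussian distribution with mean μ and shape λ, using the standard closed-form CDF F(t) = Φ(√(λ/t)(t/μ − 1)) + e^{2λ/μ} Φ(−√(λ/t)(t/μ + 1)). It illustrates only the distributional property, not the paper's stochastic conditional duration model; μ = λ = 1 are arbitrary illustrative values.

```python
import math

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ig_pdf(t, mu, lam):
    """Inverse Gaussian density with mean mu and shape lam."""
    return math.sqrt(lam / (2.0 * math.pi * t ** 3)) * math.exp(-lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t))

def ig_survival(t, mu, lam):
    """S(t) = 1 - F(t); fine for moderate t, loses precision deep in the tail."""
    r = math.sqrt(lam / t)
    return std_normal_cdf(-r * (t / mu - 1.0)) - math.exp(2.0 * lam / mu) * std_normal_cdf(-r * (t / mu + 1.0))

def ig_hazard(t, mu, lam):
    return ig_pdf(t, mu, lam) / ig_survival(t, mu, lam)

for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(t, round(ig_hazard(t, 1.0, 1.0), 3))
```

The printed hazard rises, peaks and then falls, levelling off towards λ/(2μ²) for long durations; a Weibull or exponential duration law cannot produce this upside-down bathtub shape.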

7.

Parsimonious Gaussian mixture models (Total citations: 3, self-citations: 0, citations by others: 3)

Paul David McNicholas Thomas Brendan Murphy 《Statistics and Computing》2008,18(3):285-296

Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models based on the mixtures of factor analyzers model is introduced, and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.
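Where the parsimony comes from can be shown by counting covariance parameters. In the mixtures of factor analyzers form, each component covariance is Σ_g = Λ_g Λ_gᵀ + Ψ_g with Λ_g a p × q loading matrix (pq − q(q−1)/2 free entries after rotation) and Ψ_g diagonal. The sketch assumes the eight-model family is generated, as it is usually described, by three binary constraints: loadings equal across groups or not, Ψ equal across groups or not, and Ψ isotropic (ψI) or not. Function names are hypothetical.

```python
def full_cov_params(p):
    """Free parameters of one unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2

def pgmm_cov_params(p, q, G, same_loadings, same_psi, isotropic):
    """Covariance parameters of a G-component mixture with Sigma_g = L_g L_g^T + Psi_g."""
    loading = p * q - q * (q - 1) // 2       # free entries of one p x q loading matrix
    psi = 1 if isotropic else p               # Psi = psi * I, or a general diagonal
    return loading * (1 if same_loadings else G) + psi * (1 if same_psi else G)

p, q, G = 10, 2, 3
print(G * full_cov_params(p))                        # unconstrained GMM covariances
print(pgmm_cov_params(p, q, G, False, False, False))  # least constrained member
print(pgmm_cov_params(p, q, G, True, True, True))     # most constrained member
```

With p = 10 variables, q = 2 factors and G = 3 groups, the covariance parameter count drops from 165 for an unconstrained GMM to between 20 and 87 across the family.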

8.

Gaussian Scale Mixture Models for Robust Linear Multivariate Regression with Missing Data

Juha Ala-Luhtala Robert Piché 《Communications in Statistics: Simulation and Computation》2016,45(3):791-813

We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared with the EM algorithm, the VB method has the advantage that the variance of the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.
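The Gaussian scale mixture idea above can be illustrated with its best-known special case. A Student t variable with ν degrees of freedom arises as x = z/√w with z ~ N(0, 1) and a latent precision w ~ Gamma(shape ν/2, scale 2/ν), so that E[w] = 1 and Var(x) = ν/(ν − 2). The sketch only samples this representation in one dimension; the VB regression algorithm itself is not reproduced.

```python
import random

def sample_t_scale_mixture(nu, n, seed=0):
    """Draw n Student-t(nu) values via the Gaussian scale mixture representation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        w = rng.gammavariate(nu / 2.0, 2.0 / nu)  # latent precision: shape nu/2, scale 2/nu
        out.append(rng.gauss(0.0, 1.0) / w ** 0.5)  # Gaussian given w, heavy-tailed marginally
    return out

xs = sample_t_scale_mixture(10.0, 100000)
var = sum(x * x for x in xs) / len(xs)
print(var)  # should be close to nu / (nu - 2) = 1.25
```

In a robust regression, the latent w of each observation is what down-weights outliers: a small posterior w means a large effective noise variance for that measurement.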

9.

An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods (Total citations: 2, self-citations: 0, citations by others: 2)

José G. Dias Michel Wedel 《Statistics and Computing》2004,14(4):323-332

We compare EM, SEM, and MCMC algorithms to estimate the parameters of the Gaussian mixture model. We focus on problems in estimation arising from the likelihood function having a sharp ridge or saddle points. We use both synthetic and empirical data with those features. The comparison includes Bayesian approaches with different prior specifications and various procedures to deal with label switching. Although the solutions provided by these stochastic algorithms are more often degenerate, we conclude that SEM and MCMC may display faster convergence and improve the ability to locate the global maximum of the likelihood function.
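For readers unfamiliar with SEM: it inserts a stochastic step between the E- and M-steps, drawing one hard label per observation from its posterior membership probabilities and then computing complete-data estimates from that sampled partition. The random partitions are what let SEM escape saddle points where EM can stall. The sketch below is a minimal 1-D version under stated assumptions (univariate components, hypothetical data and starting values); the paper's prior specifications and label-switching procedures are not reproduced.

```python
import math
import random

def sem_gmm_1d(data, pis, mus, sds, n_iter=50, seed=0):
    """Stochastic EM (SEM) for a 1-D Gaussian mixture; returns the last iterate."""
    rng = random.Random(seed)
    pis, mus, sds = list(pis), list(mus), list(sds)
    K, n = len(pis), len(data)
    for _ in range(n_iter):
        # E-step: unnormalised posterior weights (the 1/sqrt(2*pi) factor cancels);
        # S-step: draw one hard label per observation from those weights.
        labels = []
        for x in data:
            d = [pis[k] * math.exp(-0.5 * ((x - mus[k]) / sds[k]) ** 2) / sds[k] for k in range(K)]
            labels.append(rng.choices(range(K), weights=d)[0])
        # M-step: complete-data maximum likelihood estimates from the sampled partition
        for k in range(K):
            xs = [x for x, lab in zip(data, labels) if lab == k]
            if len(xs) < 2:  # nearly empty component: keep its previous parameters
                continue
            pis[k] = len(xs) / n
            mus[k] = sum(xs) / len(xs)
            sds[k] = math.sqrt(sum((x - mus[k]) ** 2 for x in xs) / len(xs))
        total = sum(pis)
        pis = [p / total for p in pis]
    return pis, mus, sds

# Hypothetical data: 0.3 N(-2, 1) + 0.7 N(3, 1)
rng = random.Random(42)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.3 else rng.gauss(3.0, 1.0) for _ in range(1000)]
pis, mus, sds = sem_gmm_1d(data, (0.5, 0.5), (-1.0, 1.0), (1.0, 1.0))
print(pis, mus, sds)
```

Because SEM produces a Markov chain of parameter values rather than a converging sequence, point estimates are often taken as averages over the final iterations instead of the last draw returned here.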

10.

A study of the properties of Gaussian mixture model for stable isotope standard quantification in MALDI-TOF MS

John Christian G. Spainhour Michael G. Janech Viswanathan Ramakrishnan 《Communications in Statistics: Simulation and Computation》2019,48(6):1637-1650

The quantification of peptides by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry coupled with stable isotope standards has been used to quantify native peptides under many experimental conditions. This approach has difficulties quantifying samples containing peptides with ion currents in overlapping (convolved) spectra. In a previous article we proposed a reparametrized Gaussian mixture model based on the known characteristics of the peptides that could also accommodate overlapping spectra. We demonstrated the application of our model in a series of single and overlapping peptide quantification experiments. Here, we focus solely on studying the properties of our approach: we examine the characteristics of the GMM approach for convolved peptides using simulated spectra, and provide a method for simulating these spectra.

11.

Non-parametric estimation of the long-range dependence exponent for Gaussian processes

Gabriel Lang Jean-Marc Azaïs 《Journal of statistical planning and inference》1999,80(1-2)

We consider a class of long-range-dependent Gaussian processes defined in a semiparametric framework. We propose a new estimator of the long-range dependence parameter, based on the integration of the periodogram in two windows. We show that it is asymptotically Gaussian and calculate the rate of convergence. We optimise parameters defining the window function for the minimum mean-square-error criterion. In a Monte-Carlo study, we compare the proposed estimator with previously studied estimators.

12.

Using conditional independence for parsimonious model-based Gaussian clustering

Giuliano Galimberti Gabriele Soffritti 《Statistics and Computing》2013,23(5):625-638

In the framework of model-based cluster analysis, finite mixtures of Gaussian components represent an important class of statistical models widely employed for dealing with quantitative variables. Within this class, we propose novel models in which constraints on the component-specific variance matrices define parsimonious Gaussian clustering models. Specifically, the proposed models are obtained by assuming that the variables can be partitioned into groups that are conditionally independent within components, which produces component-specific variance matrices with a block diagonal structure. This approach extends the methods for model-based cluster analysis and makes them more flexible and versatile. In this paper, Gaussian mixture models are studied under this assumption. Identifiability conditions are proved, and the model parameters are estimated by maximum likelihood using the Expectation-Maximization algorithm. The Bayesian information criterion is proposed for selecting the partition of the variables into conditionally independent groups, and the consistency of this criterion is proved under regularity conditions. A hierarchical algorithm is suggested for examining and comparing models with different partitions of the set of variables. A wide class of parsimonious Gaussian models is also presented by parameterizing the component variance matrices according to their spectral decomposition. The effectiveness and usefulness of the proposed methodology are illustrated with two examples based on real datasets.
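The parameter count that the BIC penalises under the block diagonal assumption can be sketched directly. Assumptions: the same partition of the p variables is used in every component; the count includes G − 1 mixing weights, G·p means, and for each component the free entries of each diagonal block; the BIC is written as −2 log L + k log n (lower is better), one common sign convention that may differ from the paper's. Function names are hypothetical and the log-likelihood is supplied by the user.

```python
import math

def n_params_block_diag(partition, G):
    """Free parameters of a G-component Gaussian mixture whose covariance
    matrices are block diagonal w.r.t. `partition` (a list of block sizes)."""
    p = sum(partition)
    cov = sum(b * (b + 1) // 2 for b in partition)  # free entries per component
    return (G - 1) + G * p + G * cov

def bic(loglik, n_params, n_obs):
    """BIC in the 'lower is better' convention: -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)

for part in ([5], [2, 3], [1, 1, 1, 1, 1]):
    print(part, n_params_block_diag(part, G=2))
```

Moving from a single block (full covariances) to finer partitions trades fit for fewer parameters; comparing `bic` across candidate partitions is what the hierarchical search in the abstract automates.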

13.

Goodness of fit for the inverse Gaussian distribution

Federico J. O'Reilly Raúl Rueda 《Revue canadienne de statistique》1992,20(4):387-397

For testing the fit of the inverse Gaussian distribution with unknown parameters, the empirical distribution-function statistic A² is studied. Two procedures are followed in constructing the test statistic; they yield the same asymptotic distribution. In the first procedure the parameters in the distribution function are directly estimated, and in the second the distribution function is estimated by its Rao-Blackwell distribution estimator. A table is given for the asymptotic critical points of A². These are shown to depend only on the ratio of the unknown parameters. An analysis is provided of the effect of estimating the ratio to enter the table for A². This analysis enables the proposal of the complete operating procedure, which is sustained by a Monte Carlo study.
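The first procedure above (estimate the parameters directly, then compute A²) can be sketched as follows. Assumptions: parameters are estimated by maximum likelihood (μ̂ = x̄, 1/λ̂ = mean of 1/xᵢ − 1/x̄), and A² uses the standard computing formula A² = −n − (1/n) Σ (2i − 1)[ln zᵢ + ln(1 − z_{n+1−i})] with zᵢ = F(x₍ᵢ₎). The critical-value table from the paper is not reproduced, and the sample is hypothetical.

```python
import math

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ig_cdf(x, mu, lam):
    """Inverse Gaussian CDF; exp(2*lam/mu) may overflow when lam/mu is very large."""
    r = math.sqrt(lam / x)
    return std_normal_cdf(r * (x / mu - 1.0)) + math.exp(2.0 * lam / mu) * std_normal_cdf(-r * (x / mu + 1.0))

def ig_mle(xs):
    """Maximum likelihood estimates (mu, lam) of the inverse Gaussian parameters."""
    n = len(xs)
    mu = sum(xs) / n
    lam = n / sum(1.0 / x - 1.0 / mu for x in xs)
    return mu, lam

def anderson_darling(z):
    """A^2 from fitted CDF values z_i = F(x_i), each strictly inside (0, 1)."""
    z = sorted(z)
    n = len(z)
    s = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i])) for i in range(1, n + 1))
    return -n - s / n

# Hypothetical positive sample
data = [0.8, 1.1, 0.5, 2.3, 1.7, 0.9, 1.2, 3.1, 0.6, 1.4]
mu, lam = ig_mle(data)
a2 = anderson_darling([ig_cdf(x, mu, lam) for x in data])
print(mu, lam, a2, lam / mu)
```

The ratio λ̂/μ̂ is printed because, per the abstract, the asymptotic critical points of A² depend on the parameters only through their ratio, so this estimate is what would be used to enter the table.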

14.

INCOMPLETE DATA IN GENERALIZED LINEAR MODELS WITH CONTINUOUS COVARIATES

Joseph G. Ibrahim Sanford Weisberg 《Australian & New Zealand Journal of Statistics》1992,34(3):461-470

This paper proposes a method for estimating the parameters in a generalized linear model with missing covariates. The missing covariates are assumed to come from a continuous distribution, and are assumed to be missing at random. In particular, Gaussian quadrature methods are used in the E-step of the EM algorithm, leading to an approximate EM algorithm. The parameters are then estimated using the weighted EM procedure given in Ibrahim (1990). This approximate EM procedure leads to approximate maximum likelihood estimates, whose standard errors and asymptotic properties are given. The proposed procedure is illustrated on a data set.
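The quadrature idea in the E-step can be sketched in isolation. In the paper, the expectation is taken over the conditional distribution of the missing covariate given the observed data; here, as a simplifying assumption, that distribution is taken to be N(μ, σ²), and E[g(X)] is approximated by a 3-point Gauss-Hermite rule with hard-coded nodes 0, ±√(3/2) and weights 2√π/3, √π/6 (weight function e^{−t²}). Each node then acts like a pseudo-observation with weight wⱼ/√π, which is the kind of input a weighted EM step consumes.

```python
import math

# 3-point Gauss-Hermite rule for the weight function exp(-t^2)
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0)

def gh_expectation(g, mu, sigma):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) via x = mu + sqrt(2) * sigma * t."""
    return sum(w * g(mu + math.sqrt(2.0) * sigma * t) for t, w in zip(GH_NODES, GH_WEIGHTS)) / math.sqrt(math.pi)

def pseudo_observations(mu, sigma):
    """(value, weight) pairs that a weighted EM step could use in place of a missing x."""
    return [(mu + math.sqrt(2.0) * sigma * t, w / math.sqrt(math.pi)) for t, w in zip(GH_NODES, GH_WEIGHTS)]

print(gh_expectation(lambda x: x ** 2, 1.0, 2.0))  # E[X^2] = mu^2 + sigma^2
print(pseudo_observations(1.0, 2.0))
```

A 3-point rule is exact for polynomials up to degree 5, so E[X²] and E[X⁴] are reproduced exactly; for a GLM log-likelihood, which is not polynomial in x, more nodes would normally be needed.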

15.

Hierarchical Gaussian process mixtures for regression

J.Q. Shi R. Murray-Smith D.M. Titterington 《Statistics and Computing》2005,15(1):31-41

As a result of their good performance in practice and their desirable analytical properties, Gaussian process regression models are of increasing interest in statistics, engineering and other fields. However, two major problems arise when the model is applied to a large data-set with repeated measurements. One stems from the systematic heterogeneity among the different replications, and the other is the requirement to invert a covariance matrix which is involved in the implementation of the model. The dimension of this matrix equals the sample size of the training data-set. In this paper, a Gaussian process mixture model for regression is proposed for dealing with these two problems, and a hybrid Markov chain Monte Carlo (MCMC) algorithm is used for its implementation. Application to a real data-set is reported.

16.

Identifiable finite mixtures of location models for clustering mixed-mode data

Alan Willse Robert J. Boik 《Statistics and Computing》1999,9(2):111-121

For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, (g!)^(m-1) distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.

17.

Bayesian deconvolution of oil well test data using Gaussian processes

J. Andrés Christen Bruno Sansó Mario Santana-Cibrian Jorge X. Velasco-Hernández 《Journal of applied statistics》2016,43(4):721-737

We use Bayesian methods to infer an unobserved function that is convolved with a known kernel. Our method is based on the assumption that the function of interest is a Gaussian process and, assuming a particular correlation structure, the resulting convolution is also a Gaussian process. This fact is used to obtain inferences regarding the unobserved process, effectively providing a deconvolution method. We apply the methodology to the problem of estimating the parameters of an oil reservoir from well-test pressure data. Here, the unknown process describes the structure of the well. Applications to data from Mexican oil wells show very accurate results.

18.

Poisson-mixed Inverse Gaussian Regression Model and Its Application

Emilio Gómez-Déniz Ramesh C. Gupta 《Communications in Statistics: Simulation and Computation》2016,45(8):2767-2781

In this article, we develop a Poisson-mixed inverse Gaussian (PMIG) distribution. The mixed inverse Gaussian distribution is a mixture of the inverse Gaussian distribution and its length-biased counterpart. A PMIG regression model is developed and maximum likelihood estimation of its parameters is studied. A dataset on the number of hospital stays among the elderly population is analyzed using the PMIG and PIG (Poisson-inverse Gaussian) regression models, and the PMIG model is shown to fit the data better than the PIG model.

19.

Initializing EM using the properties of its trajectories in Gaussian mixtures (Total citations: 4, self-citations: 1, citations by others: 3)

Christophe Biernacki 《Statistics and Computing》2004,14(3):267-279

A strategy is proposed to initialize the EM algorithm in the multivariate Gaussian mixture context. It consists of randomly drawing initial mixture parameters, at low computational cost in many situations, from an appropriate space that contains all possible EM trajectories. This space is defined simply by two relations, satisfied by any EM iteration, between the first two empirical moments and the mixture parameters. An experimental study on simulated and real data sets clearly shows that this strategy outperforms classical methods, since it has the useful property of widely exploring the local maxima of the likelihood function.

20.

Mixtures of Gaussian copula factor analyzers for clustering high dimensional data

《Journal of the Korean Statistical Society》2019,48(3):480-492

Mixtures of factor analyzers is a useful model-based clustering method that can avoid the curse of dimensionality in high-dimensional clustering. However, this approach is sensitive both to diverse non-normalities of the marginal variables and to outliers, which are commonly observed in multivariate experiments. We propose mixtures of Gaussian copula factor analyzers (MGCFA) for high-dimensional clustering. This model has two advantages: (1) it allows different marginal distributions, which increases the fitting flexibility of the mixture model, and (2) it can avoid the curse of dimensionality by embedding the factor-analytic structure in the component correlation matrices of the mixture distribution. An EM algorithm is developed for fitting MGCFA. The proposed method is free of the curse of dimensionality and allows any parametric marginal distribution that best fits the data. It is applied to both synthetic data and a microarray gene expression dataset for clustering, and shows better performance than several existing methods.