We consider the analysis of data under mixture models where the number of components in the mixture is unknown. We concentrate on mixture Dirichlet process models, and in particular we consider such models under conjugate priors. This conjugacy enables us to integrate out many of the parameters in the model and to discretize the posterior distribution. Particle filters are particularly well suited to such discrete problems, and we propose the use of the particle filter of Fearnhead and Clifford for this problem. The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than that of the particle filter algorithm of Chen and Liu. In many situations it outperforms a Gibbs sampler. We also show how models without the required amount of conjugacy can be efficiently analyzed by the same particle filter algorithm.
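
The discreteness mentioned above shows up in the allocation step: once the component means are integrated out under a conjugate prior, the probability that a new observation joins each existing cluster or opens a new one has a closed form. The sketch below assumes a univariate normal kernel with known variance `sigma2`, a normal prior `N(mu0, tau2)` on each cluster mean, and a Dirichlet process with concentration `alpha`; the function name and defaults are illustrative. It is not the Fearnhead and Clifford filter itself, which propagates particles over such allocations.

```python
import math

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def allocation_probs(y, clusters, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Probabilities that y joins each existing cluster, plus one for a new cluster.

    clusters: list of lists of observations already allocated.
    Assumes y ~ N(mu_c, sigma2), mu_c ~ N(mu0, tau2); mu_c is integrated out.
    """
    logw = []
    for members in clusters:
        n = len(members)
        prec = 1.0 / tau2 + n / sigma2                      # posterior precision of mu_c
        mean = (mu0 / tau2 + sum(members) / sigma2) / prec  # posterior mean of mu_c
        # Polya-urn weight n_c times the posterior predictive density of y
        logw.append(math.log(n) + normal_logpdf(y, mean, 1.0 / prec + sigma2))
    # New cluster: alpha times the prior predictive N(mu0, tau2 + sigma2)
    logw.append(math.log(alpha) + normal_logpdf(y, mu0, tau2 + sigma2))
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]
```

In a particle filter of this kind, each particle (a partition of the data seen so far) would be extended by every allocation returned here, weighted by these probabilities.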

A procedure is described which detects the number of components in a mixture distribution having normal components. The test is based on the behavior of the sample order statistics near the center of the distribution. Numerical results are presented and comparisons with tests proposed by Shapiro and Francia (1972) and Baker (1958) are provided.

This paper describes a Bayesian approach to mixture modelling and a method based on the predictive distribution to determine the number of components in the mixture. The implementation uses the Gibbs sampler. The method is described for mixtures of normal and of gamma distributions. The analysis is illustrated with one simulated and one real data example. The Bayesian results are then compared with those of the likelihood approach for the two examples.

A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented. The sampler is characterized by a state space consisting only of the number of components and the latent allocation variables. Its main advantage is that it can be used, with minimal changes, for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically. Artificial and real data sets are used to illustrate the method and mixtures of univariate and of multivariate normals are explicitly considered. The problem of label switching, when parameter inference is of interest, is addressed in a post-processing stage.
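
A sampler whose state is only (k, allocations) needs the marginal likelihood of each component with its parameters integrated out. The sketch below uses a Poisson kernel with a Gamma(shape `a`, rate `b`) prior as an assumed example family (the abstract itself considers normals); the function names are hypothetical. Moves on k or on the allocations would compare these collapsed log marginals, together with the priors on k and on the allocations, which are omitted here.

```python
import math

def log_marginal_poisson_gamma(ys, a=1.0, b=1.0):
    """Log of the integral of prod_i Poisson(y_i | lam) * Gamma(lam | a, b) over lam."""
    n, s = len(ys), sum(ys)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n)
            - sum(math.lgamma(y + 1) for y in ys))

def log_marginal_allocation(ys, z, k, a=1.0, b=1.0):
    """Collapsed log-likelihood of allocation z (labels 0..k-1): sum over components.
    Empty components contribute log(1) = 0."""
    groups = [[y for y, zi in zip(ys, z) if zi == j] for j in range(k)]
    return sum(log_marginal_poisson_gamma(g, a, b) for g in groups)
```

With `a = b = 1` a single count `y` has marginal probability 2^-(y+1), a geometric distribution, which gives a quick sanity check on the formula.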

Very often, the likelihoods for circular data sets have quite complicated forms, and the normalising constants, which depend on the unknown parameters, have no known functional form. This latter problem generally precludes rigorous, exact inference (both classical and Bayesian) for circular data. Noting the paucity of literature on Bayesian circular data analysis, and also because the Bayesian paradigm naturally permits realistic data analysis, we address the above problem from a Bayesian perspective. In particular, we propose a methodology that effectively combines importance sampling and Markov chain Monte Carlo (MCMC) to sample from the posterior distribution of the parameters given the circular data. Through a simulation study and a real data analysis, we demonstrate the considerable reliability and flexibility of the proposed methodology in analysing circular data.
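
One basic ingredient for handling an intractable normalising constant on the circle is an importance-sampling estimate of it. The sketch below uses the uniform density 1/(2π) on [0, 2π) as the proposal; the function name and sample size are assumptions, and this is only the importance-sampling building block, not the authors' combined IS and MCMC scheme. The von Mises kernel is a convenient check case because its constant, 2π I0(κ), is known.

```python
import math
import random

def is_normalising_constant(log_unnorm, n=100_000, seed=0):
    """Estimate Z = integral over [0, 2*pi) of exp(log_unnorm(theta)) d(theta)
    by importance sampling with a uniform proposal on the circle."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        total += math.exp(log_unnorm(theta))   # f(theta) / q(theta) = f(theta) * 2*pi
    return 2.0 * math.pi * total / n

# Unnormalised von Mises kernel with kappa = 2.5, mu = 1.0; estimate approximates 2*pi*I0(2.5)
z_hat = is_normalising_constant(lambda t: 2.5 * math.cos(t - 1.0))
```

A uniform proposal is adequate for moderate concentrations; for sharply peaked densities a proposal centred near the mode would reduce the variance of the estimate.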

This article considers misclassification of categorical covariates in the context of regression analysis; if unaccounted for, such errors usually result in mis-estimation of model parameters. In the presence of additional covariates, we exploit the fact that explicitly modelling misclassification that is non-differential with respect to the response leads to a mixture regression representation. Under the mixture-of-experts framework, we allow the reclassification probabilities to vary with other covariates, a situation commonly caused by misclassification that is differential on certain covariates and/or by dependence between the misclassified and additional covariates. Using Bayesian inference, the mixture approach combines learning from data with external information on the magnitude of the errors when such information is available. In addition to proving the theoretical identifiability of the mixture-of-experts approach, we study the efficiency loss resulting from covariate misclassification and the usefulness of external information in mitigating such loss. The method is applied to adjust for misclassification of self-reported cocaine use in the Longitudinal Studies of HIV-Associated Lung Infections and Complications.

The adaptive rejection sampling (ARS) algorithm is a universal random generator for drawing samples efficiently from a univariate log-concave target probability density function (pdf). ARS generates independent samples from the target via rejection sampling with high acceptance rates. Indeed, ARS yields a sequence of proposal functions that converge toward the target pdf, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. In this work, we propose a novel ARS scheme, called Cheap Adaptive Rejection Sampling (CARS), in which the computational effort for drawing from the proposal remains constant, at a level fixed in advance by the user. When a large number of samples is required, CARS is faster than ARS.
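
ARS builds its proposal from tangent lines to log f at a set of support points and refines it by adding rejected points. The sketch below shows only the envelope construction and the rejection step with a fixed set of support points (no adaptation), so the cost per draw stays constant. This is the property CARS targets, but the rule CARS uses to choose and update its support points is not reproduced here. The function name and the requirement on the end-point derivatives are assumptions of this sketch.

```python
import math
import random

def tangent_envelope_sample(h, dh, points, rng):
    """Draw one sample from f(x) ∝ exp(h(x)), h concave, by rejection from the
    piecewise-exponential envelope built from tangents of h at fixed `points`.
    Assumes sorted points with dh(points[0]) > 0 and dh(points[-1]) < 0."""
    hs = [h(x) for x in points]
    ss = [dh(x) for x in points]
    k = len(points)
    # Breakpoints z: where the tangents at consecutive support points intersect
    z = [-math.inf]
    for j in range(k - 1):
        num = hs[j + 1] - hs[j] - points[j + 1] * ss[j + 1] + points[j] * ss[j]
        z.append(num / (ss[j] - ss[j + 1]))
    z.append(math.inf)

    def seg_mass(j):
        # Integral of exp(tangent_j) over [z[j], z[j+1]]
        a, b, s, x0 = z[j], z[j + 1], ss[j], points[j]
        if s == 0.0:
            return math.exp(hs[j]) * (b - a)
        return math.exp(hs[j]) * (math.exp(s * (b - x0)) - math.exp(s * (a - x0))) / s

    masses = [seg_mass(j) for j in range(k)]
    while True:
        j = rng.choices(range(k), weights=masses)[0]   # pick a segment
        a, b, s, x0 = z[j], z[j + 1], ss[j], points[j]
        u = rng.random()
        while u == 0.0:                                  # avoid log(0) below
            u = rng.random()
        if s == 0.0:
            x = a + u * (b - a)
        else:                                            # inverse CDF on the segment
            lo, hi = math.exp(s * (a - x0)), math.exp(s * (b - x0))
            x = x0 + math.log(lo + u * (hi - lo)) / s
        upper = hs[j] + s * (x - x0)                     # envelope (tangent) at x
        if rng.random() <= math.exp(h(x) - upper):       # accept w.p. f / envelope
            return x

# Standard normal target: h(x) = -x^2/2
rng = random.Random(1)
xs = [tangent_envelope_sample(lambda x: -0.5 * x * x, lambda x: -x,
                              [-1.5, -0.5, 0.5, 1.5], rng) for _ in range(20000)]
```

Because each tangent of a concave h lies above h, the envelope dominates the target, so the accepted draws are exact samples; more support points raise the acceptance rate at the price of a costlier proposal.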

In this paper the issue of making inferences with misclassified data from a noisy multinomial process is addressed. A Bayesian model for making inferences about the proportions and the noise parameters is developed. The problem is reformulated in a more tractable form by introducing auxiliary or latent random vectors. This allows for an easy-to-implement Gibbs sampling-based algorithm to generate samples from the distributions of interest. An illustrative example related to elections is also presented.
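
The latent-vector reformulation can be sketched as a two-step Gibbs sampler: allocate each observed unit to a latent true category, then draw the proportions from their Dirichlet full conditional. As a simplifying assumption, the misclassification matrix `Q` (with `Q[i][j]` = P(observed j | true i)) is treated as known here, whereas the paper also infers the noise parameters, which would add a further Gibbs step. The function name and arguments are hypothetical.

```python
import random

def gibbs_noisy_multinomial(obs_counts, Q, alpha, iters=2000, seed=0):
    """Gibbs draws of true category proportions p from counts observed through
    a known misclassification matrix Q (Q[i][j] = P(observed j | true i))."""
    rng = random.Random(seed)
    k = len(alpha)
    p = [1.0 / k] * k
    draws = []
    for _ in range(iters):
        # 1) Latent true category for every unit observed in category j:
        #    P(true i | observed j, p) ∝ p[i] * Q[i][j]
        true_counts = [0] * k
        for j, c in enumerate(obs_counts):
            w = [p[i] * Q[i][j] for i in range(k)]
            for i in rng.choices(range(k), weights=w, k=c):
                true_counts[i] += 1
        # 2) p | latent counts ~ Dirichlet(alpha + true_counts), via normalised gammas
        g = [rng.gammavariate(alpha[i] + true_counts[i], 1.0) for i in range(k)]
        total = sum(g)
        p = [x / total for x in g]
        draws.append(p)
    return draws
```

With an identity `Q` the latent counts equal the observed counts, so the draws reduce to the ordinary Dirichlet posterior; that makes a convenient correctness check.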

Multivariate mixtures of normals with unknown number of components
We present a full Bayesian analysis of finite mixtures of multivariate normals with an unknown number of components. We adopt reversible jump Markov chain Monte Carlo and, in a manner similar to that of Richardson and Green (1997), construct split and merge moves that produce good mixing of the Markov chains. The split moves are constructed on the space of eigenvectors and eigenvalues of the current covariance matrix so that the proposed covariance matrices are positive definite. The proposed methodology has applications in classification and discrimination as well as in heterogeneity modelling. We test our algorithm on real and simulated data.

Summary. Road safety has recently become a major concern in most modern societies. The identification of sites that are more dangerous than others (black spots) can help in better scheduling road safety policies. This paper proposes a methodology for ranking sites according to their level of hazard. The model is innovative in at least two respects. Firstly, it makes use of all relevant information per accident location, including the total number of accidents and the number of fatalities, as well as the number of slight and serious injuries. Secondly, the model uses a cost function to rank the sites with respect to their total expected cost to society. Bayesian estimation of the model via Markov chain Monte Carlo is proposed. Accident data from 519 intersections in Leuven (Belgium) are used to illustrate the proposed methodology. Furthermore, different cost functions are used to show how the choice of cost per type of injury affects the results of the proposed method.
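
Ranking by total expected cost amounts to a posterior expectation of a cost function. The sketch below assumes a linear cost (a fixed cost per casualty of each injury type) and assumes posterior draws of expected casualty counts per type are already available from an MCMC fit; the data structure, function name and all numbers are made up for illustration and are not the Leuven data.

```python
def rank_sites_by_expected_cost(posterior_draws, unit_costs):
    """Rank sites by posterior expected cost to society.

    posterior_draws: dict site -> list of MCMC draws, each a dict mapping
                     injury type -> expected number of casualties of that type.
    unit_costs:      dict injury type -> cost per casualty (linear cost function).
    """
    expected = {}
    for site, draws in posterior_draws.items():
        # Cost of each draw: sum over injury types of unit cost * expected count
        costs = [sum(c * d[t] for t, c in unit_costs.items()) for d in draws]
        expected[site] = sum(costs) / len(costs)   # Monte Carlo posterior mean
    ranking = sorted(expected, key=expected.get, reverse=True)
    return ranking, expected

# Made-up draws for two hypothetical sites
draws = {
    "A": [{"fatal": 0.1, "serious": 0.5, "slight": 5.0}, {"fatal": 0.1, "serious": 0.7, "slight": 5.4}],
    "B": [{"fatal": 0.6, "serious": 0.4, "slight": 1.0}, {"fatal": 0.4, "serious": 0.6, "slight": 1.2}],
}
by_severity, _ = rank_sites_by_expected_cost(draws, {"fatal": 100.0, "serious": 10.0, "slight": 1.0})
by_count, _ = rank_sites_by_expected_cost(draws, {"fatal": 1.0, "serious": 1.0, "slight": 1.0})
# by_severity == ["B", "A"]; by_count == ["A", "B"]
```

The toy example shows the effect the abstract refers to: weighting fatalities heavily moves the site with fewer but more severe casualties to the top.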

We investigate bandwidth estimation in a functional nonparametric regression model with function-valued, continuous real-valued and discrete-valued regressors, under the framework of an unknown error density. Extending the recent work of Shang (2013) [‘Bayesian Bandwidth Estimation for a Nonparametric Functional Regression Model with Unknown Error Density’, Computational Statistics & Data Analysis, 67, 185–198, doi: 10.1016/j.csda.2013.05.006], we approximate the unknown error density by a kernel density estimator of the residuals, where the regression function is estimated by the functional Nadaraya–Watson estimator that admits mixed types of regressors. We derive a likelihood and posterior density for the bandwidth parameters under the kernel-form error density, and put forward a Bayesian bandwidth estimation approach that estimates all the bandwidths simultaneously. Simulation studies demonstrate the accuracy of the proposed Bayesian approach in estimating both the regression function and the error density. Using a spectroscopy data set from food quality control, we apply the proposed Bayesian approach to select the optimal bandwidths in a functional nonparametric regression model with mixed types of regressors.
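
The kernel-form likelihood for the bandwidths can be sketched in the simplest setting: a single scalar regressor, Gaussian kernels, a Nadaraya–Watson fit with bandwidth `h`, and a leave-one-out kernel density of the residuals with bandwidth `b`. The scalar regressor, the kernel choice and the use of in-sample residuals are simplifying assumptions (the paper handles functional, continuous and discrete regressors with mixed kernels), and the priors and MCMC over `(h, b)` are omitted. Function names are hypothetical.

```python
import math

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h):
    """Nadaraya–Watson estimate of m(x0) = E[Y | X = x0] with a Gaussian kernel."""
    w = [gauss_kernel((x0 - x) / h) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def loo_error_loglik(xs, ys, h, b):
    """Log-likelihood of bandwidths (h, b): each residual of the NW fit is scored
    by a leave-one-out kernel density estimate built from the other residuals."""
    n = len(xs)
    res = [y - nadaraya_watson(x, xs, ys, h) for x, y in zip(xs, ys)]
    ll = 0.0
    for i in range(n):
        dens = sum(gauss_kernel((res[i] - res[j]) / b) for j in range(n) if j != i)
        ll += math.log(dens / ((n - 1) * b))
    return ll
```

A posterior for `(h, b)` would combine this log-likelihood with priors on the bandwidths and be explored by MCMC; leaving each residual out of its own density estimate prevents the likelihood from growing without bound as `b` shrinks.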

Hidden Markov models (HMMs) have been shown to be a flexible tool for modelling complex biological processes. However, choosing the number of hidden states remains an open question, and the inclusion of random effects, a recent addition to the fixed-effect HMM in many application fields, also deserves more research. We present a Bayesian mixed HMM with an unknown number of hidden states and fixed covariates. The model is fitted using reversible-jump Markov chain Monte Carlo, avoiding the need to select the number of hidden states. We show through simulations that the resulting estimates are more precise than those from a fixed-effect HMM, and we illustrate the model's practical application to the analysis of DNA copy number data, a field where HMMs are widely used.


Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, a Bayesian approach has been developed to model misclassified ordinal response data. Two regression models, a probit and a logit model, have been developed to incorporate misclassification in the categorical response. Computational difficulties are avoided by using data augmentation, an idea that is exploited to derive efficient Markov chain Monte Carlo methods. Although the method is proposed for ordered categories, it can also be implemented for unordered ones in a simple way. The performance of the model is shown through a simulation-based example and the analysis of the motivating study.

In this paper we present a Bayesian analysis of finite mixtures of multivariate Poisson distributions with an unknown number of components. The multivariate Poisson distribution can be regarded as the discrete counterpart of the multivariate normal distribution and is suitable for modelling multivariate count data. Mixtures of multivariate Poisson distributions allow for overdispersion and for negative correlations between variables. To perform Bayesian analysis of these models we adopt a reversible jump Markov chain Monte Carlo (MCMC) algorithm with birth and death moves for updating the number of components. We present results obtained from applying our modelling approach to simulated and real data. Furthermore, we apply our approach to a problem in multivariate disease mapping, namely the joint modelling of diseases with correlated counts.
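
Why mixing helps with negative correlation can be seen in the bivariate case. The sketch assumes the common-shock construction X1 = Y1 + Y0, X2 = Y2 + Y0 with independent Poisson Y's; whether the paper's multivariate Poisson uses exactly this form is an assumption, and the function names are hypothetical. Within one component Cov(X1, X2) = t0 ≥ 0, but a mixture of components with different means can have negative overall covariance.

```python
import math

def pois_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) if lam > 0 else float(k == 0)

def bivariate_poisson_pmf(x1, x2, t1, t2, t0):
    """P(X1=x1, X2=x2) for X1 = Y1 + Y0, X2 = Y2 + Y0, Y_i ~ Poisson(t_i) independent."""
    return sum(pois_pmf(x1 - k, t1) * pois_pmf(x2 - k, t2) * pois_pmf(k, t0)
               for k in range(min(x1, x2) + 1))

def mixture_cov(weights, comps):
    """Cov(X1, X2) for a finite mixture of bivariate Poissons; comps = [(t1, t2, t0), ...]."""
    m1 = sum(w * (t1 + t0) for w, (t1, t2, t0) in zip(weights, comps))
    m2 = sum(w * (t2 + t0) for w, (t1, t2, t0) in zip(weights, comps))
    # Within a component, E[X1 X2] = Cov + E[X1] E[X2] = t0 + (t1 + t0)(t2 + t0)
    e12 = sum(w * (t0 + (t1 + t0) * (t2 + t0)) for w, (t1, t2, t0) in zip(weights, comps))
    return e12 - m1 * m2

# Two uncorrelated components with "opposite" means give a negative overall covariance
cov = mixture_cov([0.5, 0.5], [(1.0, 8.0, 0.0), (8.0, 1.0, 0.0)])   # -12.25
```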

Summary. We consider a finite mixture model with k components and a kernel distribution from a general one-parameter family. The problem of testing the hypothesis k = 2 versus k ≥ 3 is studied. There has been no general statistical testing procedure for this problem. We propose a modified likelihood ratio statistic in which, under both the null and the alternative hypotheses, the parameter estimates are obtained from a modified likelihood function. The estimators of the support points are shown to be consistent. The asymptotic null distribution of the proposed modified likelihood ratio test is derived and found to be relatively simple and easily applied. Simulation studies of the asymptotic modified likelihood ratio test for finite mixture models with normal, binomial and Poisson kernels suggest that the proposed test performs well. Simulation studies are also conducted for a bootstrap method with normal kernels. An example involving foetal movement data from a medical study illustrates the testing procedure.
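
A modified likelihood adds a penalty to the mixture log-likelihood that keeps the mixing proportions away from zero. The sketch assumes a penalty of the form C Σ log π_j, as used in the modified-likelihood literature; the exact penalty, the choice of the constant `C` and the fitting routine (for example, EM on the modified likelihood) in the paper may differ and are not shown. A Poisson kernel is used as the example one-parameter family, and all names are hypothetical.

```python
import math

def pois_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def modified_loglik(ys, weights, lams, C=1.0):
    """Poisson-mixture log-likelihood plus a penalty C * sum(log pi_j) (C > 0 user-chosen)."""
    ll = 0.0
    for y in ys:
        ll += math.log(sum(w * math.exp(pois_logpmf(y, lam)) for w, lam in zip(weights, lams)))
    return ll + C * sum(math.log(w) for w in weights)

def modified_lrt(ys, fit_null, fit_alt, C=1.0):
    """2 * (modified loglik at the k = 3 fit minus at the k = 2 fit); each fit is a
    (weights, lams) pair assumed to maximise the modified likelihood."""
    return 2.0 * (modified_loglik(ys, *fit_alt, C=C) - modified_loglik(ys, *fit_null, C=C))
```

The penalty tends to minus infinity as any weight approaches zero, which rules out the degenerate boundary fits that make the ordinary likelihood ratio statistic hard to analyse.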

The authors propose a procedure for determining the unknown number of components in mixtures by generalizing a Bayesian testing method proposed by Mengersen & Robert (1996). The testing criterion they propose involves a Kullback‐Leibler distance, which may be weighted or not. They give explicit formulas for the weighted distance for a number of mixture distributions and propose a stepwise testing procedure to select the minimum number of components adequate for the data. Their procedure, which is implemented using the BUGS software, exploits a fast collapsing approach which accelerates the search for the minimum number of components by avoiding full refitting at each step. The performance of their method is compared, using both distances, to the Bayes factor approach.
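
The criterion rests on a Kullback–Leibler distance between a fitted mixture and a reduced one. Where no explicit formula is at hand, a generic Monte Carlo estimate of KL(p ‖ q) = E_p[log p(X) − log q(X)] is a simple stand-in; the sketch below does this for univariate normal mixtures. How the reduced mixture is built and the weighting of the distance are not reproduced, and the function names are hypothetical.

```python
import math
import random

def mix_logpdf(x, weights, means, sds):
    """Log density of a univariate normal mixture (log-sum-exp for stability)."""
    terms = [math.log(w) - math.log(s * math.sqrt(2 * math.pi)) - 0.5 * ((x - m) / s) ** 2
             for w, m, s in zip(weights, means, sds)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def mix_sample(rng, weights, means, sds):
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[j], sds[j])

def kl_mc(p, q, n=20000, seed=0):
    """Monte Carlo estimate of KL(p || q); p and q are (weights, means, sds) tuples."""
    rng = random.Random(seed)
    xs = [mix_sample(rng, *p) for _ in range(n)]
    return sum(mix_logpdf(x, *p) - mix_logpdf(x, *q) for x in xs) / n
```

For two single normals with unit variance and means 0 and 1 the exact value is 0.5, which gives a quick check on the estimator.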

The analysis of traffic accident data is crucial for addressing numerous concerns, such as understanding the contributing factors in an accident's chain of events, identifying hotspots, and informing policy decisions about road safety management. Most statistical models employed for analyzing traffic accident data are, naturally, count regression models (commonly Poisson regression), since the response is a count, such as the number of accidents. However, features of the observed data frequently make the Poisson distribution an untenable assumption. For example, observed data rarely exhibit equal mean and variance and often possess excess zeros. Sometimes the data have a heterogeneous structure consisting of a mixture of populations rather than a single population. In such analyses, mixture-of-Poisson regression models can be used. In this study, the number of injuries resulting from traffic accidents registered by the General Directorate of Security (Turkey, 2005–2014) is modeled using a novel mixture distribution with two components: a Poisson and a zero-truncated Poisson distribution. This model differs from existing mixture models in the literature, whose components are either all Poisson or all zero-truncated Poisson distributions. The proposed model is compared with the Poisson regression model via simulation and in the analysis of the traffic data.
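
The two-component distribution described above can be written as p(y) = w Pois(y; λ1) + (1 − w) ZTP(y; λ2), where ZTP(y; λ) = Pois(y; λ) / (1 − e^(−λ)) for y ≥ 1 and 0 at y = 0. The sketch below evaluates this probability mass function; in a regression version each rate would be linked to covariates, for example log λ = x'β, which is an assumption about the parameterisation rather than the paper's stated model. Function names are hypothetical.

```python
import math

def pois_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def ztp_pmf(y, lam):
    """Zero-truncated Poisson: P(Y = y | Y > 0) for y >= 1, and 0 at y = 0."""
    if y == 0:
        return 0.0
    return pois_pmf(y, lam) / -math.expm1(-lam)   # 1 - exp(-lam), computed accurately

def pois_ztp_mixture_pmf(y, w, lam1, lam2):
    """w * Poisson(lam1) + (1 - w) * zero-truncated Poisson(lam2)."""
    return w * pois_pmf(y, lam1) + (1.0 - w) * ztp_pmf(y, lam2)
```

Zeros can arise only from the Poisson component, so P(Y = 0) = w e^(−λ1), while positive counts receive mass from both components.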

