首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Recently, mixture distribution becomes more and more popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with varying number of components. This approach requires only the knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high dimensional Bayesian mixture models.  相似文献   

2.
Very often, the likelihoods for circular data sets are of quite complicated forms, and the functional forms of the normalising constants, which depend upon the unknown parameters, are unknown. This latter problem generally precludes rigorous, exact inference (both classical and Bayesian) for circular data.Noting the paucity of literature on Bayesian circular data analysis, and also because realistic data analysis is naturally permitted by the Bayesian paradigm, we address the above problem taking a Bayesian perspective. In particular, we propose a methodology that combines importance sampling and Markov chain Monte Carlo (MCMC) in a very effective manner to sample from the posterior distribution of the parameters, given the circular data. With simulation study and real data analysis, we demonstrate the considerable reliability and flexibility of our proposed methodology in analysing circular data.  相似文献   

3.
Hidden Markov models (HMMs) have been shown to be a flexible tool for modelling complex biological processes. However, choosing the number of hidden states remains an open question and the inclusion of random effects also deserves more research, as it is a recent addition to the fixed-effect HMM in many application fields. We present a Bayesian mixed HMM with an unknown number of hidden states and fixed covariates. The model is fitted using reversible-jump Markov chain Monte Carlo, avoiding the need to select the number of hidden states. We show through simulations that the estimations produced are more precise than those from a fixed-effect HMM and illustrate its practical application to the analysis of DNA copy number data, a field where HMMs are widely used.  相似文献   

4.
Multivariate mixtures of normals with unknown number of components   总被引:2,自引:0,他引:2  
We present full Bayesian analysis of finite mixtures of multivariate normals with unknown number of components. We adopt reversible jump Markov chain Monte Carlo and we construct, in a manner similar to that of Richardson and Green (1997), split and merge moves that produce good mixing of the Markov chains. The split moves are constructed on the space of eigenvectors and eigenvalues of the current covariance matrix so that the proposed covariance matrices are positive definite. Our proposed methodology has applications in classification and discrimination as well as heterogeneity modelling. We test our algorithm with real and simulated data.  相似文献   

5.
Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises because the posterior is invariant to permutations of the component parameters. Methods for dealing with label switching have been studied fairly extensively in the literature, with the most popular approaches being those based on loss functions. However, many of these algorithms turn out to be too slow in practice, and can be infeasible as the size and/or dimension of the data grow. We propose a new, computationally efficient algorithm based on a loss function interpretation, and show that it can scale up well in large data set scenarios. Then, we review earlier solutions which can scale up well for large data set, and compare their performances on simulated and real data sets. We conclude with some discussions and recommendations of all the methods studied.  相似文献   

6.
Studies of the behaviors of glaciers, ice sheets, and ice streams rely heavily on both observations and physical models. Data acquired via remote sensing provide critical information on geometry and movement of ice over large sections of Antarctica and Greenland. However, uncertainties are present in both the observations and the models. Hence, there is a need for combining these information sources in a fashion that incorporates uncertainty and quantifies its impact on conclusions. We present a hierarchical Bayesian approach to modeling ice-stream velocities incorporating physical models and observations regarding velocity, ice thickness, and surface elevation from the North East Ice Stream in Greenland. The Bayesian model leads to interesting issues in model assessment and computation.  相似文献   

7.
A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented. The sampler is characterized by a state space consisting only of the number of components and the latent allocation variables. Its main advantage is that it can be used, with minimal changes, for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically. Artificial and real data sets are used to illustrate the method and mixtures of univariate and of multivariate normals are explicitly considered. The problem of label switching, when parameter inference is of interest, is addressed in a post-processing stage.  相似文献   

8.
Malaria illness can be diagnosed by the presence of fever and parasitaemia. However, in highly endemic areas the diagnosis of clinical malaria can be difficult since children may tolerate parasites without fever and may have fever due to other causes. We propose a novel, simulation-based Bayesian approach for obtaining precise estimates of the probabilities of children with different levels of parasitaemia having fever due to malaria, by formulating the problem as a mixture of distributions. The methodology suggested is a general methodology for decomposing any two-component mixture distribution nonparametrically, when an independent training sample is available from one of the components. It is based on the assumption that one of the component distributions lies on the left of the other but there is some overlap between the distributions.  相似文献   

9.
This article considers misclassification of categorical covariates in the context of regression analysis; if unaccounted for, such errors usually result in mis-estimation of model parameters. With the presence of additional covariates, we exploit the fact that explicitly modelling non-differential misclassification with respect to the response leads to a mixture regression representation. Under the framework of mixture of experts, we enable the reclassification probabilities to vary with other covariates, a situation commonly caused by misclassification that is differential on certain covariates and/or by dependence between the misclassified and additional covariates. Using Bayesian inference, the mixture approach combines learning from data with external information on the magnitude of errors when it is available. In addition to proving the theoretical identifiability of the mixture of experts approach, we study the amount of efficiency loss resulting from covariate misclassification and the usefulness of external information in mitigating such loss. The method is applied to adjust for misclassification on self-reported cocaine use in the Longitudinal Studies of HIV-Associated Lung Infections and Complications.  相似文献   

10.
In [7], a Bayesian network for analysis of mixed traces of DNA was presented using gamma distributions for modelling peak sizes in the electropherogram. It was demonstrated that the analysis was sensitive to the choice of a variance factor and hence this should be adapted to any new trace analysed. In this paper, we discuss how the variance parameter can be estimated by maximum likelihood to achieve this. The unknown proportions of DNA from each contributor can similarly be estimated by maximum likelihood jointly with the variance parameter. Furthermore, we discuss how to incorporate prior knowledge about the parameters in a Bayesian analysis. The proposed estimation methods are illustrated through a few examples of applications for calculating evidential value in casework and for mixture deconvolution.  相似文献   

11.
12.
In this article, we develop statistical models for analysis of correlated mixed categorical (binary and ordinal) response data arising in medical and epidemi-ologic studies. There is evidence in the literature to suggest that models including correlation structure can lead to substantial improvement in precision of estimation or are more appropriate (accurate). We use a very rich class of scale mixture of multivariate normal (SMMVN) iink functions to accommodate heavy tailed distributions. In order to incorporate available historical information, we propose a unified prior elicitation scheme based on SMMVN-link models. Further, simulation-based techniques are developed to assess model adequacy. Finally, a real data example from prostate cancer studies is used to illustrate the proposed methodologies.  相似文献   

13.
In this article, we propose to evaluate and compare Markov chain Monte Carlo (MCMC) methods to estimate the parameters in a generalized extreme value model. We employed the Bayesian approach using traditional Metropolis-Hastings methods, Hamiltonian Monte Carlo (HMC), and Riemann manifold HMC (RMHMC) methods to obtain the approximations to the posterior marginal distributions of interest. Applications to real datasets and simulation studies provide evidence that the extra analytical work involved in Hamiltonian Monte Carlo algorithms is compensated by a more efficient exploration of the parameter space.  相似文献   

14.
We investigate the issue of bandwidth estimation in a functional nonparametric regression model with function-valued, continuous real-valued and discrete-valued regressors under the framework of unknown error density. Extending from the recent work of Shang (2013 Shang, H.L. (2013), ‘Bayesian Bandwidth Estimation for a Nonparametric Functional Regression Model with Unknown Error Density’, Computational Statistics &; Data Analysis, 67, 185198. doi: 10.1016/j.csda.2013.05.006[Crossref], [Web of Science ®] [Google Scholar]) [‘Bayesian Bandwidth Estimation for a Nonparametric Functional Regression Model with Unknown Error Density’, Computational Statistics &; Data Analysis, 67, 185–198], we approximate the unknown error density by a kernel density estimator of residuals, where the regression function is estimated by the functional Nadaraya–Watson estimator that admits mixed types of regressors. We derive a likelihood and posterior density for the bandwidth parameters under the kernel-form error density, and put forward a Bayesian bandwidth estimation approach that can simultaneously estimate the bandwidths. Simulation studies demonstrated the estimation accuracy of the regression function and error density for the proposed Bayesian approach. Illustrated by a spectroscopy data set in the food quality control, we applied the proposed Bayesian approach to select the optimal bandwidths in a functional nonparametric regression model with mixed types of regressors.  相似文献   

15.
Summary.  Advances in understanding the biological underpinnings of many cancers have led increasingly to the use of molecularly targeted anticancer therapies. Because the platelet-derived growth factor receptor (PDGFR) has been implicated in the progression of prostate cancer bone metastases, it is of great interest to examine possible relationships between PDGFR inhibition and therapeutic outcomes. We analyse the association between change in activated PDGFR (phosphorylated PDGFR) and progression-free survival time based on large within-patient samples of cell-specific phosphorylated PDGFR values taken before and after treatment from each of 88 prostate cancer patients. To utilize these paired samples as covariate data in a regression model for progression-free survival time, and be cause the phosphorylated PDGFR distributions are bimodal, we first employ a Bayesian hierarchical mixture model to obtain a deconvolution of the pretreatment and post-treatment within-patient phosphorylated PDGFR distributions. We evaluate fits of the mixture model and a non-mixture model that ignores the bimodality by using a supnorm metric to compare the empirical distribution of each phosphorylated PDGFR data set with the corresponding fitted distribution under each model. Our results show that first using the mixture model to account for the bimodality of the within-patient phosphorylated PDGFR distributions, and then using the posterior within-patient component mean changes in phosphorylated PDGFR so obtained as covariates in the regression model for progression-free survival time, provides an improved estimation.  相似文献   

16.
Solving label switching is crucial for interpreting the results of fitting Bayesian mixture models. The label switching originates from the invariance of posterior distribution to permutation of component labels. As a result, the component labels in Markov chain simulation may switch to another equivalent permutation, and the marginal posterior distribution associated with all labels may be similar and useless for inferring quantities relating to each individual component. In this article, we propose a new simple labelling method by minimizing the deviance of the class probabilities to a fixed reference labels. The reference labels can be chosen before running Markov chain Monte Carlo (MCMC) using optimization methods, such as expectation-maximization algorithms, and therefore the new labelling method can be implemented by an online algorithm, which can reduce the storage requirements and save much computation time. Using the Acid data set and Galaxy data set, we demonstrate the success of the proposed labelling method for removing the labelling switching in the raw MCMC samples.  相似文献   

17.
The fused lasso penalizes a loss function by the L1 norm for both the regression coefficients and their successive differences to encourage sparsity of both. In this paper, we propose a Bayesian generalized fused lasso modeling based on a normal-exponential-gamma (NEG) prior distribution. The NEG prior is assumed into the difference of successive regression coefficients. The proposed method enables us to construct a more versatile sparse model than the ordinary fused lasso using a flexible regularization term. Simulation studies and real data analyses show that the proposed method has superior performance to the ordinary fused lasso.  相似文献   

18.
We propose alternative approaches to analyze residuals in binary regression models based on random effect components. Our preferred model does not depend upon any tuning parameter, being completely automatic. Although the focus is mainly on accommodation of outliers, the proposed methodology is also able to detect them. Our approach consists of evaluating the posterior distribution of random effects included in the linear predictor. The evaluation of the posterior distributions of interest involves cumbersome integration, which is easily dealt with through stochastic simulation methods. We also discuss different specifications of prior distributions for the random effects. The potential of these strategies is compared in a real data set. The main finding is that the inclusion of extra variability accommodates the outliers, improving the adjustment of the model substantially, besides correctly indicating the possible outliers.  相似文献   

19.
In this paper we present Bayesian analysis of finite mixtures of multivariate Poisson distributions with an unknown number of components. The multivariate Poisson distribution can be regarded as the discrete counterpart of the multivariate normal distribution, which is suitable for modelling multivariate count data. Mixtures of multivariate Poisson distributions allow for overdispersion and for negative correlations between variables. To perform Bayesian analysis of these models we adopt a reversible jump Markov chain Monte Carlo (MCMC) algorithm with birth and death moves for updating the number of components. We present results obtained from applying our modelling approach to simulated and real data. Furthermore, we apply our approach to a problem in multivariate disease mapping, namely joint modelling of diseases with correlated counts.  相似文献   

20.
This paper considers a hierarchical Bayesian analysis of regression models using a class of Gaussian scale mixtures. This class provides a robust alternative to the common use of the Gaussian distribution as a prior distribution in particular for estimating the regression function subject to uncertainty about the constraint. For this purpose, we use a family of rectangular screened multivariate scale mixtures of Gaussian distribution as a prior for the regression function, which is flexible enough to reflect the degrees of uncertainty about the functional constraint. Specifically, we propose a hierarchical Bayesian regression model for the constrained regression function with uncertainty on the basis of three stages of a prior hierarchy with Gaussian scale mixtures, referred to as a hierarchical screened scale mixture of Gaussian regression models (HSMGRM). We describe distributional properties of HSMGRM and an efficient Markov chain Monte Carlo algorithm for posterior inference, and apply the proposed model to real applications with constrained regression models subject to uncertainty.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号