
1.

An online Bayesian mixture labelling method by minimizing deviance of classification probabilities to reference labels

《Journal of Statistical Computation and Simulation》2012,82(2):310-323

Solving label switching is crucial for interpreting the results of fitting Bayesian mixture models. Label switching originates from the invariance of the posterior distribution to permutation of the component labels. As a result, the component labels in a Markov chain simulation may switch to another equivalent permutation, and the marginal posterior distributions associated with the labels may be similar and therefore useless for inferring quantities relating to each individual component. In this article, we propose a new simple labelling method that minimizes the deviance of the class probabilities to a set of fixed reference labels. The reference labels can be chosen before running Markov chain Monte Carlo (MCMC) using optimization methods, such as expectation-maximization algorithms, and therefore the new labelling method can be implemented by an online algorithm, which can reduce the storage requirements and save much computation time. Using the Acid data set and Galaxy data set, we demonstrate the success of the proposed labelling method for removing label switching in the raw MCMC samples.
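The relabelling idea in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' algorithm: it assumes the deviance is the negative log classification probability of each observation's reference label, it searches all m! permutations by brute force, and the names `relabel_draw`, `class_probs` and `ref_labels` are hypothetical.

```python
import math
from itertools import permutations


def relabel_draw(class_probs, ref_labels):
    """Pick the label permutation of one MCMC draw that best matches the reference.

    class_probs[i][k]: classification probability of observation i for
                       component k under the current draw.
    ref_labels[i]:     fixed reference label of observation i (for example
                       obtained from an EM fit before the sampler starts).
    Returns perm, where perm[j] is the component of the current draw that is
    mapped to reference label j.
    """
    m = len(class_probs[0])
    best_perm, best_dev = None, math.inf
    for perm in permutations(range(m)):
        # Assumed deviance: minus the log probability given to the reference labels.
        dev = -sum(
            math.log(max(probs[perm[z]], 1e-300))  # floor guards against log(0)
            for probs, z in zip(class_probs, ref_labels)
        )
        if dev < best_dev:
            best_perm, best_dev = perm, dev
    return best_perm


ref_labels = [0, 0, 1, 1]
# A draw in which the two components have swapped labels.
switched = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.9, 0.1]]
# A draw whose labels already agree with the reference.
aligned = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

perm_switched = relabel_draw(switched, ref_labels)
perm_aligned = relabel_draw(aligned, ref_labels)
```

Because each draw is relabelled using only the fixed reference labels, the step can run inside the sampler loop without storing the whole chain, which is the sense in which the abstract calls the method online.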

2.

Model based labeling for mixture models (total citations: 1; self-citations: 0; citations by others: 1)

Weixin Yao 《Statistics and Computing》2012,22(2):337-347

Label switching is one of the fundamental problems for Bayesian mixture model analysis. Due to the permutation invariance of the mixture posterior, we can consider that the posterior of an m-component mixture model is a mixture distribution with m! symmetric components and therefore the object of labeling is to recover one of the components. In order to do labeling, we propose to first fit a symmetric m!-component mixture model to the Markov chain Monte Carlo (MCMC) samples and then choose the label for each sample by maximizing the corresponding classification probabilities, which are the probabilities of all possible labels for each sample. Both parametric and semi-parametric ways are proposed to fit the symmetric mixture model for the posterior. Compared to the existing labeling methods, our proposed method aims to approximate the posterior directly and provides the labeling probabilities for all possible labels and thus has a model explanation and theoretical support. In addition, we introduce a situation in which the “ideally” labeled samples are available and thus can be used to compare different labeling methods. We demonstrate the success of our new method in dealing with the label switching problem using two examples.
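The permutation invariance behind the m! symmetric components can be checked numerically. The sketch below is only an illustration of that invariance, not the labeling method proposed in the paper. It evaluates a three-component normal mixture log-likelihood under all 3! = 6 relabellings of the same parameters. The data, the parameter values and the function names are made up for the example.

```python
import math
from itertools import permutations


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a finite normal mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, mu, sd) for w, mu, sd in zip(weights, means, sds)))
        for x in data
    )


# Illustrative data and parameters (assumptions, not taken from the paper).
data = [-1.2, 0.3, 0.8, 2.5, 4.1, 5.0]
weights, means, sds = [0.2, 0.3, 0.5], [0.0, 2.0, 5.0], [1.0, 0.5, 1.5]

# Relabel the components in every possible order and re-evaluate the likelihood.
logliks = [
    mixture_loglik(
        data,
        [weights[k] for k in perm],
        [means[k] for k in perm],
        [sds[k] for k in perm],
    )
    for perm in permutations(range(len(weights)))
]
```

All six values agree up to floating-point rounding. With a prior that is also symmetric in the labels, the posterior therefore has m! equivalent copies of each mode.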

3.

Relabel mixture models via modal clustering

Qiang Wu Weixin Yao 《Communications in Statistics - Simulation and Computation》2017,46(5):3406-3418

Effectively solving the label switching problem is critical for both Bayesian and Frequentist mixture model analyses. In this article, a new relabeling method is proposed by extending a recently developed modal clustering algorithm. First, the posterior distribution is estimated by a kernel density from permuted MCMC or bootstrap samples of parameters. Second, a modal EM algorithm is used to find the m! symmetric modes of the KDE. Finally, samples that ascend to the same mode are assigned the same label. Simulations and real data applications demonstrate that the new method provides more accurate estimates than many existing relabeling methods.

4.

Reversible jump and the label switching problem in hidden Markov models

Luigi Spezia 《Journal of statistical planning and inference》2009

Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms can be efficiently applied in Bayesian inference for hidden Markov models (HMMs), when the number of latent regimes is unknown. As for finite mixture models, when priors are invariant to the relabelling of the regimes, HMMs are unidentifiable in data fitting, because multiple ways to label the regimes can alternate during the MCMC iterations; this is the so-called label switching problem. HMMs with an unknown number of regimes are considered here and the goal of this paper is the comparison, both applied and theoretical, of five methods used for tackling label switching within a RJMCMC algorithm; they are: post-processing, partial reordering, permutation sampling, sampling from a Markov prior and rejection sampling. The five strategies we compare have been proposed mostly in the literature of finite mixture models and only two of them, i.e. rejection sampling and partial reordering, have been presented in RJMCMC algorithms for HMMs. We consider RJMCMC algorithms in which the parameters are updated by Gibbs sampling and the dimension of the model changes in split-and-merge and birth-and-death moves. Finally, an example illustrates and compares the five different methodologies.

5.

Asymptotic properties of likelihood ratio test statistics in affected‐sib‐pair analysis

Zeny Z. Feng Jiahua Chen Mary E. Thompson 《Revue canadienne de statistique》2007,35(3):351-364

In an affected‐sib‐pair genetic linkage analysis, identical by descent data for affected sib pairs are routinely collected at a large number of markers along chromosomes. Under very general genetic assumptions, the IBD distribution at each marker satisfies the possible triangle constraint. Statistical analysis of IBD data should thus utilize this information to improve efficiency. At the same time, this constraint renders the usual regularity conditions for likelihood‐based statistical methods unsatisfied. In this paper, the authors study the asymptotic properties of the likelihood ratio test (LRT) under the possible triangle constraint. They derive the limiting distribution of the LRT statistic based on data from a single locus. They investigate the precision of the asymptotic distribution and the power of the test by simulation. They also study the test based on the supremum of the LRT statistics over the markers distributed throughout a chromosome. Instead of deriving a limiting distribution for this test, they use a mixture of chi‐squared distributions to approximate its true distribution. Their simulation results show that this approach has desirable simplicity and satisfactory precision.

6.

Handling the Label Switching Problem in Latent Class Models Via the ECR Algorithm

Panagiotis Papastamoulis 《Communications in Statistics - Simulation and Computation》2013,42(4):913-927

Latent class models (LCMs) are specific cases of mixture models. Under a Bayesian setup, the symmetric posterior distribution of these models leads Markov chain Monte Carlo (MCMC) methods to suffer from the so-called label switching problem. In this article, we treat the corresponding MCMC outputs using a recent approach, namely, the Equivalence Classes Representative (ECR) algorithm and conclude that it can effectively solve the label switching problem by considering several examples of LCMs, such as mixtures of regressions, hidden Markov models, and Markov random fields. Moreover, the superiority of this method over other approaches becomes apparent.

7.

Computation of an efficient and robust estimator in a semiparametric mixture model

Jingjing Wu Weixin Yao 《Journal of Statistical Computation and Simulation》2017,87(11):2128-2137

In this article, we propose an efficient and robust estimation for the semiparametric mixture model that is a mixture of unknown location-shifted symmetric distributions. Our estimation is derived by minimizing the profile Hellinger distance (MPHD) between the model and a nonparametric density estimate. We propose a simple and efficient algorithm to find the proposed MPHD estimation. A Monte Carlo simulation study is conducted to examine the finite sample performance of the proposed procedure and to compare it with other existing methods. Based on our empirical studies, the newly proposed procedure works very competitively compared to the existing methods for normal component cases and much better for non-normal component cases. More importantly, the proposed procedure is robust when the data are contaminated with outlying observations. A real data application is also provided to illustrate the proposed estimation procedure.

8.

A comparison of the parameter estimation methods for bimodal mixture Weibull distribution with complete data

Aydin Karakoca Ulku Erisoglu Murat Erisoglu 《Journal of applied statistics》2015,42(7):1472-1489

The bimodal mixture Weibull distribution, a special case of the mixture Weibull distribution, has been used recently as a suitable model for heterogeneous data sets in many practical applications. The term bimodal mixture Weibull refers to a mixture of two Weibull distributions. Although many estimation methods have been proposed for the bimodal mixture Weibull distribution, no comprehensive comparison exists. This paper presents a detailed comparison of five numerical methods, namely maximum likelihood estimation, the least-squares method, the method of moments, the method of logarithmic moments and the percentile method (PM), in terms of several criteria by simulation study. The parameter estimation methods are also applied to real data.

9.

Smooth Semi‐nonparametric Analysis for Mixture Cure Models and Its Application to Breast Cancer

Haifen Li Jiajia Zhang Yincai Tang 《Australian & New Zealand Journal of Statistics》2014,56(3):217-235

Mixture cure models are widely used when a proportion of patients are cured. The proportional hazards mixture cure model and the accelerated failure time mixture cure model are the most popular models in practice. Usually the expectation–maximisation (EM) algorithm is applied to both models for parameter estimation. Bootstrap methods are used for variance estimation. In this paper we propose a smooth semi‐nonparametric (SNP) approach in which maximum likelihood is applied directly to mixture cure models for parameter estimation. The variance can be estimated by the inverse of the second derivative of the SNP likelihood. A comprehensive simulation study indicates good performance of the proposed method. We investigate stage effects in breast cancer by applying the proposed method to breast cancer data from the South Carolina Cancer Registry.

10.

A profile likelihood method for normal mixture with unequal variance

Weixin Yao 《Journal of statistical planning and inference》2010

It is well known that the normal mixture with unequal variance has unbounded likelihood and thus the corresponding global maximum likelihood estimator (MLE) is undefined. One of the commonly used solutions is to put a constraint on the parameter space so that the likelihood is bounded and then one can run the EM algorithm on this constrained parameter space to find the constrained global MLE. However, choosing the constraint parameter is a difficult issue and in many cases different choices may give different constrained global MLE. In this article, we propose a profile log likelihood method and a graphical way to find the maximum interior mode. Based on our proposed method, we can also see how the constraint parameter, used in the constrained EM algorithm, affects the constrained global MLE. Using two simulation examples and a real data application, we demonstrate the success of our new method in solving the unboundedness of the mixture likelihood and locating the maximum interior mode.
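The unboundedness mentioned at the start of this abstract is easy to reproduce. The sketch below illustrates only that phenomenon and does not implement the profile likelihood method of the paper. It centres one component of a two-component normal mixture on the first data point and shrinks its standard deviation. The data, the equal weights and the second component's parameters are assumptions chosen for the example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def two_component_loglik(data, sd_first):
    """Log-likelihood when component 1 sits exactly on data[0] with sd sd_first.

    Component 2 is fixed at mean 2.0 and sd 1.0, and both weights are 0.5
    (illustrative values, not estimates).
    """
    total = 0.0
    for x in data:
        density = 0.5 * normal_pdf(x, data[0], sd_first) + 0.5 * normal_pdf(x, 2.0, 1.0)
        total += math.log(density)
    return total


data = [0.0, 1.0, 2.0, 3.0]
shrinking_sds = [1.0, 0.1, 0.001, 1e-6]
logliks = [two_component_loglik(data, sd) for sd in shrinking_sds]
```

The log-likelihood keeps growing as the standard deviation shrinks, because the term for the point under the spike diverges while the second component keeps every other term finite. This is why a constraint on the variances, or a search for an interior mode as in the abstract, is needed.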

11.

Marginal Regression Model with Time-Varying Coefficients for Panel Data

Liuquan Sun Shaojun Guo Min Chen 《Communications in Statistics - Theory and Methods》2013,42(8):1241-1261

In this article, we formulate a class of semiparametric marginal means models with a mixture of time-varying and time-independent parameters for analyzing panel data. For inference about the regression parameters, an estimation procedure is developed and asymptotic properties of the proposed estimators are established. In addition, some tests are presented for investigating whether or not covariate effects vary with time. The finite-sample behavior of the proposed methods is examined in simulation studies, and the data from an AIDS clinical trial study are used to illustrate the methodology.

12.

Bayesian semiparametric modeling for stochastic precedence, with applications in epidemiology and survival analysis

Athanasios Kottas 《Lifetime data analysis》2011,17(1):135-155

We propose a prior probability model for two distributions that are ordered according to a stochastic precedence constraint, a weaker restriction than the more commonly utilized stochastic order constraint. The modeling approach is based on structured Dirichlet process mixtures of normal distributions. Full inference for functionals of the stochastic precedence constrained mixture distributions is obtained through a Markov chain Monte Carlo posterior simulation method. A motivating application involves study of the discriminatory ability of continuous diagnostic tests in epidemiologic research. Here, stochastic precedence provides a natural restriction for the distributions of test scores corresponding to the non-infected and infected groups. Inference under the model is illustrated with data from a diagnostic test for Johne’s disease in dairy cattle. We also apply the methodology to the comparison of survival distributions associated with two distinct conditions, and illustrate with analysis of data on survival time after bone marrow transplantation for treatment of leukemia.

13.

Bayesian variable selection with strong heredity constraints

Joungyoun Kim Johan Lim Yongdai Kim Woncheol Jang 《Journal of the Korean Statistical Society》2018,47(3):314-329

In this paper, we propose a Bayesian variable selection method for linear regression models with high-order interactions. Our method automatically enforces the heredity constraint, that is, a higher order interaction term can exist in the model only if both of its parent terms are in the model. Based on the stochastic search variable selection of George and McCulloch (1993), we propose a novel hierarchical prior that fully considers the heredity constraint and controls the degree of sparsity simultaneously. We develop a Markov chain Monte Carlo (MCMC) algorithm to explore the model space efficiently while accounting for the heredity constraint by modifying the shotgun stochastic search algorithm of Hans et al. (2007). The performance of the new model is demonstrated through comparisons with other methods. Numerical studies on both real data analysis and simulations show that our new method tends to find relevant variables more effectively when higher order interaction terms are considered.

14.

Wavelet‐based estimators for mixture regression

Michel H. Montoril Aluísio Pinheiro Brani Vidakovic 《Scandinavian Journal of Statistics》2019,46(1):215-234

We consider a process that is observed as a mixture of two random distributions, where the mixing probability is an unknown function of time. The setup is built upon a wavelet‐based mixture regression. Two linear wavelet estimators are proposed. Furthermore, we consider three regularizing procedures for each of the two wavelet methods. We also discuss regularity conditions under which the consistency of the wavelet methods is attained and derive rates of convergence for the proposed estimators. A Monte Carlo simulation study is conducted to illustrate the performance of the estimators. Various scenarios for the mixing probability function are used in the simulations, in addition to a range of sample sizes and resolution levels. We apply the proposed methods to a data set consisting of array Comparative Genomic Hybridization from glioblastoma cancer studies.

15.

An EM-type algorithm for multivariate mixture models

G. R. Oskrochi R. B. Davies 《Statistics and Computing》1997,7(2):145-151

This paper introduces a new approach, based on dependent univariate GLMs, for fitting multivariate mixture models. This approach is a multivariate generalization of the method for univariate mixtures presented by Hinde (1982). Its accuracy and efficiency are compared with direct maximization of the log-likelihood. Using a simulation study, we also compare the efficiency of Monte Carlo and Gaussian quadrature methods for approximating the mixture distribution. The new approach with Gaussian quadrature outperforms the alternative methods considered. The work is motivated by the multivariate mixture models which have been proposed for modelling changes of employment states at an individual level. Similar formulations are of interest for modelling movement between other social and economic states and multivariate mixture models also occur in biostatistics and epidemiology.

16.

Markov chain Monte Carlo estimation of a mixture item response theory model

Sun-Joo Cho Allan S. Cohen Seock-Ho Kim 《Journal of Statistical Computation and Simulation》2013,83(2):278-306

Markov chain Monte Carlo (MCMC) algorithms have been shown to be useful for estimation of complex item response theory (IRT) models. Although an MCMC algorithm can be very useful, it also requires care in use and interpretation of results. In particular, MCMC algorithms generally make extensive use of priors on model parameters. In this paper, MCMC estimation is illustrated using a simple mixture IRT model, a mixture Rasch model (MRM), to demonstrate how the algorithm operates and how results may be affected by some commonly used priors. Priors on the probabilities of mixtures, label switching, model selection, metric anchoring, and implementation of the MCMC algorithm using WinBUGS are described, and their effects illustrated on parameter recovery in practical testing situations. In addition, an example is presented in which an MRM is fitted to a set of educational test data using the MCMC algorithm and a comparison is illustrated with results from three existing maximum likelihood estimation methods.

17.

Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian non-parametric approaches

Timothy E. Hanson Athanasios Kottas Adam J. Branscum 《Journal of the Royal Statistical Society. Series C, Applied statistics》2008,57(2):207-225

Summary. The evaluation of the performance of a continuous diagnostic measure is a commonly encountered task in medical research. We develop Bayesian non-parametric models that use Dirichlet process mixtures and mixtures of Polya trees for the analysis of continuous serologic data. The modelling approach differs from traditional approaches to the analysis of receiver operating characteristic curve data in that it incorporates a stochastic ordering constraint for the distributions of serologic values for the infected and non-infected populations. Biologically such a constraint is virtually always feasible because serologic values from infected individuals tend to be higher than those for non-infected individuals. The models proposed provide data-driven inferences for the infected and non-infected population distributions, and for the receiver operating characteristic curve and corresponding area under the curve. We illustrate and compare the predictive performance of the Dirichlet process mixture and mixture of Polya trees approaches by using serologic data for Johne's disease in dairy cattle.

18.

A new model selection procedure for finite mixture regression models

Conglian Yu 《Communications in Statistics - Theory and Methods》2020,49(18):4347-4366

In this article, we propose a new penalized-likelihood method to conduct model selection for finite mixture of regression models. The penalties are imposed on mixing proportions and regression coefficients, and hence order selection of the mixture and the variable selection in each component can be simultaneously conducted. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite sample performance of the estimation procedure. The proposed methodology is further illustrated via real data analysis.

19.

Estimation of gaussian mixtures with rotationally invariant covariance matrices

R. L. Streit T. E. Luginbuhl 《Communications in Statistics - Theory and Methods》2013,42(12):2927-2944

Homoscedastic and heteroscedastic Gaussian mixtures differ in the constraints placed on the covariance matrices of the mixture components. A new mixture, called herein a strophoscedastic mixture, is defined by a new constraint. This constraint requires the matrices to be identical under orthogonal transformations, where different transformations are allowed for different matrices. It is shown that the M-step of the EM method for estimating the parameters of strophoscedastic mixtures from sample data is explicitly solvable using singular value decompositions. Consequently, the EM-based maximum likelihood estimation algorithm is as easily implemented for strophoscedastic mixtures as it is for homoscedastic and heteroscedastic mixtures. An example of a “noisy” Archimedean spiral is presented.

20.

A dependent Dirichlet process model for survival data with competing risks

Yushu Shi Purushottam Laud Joan Neuner 《Lifetime data analysis》2021,27(1):156-176

In this paper, we first propose a dependent Dirichlet process (DDP) model using a mixture of Weibull models with each mixture component resembling a Cox model for survival data. We then build a Dirichlet process mixture model for competing risks data without regression covariates. Next we extend this model to a DDP model for competing risks regression data by using a multiplicative covariate effect on subdistribution hazards in the mixture components. Though built on proportional hazards (or subdistribution hazards) models, the proposed nonparametric Bayesian regression models do not require the assumption of constant hazard (or subdistribution hazard) ratio. An external time-dependent covariate is also considered in the survival model. After describing the model, we discuss how both cause-specific and subdistribution hazard ratios can be estimated from the same nonparametric Bayesian model for competing risks regression. For use with the regression models proposed, we introduce an omnibus prior that is suitable when little external information is available about covariate effects. Finally we compare the models’ performance with existing methods through simulations. We also illustrate the proposed competing risks regression model with data from a breast cancer study. An R package “DPWeibull” implementing all of the proposed methods is available at CRAN.
