Found 20 similar articles (search time: 9 ms)
1.
《Journal of Statistical Computation and Simulation》2012,82(2):394-413
Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises because the posterior is invariant to permutations of the component parameters. Methods for dealing with label switching have been studied fairly extensively in the literature, with the most popular approaches being those based on loss functions. However, many of these algorithms turn out to be too slow in practice, and can be infeasible as the size and/or dimension of the data grow. We propose a new, computationally efficient algorithm based on a loss function interpretation, and show that it can scale up well in large data set scenarios. Then, we review earlier solutions which can scale up well for large data sets, and compare their performances on simulated and real data sets. We conclude with some discussions and recommendations of all the methods studied.
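The permutation invariance behind label switching can be checked numerically. The sketch below is not the algorithm proposed in the abstract; it only evaluates a two-component Gaussian mixture log-likelihood before and after swapping the component labels. The data values, parameter values, and function names (`normal_pdf`, `mixture_log_likelihood`) are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(data, weights, means, sds):
    """Log-likelihood of the data under a finite Gaussian mixture."""
    total = 0.0
    for x in data:
        density = sum(
            w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)
        )
        total += math.log(density)
    return total


# Hypothetical data and parameter values, chosen only for illustration.
data = [-1.2, -0.8, 0.1, 2.9, 3.4, 3.1]
weights, means, sds = [0.3, 0.7], [-1.0, 3.0], [0.5, 1.0]

original = mixture_log_likelihood(data, weights, means, sds)
# Swap the labels of the two components: same model, different labelling.
swapped = mixture_log_likelihood(data, weights[::-1], means[::-1], sds[::-1])
```

Both labellings give the same likelihood, so with a prior that is also symmetric in the labels, an MCMC sampler has no reason to prefer one labelling over the other.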
2.
《Journal of Statistical Computation and Simulation》2012,82(3):209-231
In this paper, a Bayesian two-stage D–D optimal design for mixture experimental models under model uncertainty is developed. A Bayesian D-optimality criterion is used in the first stage to minimize the determinant of the posterior variances of the parameters. The second stage design is then generated according to an optimality procedure that collaborates with the improved model from the first stage data. The results show that a Bayesian two-stage D–D-optimal design for mixture experiments under model uncertainty is more efficient than both the Bayesian one-stage D-optimal design and the non-Bayesian one-stage D-optimal design in most situations. Furthermore, simulations are used to obtain a reasonable ratio of the sample sizes between the two stages.
3.
Mixtures of linear mixed-effects models have received considerable attention in longitudinal studies, including medical research, social science and economics. The inferential question of interest is often the identification of critical factors that affect the responses. We consider a Bayesian approach to select the important fixed and random effects in the finite mixture of linear mixed-effects models. To accomplish our goal, latent variables are introduced to facilitate the identification of influential fixed and random components and to classify the membership of observations in the longitudinal data. A spike-and-slab prior for the regression coefficients is adopted to sidestep the potential complications of highly collinear covariates and to handle large p and small n issues in the variable selection problems. Here we employ Markov chain Monte Carlo (MCMC) sampling techniques for posterior inferences and explore the performance of the proposed method in simulation studies, followed by an actual psychiatric data analysis concerning depressive disorder.
4.
The label switching problem is caused by the likelihood of a Bayesian mixture model being invariant to permutations of the labels. The permutation can change multiple times between Markov Chain Monte Carlo (MCMC) iterations making it difficult to infer component-specific parameters of the model. Various so-called ‘relabelling’ strategies exist with the goal to ‘undo’ the label switches that have occurred to enable estimation of functions that depend on component-specific parameters. Existing deterministic relabelling algorithms rely upon specifying a loss function, and relabelling by minimising its posterior expected loss. In this paper we develop probabilistic approaches to relabelling that allow for estimation and incorporation of the uncertainty in the relabelling process. Variants of the probabilistic relabelling algorithm are introduced and compared to existing deterministic relabelling algorithms. We demonstrate that the idea of probabilistic relabelling can be expressed in a rigorous framework based on the EM algorithm.
5.
《Journal of statistical planning and inference》2003,113(1):15-24
Bayesian predictive density functions, which are necessary to obtain bounds for predictive intervals of future order statistics, are obtained when the population density is a finite mixture of general components. Such components include, among others, the Weibull (exponential and Rayleigh as special cases), compound Weibull (three-parameter Burr type XII), Pareto, beta, Gompertz and compound Gompertz distributions. The prior belief of the experimenter is measured by a general distribution that was suggested by AL-Hussaini (J. Statist. Plann. Inf. 79 (1999b) 79). Applications to finite mixtures of Weibull and Burr type XII components are illustrated and comparison is made, in the special cases of the exponential and Pareto type II components, with previous results.
6.
Yuan Ji Guosheng Yin Kam-Wah Tsui Mikhail G. Kolonin Jessica Sun Wadih Arap Renata Pasqualini Kim-Anh Do 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(2):139-152
Summary. Phage display is a biological process that is used to screen random peptide libraries for ligands that bind to a target of interest with high affinity. On the basis of a count data set from an innovative multistage phage display experiment, we propose a class of Bayesian mixture models to cluster peptide counts into three groups that exhibit different display patterns across stages. Among the three groups, the investigators are particularly interested in that with an ascending display pattern in the counts, which implies that the peptides are likely to bind to the target with strong affinity. We apply a Bayesian false discovery rate approach to identify the peptides with the strongest affinity within the group. A list of peptides is obtained, among which important ones with meaningful functions are further validated by biologists. To examine the performance of the Bayesian model, we conduct a simulation study and obtain desirable results.
7.
Normality and independence of error terms are typical assumptions for partial linear models. However, these assumptions may be unrealistic in many fields, such as economics, finance and biostatistics. In this paper, a Bayesian analysis for a partial linear model with first-order autoregressive errors belonging to the class of the scale mixtures of normal distributions is studied in detail. The proposed model provides a useful generalization of the symmetrical linear regression model with independent errors, since the distribution of the error term covers both correlated and thick-tailed distributions, and has a convenient hierarchical representation allowing easy implementation of a Markov chain Monte Carlo scheme. In order to examine the robustness of the model against outlying and influential observations, a Bayesian case-deletion influence diagnostic based on the Kullback–Leibler (K–L) divergence is presented. The proposed method is applied to monthly and daily returns of two Chilean companies.
8.
In the field of molecular biology, it is often of interest to analyze microarray data for clustering genes based on similar profiles of gene expression to identify genes that are differentially expressed under multiple biological conditions. One of the notable characteristics of a gene expression profile is that it shows a cyclic curve over a course of time. To group sequences of similar molecular functions, we propose a Bayesian Dirichlet process mixture of linear regression models with a Fourier series for the regression coefficients, for each of which a spike and slab prior is assumed. A full Gibbs-sampling algorithm is developed for an efficient Markov chain Monte Carlo (MCMC) posterior computation. Due to the so-called “label-switching” problem and different numbers of clusters during the MCMC computation, a post-process approach of Fritsch and Ickstadt (2009) is additionally applied to MCMC samples for an optimal single clustering estimate by maximizing the posterior expected adjusted Rand index with the posterior probabilities of two observations being clustered together. The proposed method is illustrated with two simulated data sets and one real data set of the physiological response of fibroblasts to serum of Iyer et al. (1999).
9.
《Journal of Statistical Computation and Simulation》2012,82(10):1926-1944
In this paper, we consider a special finite mixture model named Combination of Uniform and shifted Binomial (CUB), recently introduced in the statistical literature to analyse ordinal data expressing the preferences of raters with regard to items or services. Our aim is to develop a variable selection procedure for this model using a Bayesian approach. Bayesian methods for variable selection and model choice have become increasingly popular in recent years, due to advances in Markov chain Monte Carlo computational algorithms. Several methods have been proposed in the case of linear and generalized linear models (GLM). In this paper, we adapt to the CUB model some of these algorithms: the Kuo–Mallick method together with its ‘metropolized’ version and the Stochastic Search Variable Selection method. Several simulated examples are used to illustrate the algorithms and to compare their performance. Finally, an application to real data is introduced.
10.
In this article, we apply the Bayesian approach to the linear mixed effect models with autoregressive(p) random errors under mixture priors obtained with the Markov chain Monte Carlo (MCMC) method. The mixture structure of a point mass and continuous distribution can help to select the variables in fixed and random effects models from the posterior sample generated using the MCMC method. Bayesian prediction of future observations is also one of the major concerns. To get the best model, we consider the commonly used highest posterior probability model and the median posterior probability model. As a result, both criteria tend to be needed to choose the best model from the entire simulation study. In terms of predictive accuracy, a real example confirms that the proposed method provides accurate results.
11.
In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Inheriting their asymptotic properties, our information criteria are consistent in variable selection in both the large-sample and the high-dimensional asymptotic frameworks. In numerical simulations, variable selection methods based on our information criteria choose the true set of variables with high probability in most cases.
12.
Parsimonious Gaussian mixture models
Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which are based on the mixtures of factor analyzers model are introduced and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class of models includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.
13.
We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat., Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite sequences of independent positive random variables. Two applications are considered: density estimation using mixture models and hazard function estimation. In each case we show how the slice-efficient sampler can be applied to make inference in the models. In the mixture case, two submodels are studied in detail. The first one assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative “conditional” simulation techniques for mixture models using the standard Dirichlet process prior and our new priors are made. The properties of the new priors are illustrated on a density estimation problem.
14.
A central issue in principal component analysis (PCA) is that of choosing the appropriate number of principal components to be retained. Bishop (1999a) suggested a Bayesian approach for PCA for determining the effective dimensionality automatically on the basis of the probabilistic latent variable model. This paper extends this approach by using mixture priors, in that the choice of dimensionality and the estimation of principal components are done simultaneously via an MCMC algorithm. Also, the proposed method provides a probabilistic measure of uncertainty on PCA, yielding posterior probabilities of all possible cases of principal components.
15.
Hong-Tu Zhu Heping Zhang 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2004,66(1):3-16
Summary. We establish asymptotic theory for both the maximum likelihood and the maximum modified likelihood estimators in mixture regression models. Moreover, under specific and reasonable conditions, we show that the optimal convergence rate of $n^{-1/4}$ for estimating the mixing distribution is achievable for both the maximum likelihood and the maximum modified likelihood estimators. We also derive the asymptotic distributions of two log-likelihood ratio test statistics for testing homogeneity and we propose a resampling procedure for approximating the p-value. Simulation studies are conducted to investigate the empirical performance of the two test statistics. Finally, two real data sets are analysed to illustrate the application of our theoretical results.
16.
Latent growth curve models as structural equation models are extensively discussed in various research fields (Curran and Muthén in Am. J. Community Psychol. 27:567–595, 1999; Duncan et al. in An introduction to latent variable growth curve modeling. Concepts, issues and applications, 2nd edn., Lawrence Erlbaum, Mahwah, 2006; Muthén and Muthén in Alcohol. Clin. Exp. Res. 24(6):882–891, 2000a; in J. Stud. Alcohol. 61:290–300, 2000b). Recent methodological and statistical extensions focus on the consideration of unobserved heterogeneity in empirical data. Muthén extended the classic structural equation approach by mixture components, i.e. categorical latent classes (Muthén in Marcoulides, G.A., Schumacker, R.E. (eds.), New developments and techniques in structural equation modeling, pp. 1–33, Lawrence Erlbaum, Mahwah, 2001a; in Behaviormetrika 29(1):81–117, 2002; in Kaplan, D. (ed.), The SAGE handbook of quantitative methodology for the social sciences, pp. 345–368, Sage, Thousand Oaks, 2004). The paper discusses applications of growth mixture models with data on delinquent behavior of adolescents from the German panel study Crime in the modern City (CrimoC) (Boers et al. in Eur. J. Criminol. 7:499–520, 2010; Reinecke in Delinquenzverläufe im Jugendalter: Empirische Überprüfung von Wachstums- und Mischverteilungsmodellen, Institut für sozialwissenschaftliche Forschung e.V., Münster, 2006a; in Methodology 2:100–112, 2006b; in van Montfort, K., Oud, J., Satorra, A. (eds.), Longitudinal models in the behavioral and related sciences, pp. 239–266, Lawrence Erlbaum, Mahwah, 2007). Observed as well as unobserved heterogeneity will be considered with growth mixture models. Special attention is given to the distribution of the outcome variables as counts. Poisson and negative binomial distributions with zero inflation are considered in the proposed growth mixture models. Different model specifications will be emphasized with respect to their particular parameterizations.
17.
We propose frailty regression models in mixture distributions and assume the distribution of frailty to be a gamma, positive stable or power variance function distribution. We consider the Weibull mixture as an example. There are some interesting situations, such as survival times in genetic epidemiology, dental implants of patients and twin births (both monozygotic and dizygotic), where the genetic behavior of patients (which is unknown and random) follows a known frailty distribution. These situations motivate the study of this particular model.
18.
《Journal of Statistical Computation and Simulation》2012,82(1-3):45-53
Generating samples from a two-stage distribution is an important part of the study of mixture models. These samples are used to examine estimation procedures, and other properties of the mixture model. In this paper we present an exemplary sampling method for generating data from the mixed distribution. This method uses the order statistic spacings of the mixing distribution and random sampling from the distribution conditional on the mixing variable to produce samples from the mixed distribution. We show that this exemplary procedure often produces data with an empirical distribution function closer to the mixed distribution than the Method of Composition. We illustrate the method with an example.
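For orientation, the Method of Composition that the abstract uses as its baseline can be sketched as follows: draw the mixing variable first, then draw the observation conditional on it. This is not the exemplary sampling method proposed in the paper. The choice of a gamma mixing distribution with an exponential conditional distribution, the parameter defaults, and the function name `sample_by_composition` are assumptions made only for illustration.

```python
import random


def sample_by_composition(n, shape=2.0, scale=1.0, seed=0):
    """Draw n values from a two-stage (mixed) distribution.

    Stage 1: rate ~ Gamma(shape, scale)      (mixing variable, assumed)
    Stage 2: x | rate ~ Exponential(rate)    (conditional distribution, assumed)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        rate = rng.gammavariate(shape, scale)  # sample the mixing variable
        draws.append(rng.expovariate(rate))    # sample given the mixing variable
    return draws


samples = sample_by_composition(1000)
```

Each draw uses a fresh value of the mixing variable, so the resulting values follow the marginal (mixed) distribution rather than any single conditional distribution.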
19.
This paper deals with the problem of robustness of Bayesian regression with respect to the data. We first give a formal definition of Bayesian robustness to data contamination, prove that robustness according to the definition cannot be obtained by using heavy-tailed error distributions in linear regression models and propose a heteroscedastic approach to achieve the desired Bayesian robustness.
20.
Inference of interaction networks represented by systems of differential equations is a challenging problem in many scientific disciplines. In the present article, we follow a semi-mechanistic modelling approach based on gradient matching. We investigate the extent to which key factors, including the kinetic model, statistical formulation and numerical methods, impact upon performance at network reconstruction. We emphasize general lessons for computational statisticians when faced with the challenge of model selection, and we assess the accuracy of various alternative paradigms, including recent widely applicable information criteria and different numerical procedures for approximating Bayes factors. We conduct the comparative evaluation with a novel inferential pipeline that systematically disambiguates confounding factors via an ANOVA scheme.