首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 105 毫秒
1.
The label-switching problem is one of the fundamental problems in Bayesian mixture analysis. Using all the Markov chain Monte Carlo samples as the initials for the expectation-maximization (EM) algorithm, we propose to label the samples based on the modes they converge to. Our method is based on the assumption that the samples converged to the same mode have the same labels. If a relative noninformative prior is used or the sample size is large, the posterior will be close to the likelihood and then the posterior modes can be located approximately by the EM algorithm for mixture likelihood, without assuming the availability of the closed form of the posterior. In order to speed up the computation of this labeling method, we also propose to first cluster the samples by K-means with a large number of clusters K. Then, by assuming that the samples within each cluster have the same labels, we only need to find one converged mode for each cluster. Using a Monte Carlo simulation study and a real dataset, we demonstrate the success of our new method in dealing with the label-switching problem.  相似文献   

2.
Two symmetrical fractional factorial designs are said to be combinatorially equivalent if one design can be obtained from another by reordering the runs, relabeling the factors and relabeling the levels of one or more factors. This article presents concepts of ordered distance frequency matrix, distance frequency vector, and reduced distance frequency vector for a design. Necessary conditions for two designs to be combinatorial equivalent based on these concepts are presented. A new algorithm based on the results obtained is proposed to check combinatorial non-equivalence of two factorial designs and several illustrating examples are provided.  相似文献   

3.
Effective component relabeling in Bayesian analyses of mixture models is critical to the routine use of mixtures in classification with analysis based on Markov chain Monte Carlo methods. The classification-based relabeling approach here is computationally attractive and statistically effective, and scales well with sample size and number of mixture components concordant with enabling routine analyses of increasingly large data sets. Building on the best of existing methods, practical relabeling aims to match data:component classification indicators in MCMC iterates with those of a defined reference mixture distribution. The method performs as well as or better than existing methods in small dimensional problems, while being practically superior in problems with larger data sets as the approach is scalable. We describe examples and computational benchmarks, and provide supporting code with efficient computational implementation of the algorithm that will be of use to others in practical applications of mixture models.  相似文献   

4.
Model based labeling for mixture models   总被引:1,自引:0,他引:1  
Label switching is one of the fundamental problems for Bayesian mixture model analysis. Due to the permutation invariance of the mixture posterior, we can consider that the posterior of a m-component mixture model is a mixture distribution with m! symmetric components and therefore the object of labeling is to recover one of the components. In order to do labeling, we propose to first fit a symmetric m!-component mixture model to the Markov chain Monte Carlo (MCMC) samples and then choose the label for each sample by maximizing the corresponding classification probabilities, which are the probabilities of all possible labels for each sample. Both parametric and semi-parametric ways are proposed to fit the symmetric mixture model for the posterior. Compared to the existing labeling methods, our proposed method aims to approximate the posterior directly and provides the labeling probabilities for all possible labels and thus has a model explanation and theoretical support. In addition, we introduce a situation in which the “ideally” labeled samples are available and thus can be used to compare different labeling methods. We demonstrate the success of our new method in dealing with the label switching problem using two examples.  相似文献   

5.
The mode of a distribution provides an important summary of data and is often estimated on the basis of some non‐parametric kernel density estimator. This article develops a new data analysis tool called modal linear regression in order to explore high‐dimensional data. Modal linear regression models the conditional mode of a response Y given a set of predictors x as a linear function of x . Modal linear regression differs from standard linear regression in that standard linear regression models the conditional mean (as opposed to mode) of Y as a linear function of x . We propose an expectation–maximization algorithm in order to estimate the regression coefficients of modal linear regression. We also provide asymptotic properties for the proposed estimator without the symmetric assumption of the error density. Our empirical studies with simulated data and real data demonstrate that the proposed modal regression gives shorter predictive intervals than mean linear regression, median linear regression and MM‐estimators.  相似文献   

6.
Solving label switching is crucial for interpreting the results of fitting Bayesian mixture models. The label switching originates from the invariance of posterior distribution to permutation of component labels. As a result, the component labels in Markov chain simulation may switch to another equivalent permutation, and the marginal posterior distribution associated with all labels may be similar and useless for inferring quantities relating to each individual component. In this article, we propose a new simple labelling method by minimizing the deviance of the class probabilities to a fixed reference labels. The reference labels can be chosen before running Markov chain Monte Carlo (MCMC) using optimization methods, such as expectation-maximization algorithms, and therefore the new labelling method can be implemented by an online algorithm, which can reduce the storage requirements and save much computation time. Using the Acid data set and Galaxy data set, we demonstrate the success of the proposed labelling method for removing the labelling switching in the raw MCMC samples.  相似文献   

7.
Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial due to the difficulties in justifying the exact number of components to be used and due to the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.  相似文献   

8.
This paper concerns maximum likelihood estimation for the semiparametric shared gamma frailty model; that is the Cox proportional hazards model with the hazard function multiplied by a gamma random variable with mean 1 and variance θ. A hybrid ML-EM algorithm is applied to 26 400 simulated samples of 400 to 8000 observations with Weibull hazards. The hybrid algorithm is much faster than the standard EM algorithm, faster than standard direct maximum likelihood (ML, Newton Raphson) for large samples, and gives almost identical results to the penalised likelihood method in S-PLUS 2000. When the true value θ0 of θ is zero, the estimates of θ are asymptotically distributed as a 50–50 mixture between a point mass at zero and a normal random variable on the positive axis. When θ0 > 0, the asymptotic distribution is normal. However, for small samples, simulations suggest that the estimates of θ are approximately distributed as an x ? (100 ? x)% mixture, 0 ≤ x ≤ 50, between a point mass at zero and a normal random variable on the positive axis even for θ0 > 0. In light of this, p-values and confidence intervals need to be adjusted accordingly. We indicate an approximate method for carrying out the adjustment.  相似文献   

9.
Label switching is one of the fundamental issues for Bayesian mixture modeling. It occurs due to the nonidentifiability of the components under symmetric priors. Without solving the label switching, the ergodic averages of component specific quantities will be identical and thus useless for inference relating to individual components, such as the posterior means, predictive component densities, and marginal classification probabilities. The author establishes the equivalence between the labeling and clustering and proposes two simple clustering criteria to solve the label switching. The first method can be considered as an extension of K-means clustering. The second method is to find the labels by minimizing the volume of labeled samples and this method is invariant to the scale transformation of the parameters. Using a simulation example and the application of two real data sets, the author demonstrates the success of these new methods in dealing with the label switching problem.  相似文献   

10.
We propose a novel method for tensorial‐independent component analysis. Our approach is based on TJADE and k‐JADE, two recently proposed generalizations of the classical JADE algorithm. Our novel method achieves the consistency and the limiting distribution of TJADE under mild assumptions and at the same time offers notable improvement in computational speed. Detailed mathematical proofs of the statistical properties of our method are given and, as a special case, a conjecture on the properties of k‐JADE is resolved. Simulations and timing comparisons demonstrate remarkable gain in speed. Moreover, the desired efficiency is obtained approximately for finite samples. The method is applied successfully to large‐scale video data, for which neither TJADE nor k‐JADE is feasible. Finally, an experimental procedure is proposed to select the values of a set of tuning parameters. Supplementary material including the R‐code for running the examples and the proofs of the theoretical results is available online.  相似文献   

11.
In this article, we investigate a new estimation approach for the partially linear single-index model based on modal regression method, where the non parametric function is estimated by penalized spline method. Moreover, we develop an expection maximum (EM)-type algorithm and establish the large sample properties of the proposed estimation method. A distinguishing characteristic of the newly proposed estimation is robust against outliers through introducing an additional tuning parameter which can be automatically selected using the observed data. Simulation studies and real data example are used to evaluate the finite-sample performance, and the results show that the newly proposed method works very well.  相似文献   

12.
This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away. Our approach builds on previous work by Holmes and Adams (J. R. Stat. Soc. Ser. B 64:295–306, 2002; Biometrika 90:99–112, 2003) and Cucala et al. (J. Am. Stat. Assoc. 104:263–273, 2009). Our work shares many of the advantages of these approaches in providing a probabilistic basis for the statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison to the k-nearest neighbour algorithm.  相似文献   

13.
Two symmetric fractional factorial designs with qualitative and quantitative factors are equivalent if the design matrix of one can be obtained from the design matrix of the other by row and column permutations, relabeling of the levels of the qualitative factors and reversal of the levels of the quantitative factors. In this paper, necessary and sufficient methods of determining equivalence of any two symmetric designs with both types of factors are given. An algorithm used to check equivalence or non-equivalence is evaluated. If two designs are equivalent the algorithm gives a set of permutations which map one design to the other. Fast screening methods for non-equivalence are considered. Extensions of results to asymmetric fractional factorial designs with qualitative and quantitative factors are discussed.  相似文献   

14.
A computationally simple method for estimating finite-population quantiles in the presence of auxiliary information is proposed. An algorithm is also found for implementing related approaches for estimating quantiles, including that of Rao et al. (1990), obtained from inverting difference-type estimators of the distribution function. The proposed estimation procedure can be seen as a one-step iteration of the suggested algorithm and is asymptotically equivalent to the limiting estimator. In particular, the proposed method yields a simple and efficient way of approximating Rao et al.'s estimator. Simulation studies based on two real populations show that the approximation can be very satisfactory even for small to moderate samples.  相似文献   

15.
A fast new algorithm is proposed for numerical computation of (approximate) D-optimal designs. This cocktail algorithm extends the well-known vertex direction method (VDM; Fedorov in Theory of Optimal Experiments, 1972) and the multiplicative algorithm (Silvey et al. in Commun. Stat. Theory Methods 14:1379–1389, 1978), and shares their simplicity and monotonic convergence properties. Numerical examples show that the cocktail algorithm can lead to dramatically improved speed, sometimes by orders of magnitude, relative to either the multiplicative algorithm or the vertex exchange method (a variant of VDM). Key to the improved speed is a new nearest neighbor exchange strategy, which acts locally and complements the global effect of the multiplicative algorithm. Possible extensions to related problems such as nonparametric maximum likelihood estimation are mentioned.  相似文献   

16.
In this paper, we proposed a new family of distributions namely exponentiated exponential–geometric (E2G) distribution. The E2G distribution is a straightforwardly generalization of the exponential–geometric (EG) distribution proposed by Adamidis and Loukas [A lifetime distribution with decreasing failure rate, Statist. Probab. Lett. 39 (1998), pp. 35–42], which accommodates increasing, decreasing and unimodal hazard functions. It arises on a latent competing risk scenarios, where the lifetime associated with a particular risk is not observable but only the minimum lifetime value among all risks. The properties of the proposed distribution are discussed, including a formal proof of its probability density function and explicit algebraic formulas for its survival and hazard functions, moments, rth moment of the ith order statistic, mean residual lifetime and modal value. Maximum-likelihood inference is implemented straightforwardly. From a mis-specification simulation study performed in order to assess the extent of the mis-specification errors when testing the EG distribution against the E2G, and we observed that it is usually possible to discriminate between both distributions even for moderate samples with presence of censoring. The practical importance of the new distribution was demonstrated in three applications where we compare the E2G distribution with several lifetime distributions.  相似文献   

17.
We describe an algorithm to fit an SU -Curve of the Johnson system by moment matching. The algorithm follows from a new parametrization, and reduces the problem to a root finding procedure that can be implemented efficiently using a bisection or a Newton-Raphson method. This allows the four parameters of the Johnson curve to be determined to any desired degree of accuracy, and is fast enough to be implemented in a real-time setting. A practical application of the method lies in the fact that many firms use the Johnson system to manage financial risk  相似文献   

18.
We introduce a general Monte Carlo method based on Nested Sampling (NS), for sampling complex probability distributions and estimating the normalising constant. The method uses one or more particles, which explore a mixture of nested probability distributions, each successive distribution occupying ∼e −1 times the enclosed prior mass of the previous distribution. While NS technically requires independent generation of particles, Markov Chain Monte Carlo (MCMC) exploration fits naturally into this technique. We illustrate the new method on a test problem and find that it can achieve four times the accuracy of classic MCMC-based Nested Sampling, for the same computational effort; equivalent to a factor of 16 speedup. An additional benefit is that more samples and a more accurate evidence value can be obtained simply by continuing the run for longer, as in standard MCMC.  相似文献   

19.
Gibbs sampler as a computer-intensive algorithm is an important statistical tool both in application and in theoretical work. This algorithm, in many cases, is time-consuming; this paper extends the concept of using the steady-state ranked simulated sampling approach, utilized in Monte Carlo methods by Samawi [On the approximation of multiple integrals using steady state ranked simulated sampling, 2010, submitted for publication], to improve the well-known Gibbs sampling algorithm. It is demonstrated that this approach provides unbiased estimators, in the case of estimating the means and the distribution function, and substantially improves the performance of the Gibbs sampling algorithm and convergence, which results in a significant reduction in the costs and time required to attain a certain level of accuracy. Similar to Casella and George [Explaining the Gibbs sampler, Am. Statist. 46(3) (1992), pp. 167–174], we provide some analytical properties in simple cases and compare the performance of our method using the same illustrations.  相似文献   

20.
In this article, a new algorithm for rather expensive simulation problems is presented, which consists of two phases. In the first phase, as a model-based algorithm, the simulation output is used directly in the optimization stage. In the second phase, the simulation model is replaced by a valid metamodel. In addition, a new optimization algorithm is presented. To evaluate the performance of the proposed algorithm, it is applied to the (s,S) inventory problem as well as to five test functions. Numerical results show that the proposed algorithm leads to better solutions with less computational time than the corresponding metamodel-based algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号