首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 315 毫秒
1.
Summary.  The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem. A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters. We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer. We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution. We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes. This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets. We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.  相似文献   

2.
Abstract. DNA array technology is an important tool for genomic research due to its capa‐city of measuring simultaneously the expression levels of a great number of genes or fragments of genes in different experimental conditions. An important point in gene expression data analysis is to identify clusters of genes which present similar expression levels. We propose a new procedure for estimating the mixture model for clustering of gene expression data. The proposed method is a posterior split‐merge‐birth MCMC procedure which does not require the specification of the number of components, since it is estimated jointly with component parameters. The strategy for splitting is based on data and on posterior distribution from the previously allocated observations. This procedure defines a quick split proposal in contrary to other split procedures, which require substantial computational effort. The performance of the method is verified using real and simulated datasets.  相似文献   

3.
Clustering algorithms are used in the analysis of gene expression data to identify groups of genes with similar expression patterns. These algorithms group genes with respect to a predefined dissimilarity measure without using any prior classification of the data. Most of the clustering algorithms require the number of clusters as input, and all the objects in the dataset are usually assigned to one of the clusters. We propose a clustering algorithm that finds clusters sequentially, and allows for sporadic objects, so there are objects that are not assigned to any cluster. The proposed sequential clustering algorithm has two steps. First it finds candidates for centers of clusters. Multiple candidates are used to make the search for clusters more efficient. Secondly, it conducts a local search around the candidate centers to find the set of objects that defines a cluster. The candidate clusters are compared using a predefined score, the best cluster is removed from data, and the procedure is repeated. We investigate the performance of this algorithm using simulated data and we apply this method to analyze gene expression profiles in a study on the plasticity of the dendritic cells.  相似文献   

4.
In this article, utilizing a scale mixture of skew-normal distribution in which mixing random variable is assumed to follow a mixture model with varying weights for each observation, we introduce a generalization of skew-normal linear regression model with the aim to provide resistant results. This model, which also includes the skew-slash distribution in a particular case, allows us to accommodate and detect outlying observations under the skew-normal linear regression model. Inferences about the model are carried out through the empirical Bayes approach. The conditions for propriety of the posterior and for existence of posterior moments are given under the standard noninformative priors for regression and scale parameters as well as proper prior for skewness parameter. Then, for Bayesian inference, a Markov chain Monte Carlo method is described. Since posterior results depend on the prior hyperparameters, we estimate them adopting the empirical Bayes method as well as using a Monte Carlo EM algorithm. Furthermore, to identify possible outliers, we also apply the Bayes factor obtained through the generalized Savage-Dickey density ratio. Examining the proposed approach on simulated instance and real data, it is found to provide not only satisfactory parameter estimates rather allow identifying outliers favorably.  相似文献   

5.
Several authors have discussed Kalman filtering procedures using a mixture of normals as a model for the distributions of the noise in the observation and/or the state space equations. Under this model, resulting posteriors involve a mixture of normal distributions, and a “collapsing method” must be found in order to keep the recursive procedure simple. We prove that the Kullback-Leibler distance between the mixture posterior and that of a single normal distribution is minimized when we choose the mean and variance of the single normal distribution to be the mean and variance of the mixture posterior. Hence, “collapsing by moments” is optimal in this sense. We then develop the resulting optimal algorithm for “Kalman filtering” for this situation, and illustrate its performance with an example.  相似文献   

6.
In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps and resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and study their temporal evolution along the time frame. We anticipate space-time dependence by introducing spatio-temporally varying mixing weights to allocate observations at nearby locations and consecutive time points with similar cluster’s membership probabilities. As a result, a clustering varying over time and space is accomplished. Conditionally on the cluster’s membership, a state-space model is deployed to describe the temporal evolution of the sites belonging to each group. Fully posterior inference is provided under a Bayesian framework through Monte Carlo Markov chain algorithms. Also, a strategy to select the suitable number of clusters based upon the posterior temporal patterns of the clusters is offered. We evaluate our approach through simulation experiments, and we illustrate using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength of information across space and time.  相似文献   

7.
Clustering is a common and important issue, and finite mixture models based on the normal distribution are frequently used to address the problem. In this article, we consider a classification model and build a mixture model around it. A good assessment of the allocation of observations and number of clusters is easily obtained from this approach.  相似文献   

8.
The label-switching problem is one of the fundamental problems in Bayesian mixture analysis. Using all the Markov chain Monte Carlo samples as the initials for the expectation-maximization (EM) algorithm, we propose to label the samples based on the modes they converge to. Our method is based on the assumption that the samples converged to the same mode have the same labels. If a relative noninformative prior is used or the sample size is large, the posterior will be close to the likelihood and then the posterior modes can be located approximately by the EM algorithm for mixture likelihood, without assuming the availability of the closed form of the posterior. In order to speed up the computation of this labeling method, we also propose to first cluster the samples by K-means with a large number of clusters K. Then, by assuming that the samples within each cluster have the same labels, we only need to find one converged mode for each cluster. Using a Monte Carlo simulation study and a real dataset, we demonstrate the success of our new method in dealing with the label-switching problem.  相似文献   

9.
We develop clustering procedures for longitudinal trajectories based on a continuous-time hidden Markov model (CTHMM) and a generalized linear observation model. Specifically, in this article we carry out finite and infinite mixture model-based clustering for a CTHMM and achieve inference using Markov chain Monte Carlo (MCMC). For a finite mixture model with a prior on the number of components, we implement reversible-jump MCMC to facilitate the trans-dimensional move between models with different numbers of clusters. For a Dirichlet process mixture model, we utilize restricted Gibbs sampling split–merge proposals to improve the performance of the MCMC algorithm. We apply our proposed algorithms to simulated data as well as a real-data example, and the results demonstrate the desired performance of the new sampler.  相似文献   

10.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with asymmetric behavior. In this paper, we introduce a variable selection procedure for FMR models using the skew-normal distribution. With appropriate choice of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. To estimate the parameters of the model, a modified EM algorithm for numerical computations is developed. The methodology is illustrated through numerical experiments and a real data example.  相似文献   

11.
It is well known that parameter estimates and forecasts are sensitive to assumptions about the tail behavior of the error distribution. In this article, we develop an approach to sequential inference that also simultaneously estimates the tail of the accompanying error distribution. Our simulation-based approach models errors with a tν-distribution and, as new data arrives, we sequentially compute the marginal posterior distribution of the tail thickness. Our method naturally incorporates fat-tailed error distributions and can be extended to other data features such as stochastic volatility. We show that the sequential Bayes factor provides an optimal test of fat-tails versus normality. We provide an empirical and theoretical analysis of the rate of learning of tail thickness under a default Jeffreys prior. We illustrate our sequential methodology on the British pound/U.S. dollar daily exchange rate data and on data from the 2008–2009 credit crisis using daily S&P500 returns. Our method naturally extends to multivariate and dynamic panel data.  相似文献   

12.
We consider estimation of the unknown parameters of Chen distribution [Chen Z. A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function. Statist Probab Lett. 2000;49:155–161] with bathtub shape using progressive-censored samples. We obtain maximum likelihood estimates by making use of an expectation–maximization algorithm. Different Bayes estimates are derived under squared error and balanced squared error loss functions. It is observed that the associated posterior distribution appears in an intractable form. So we have used an approximation method to compute these estimates. A Metropolis–Hasting algorithm is also proposed and some more approximate Bayes estimates are obtained. Asymptotic confidence interval is constructed using observed Fisher information matrix. Bootstrap intervals are proposed as well. Sample generated from MH algorithm are further used in the construction of HPD intervals. Finally, we have obtained prediction intervals and estimates for future observations in one- and two-sample situations. A numerical study is conducted to compare the performance of proposed methods using simulations. Finally, we analyse real data sets for illustration purposes.  相似文献   

13.
A multivariate normal mean–variance mixture based on a Birnbaum–Saunders (NMVMBS) distribution is introduced and several properties of this new distribution are discussed. A new robust non-Gaussian ARCH-type model is proposed in which there exists a relation between the variance of the observations, and the marginal distributions are NMVMBS. A simple EM-based maximum likelihood estimation procedure to estimate the parameters of this normal mean–variance mixture distribution is given. A simulation study and some real data are used to demonstrate the modelling strength of this new model.  相似文献   

14.
Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data.The latent space model for social network data locates each actor in a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to deal with network data in which clusters of actors exist — actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors.A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations.Estimates of the model parameters are derived in a Bayesian framework using a Markov Chain Monte Carlo algorithm. The algorithm is generally computationally expensive — surrogate proposal distributions which shadow the target distributions are derived, reducing the computational burden.The methodology is demonstrated through an illustrative example detailing relationships between a group of lawyers in the USA.  相似文献   

15.
Within the context of mixture modeling, the normal distribution is typically used as the components distribution. However, if a cluster is skewed or heavy tailed, then the normal distribution will be inefficient and many may be needed to model a single cluster. In this paper, we present an attempt to solve this problem. We define a cluster, in the absence of further information, to be a group of data which can be modeled by a unimodal density function. Hence, our intention is to use a family of univariate distribution functions, to replace the normal, for which the only constraint is unimodality. With this aim, we devise a new family of nonparametric unimodal distributions, which has large support over the space of univariate unimodal distributions. The difficult aspect of the Bayesian model is to construct a suitable MCMC algorithm to sample from the correct posterior distribution. The key will be the introduction of strategic latent variables and the use of the Product Space view of Reversible Jump methodology.  相似文献   

16.
Silhouette information evaluates the quality of the partition detected by a clustering technique. Since it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a suitable modification of the Silhouette information aimed at evaluating the quality of clusters in a density-based framework. It is based on the estimation of the data posterior probabilities of belonging to the clusters and may be used to measure our confidence about data allocation to the clusters as well as to choose the best partition among different ones.  相似文献   

17.
Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distribution, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, for achieving flexible trade-off between estimation robustness and efficiency. Simulation studies and an application on analysing lung growth longitudinal data showcase the efficacy of the proposed approach.  相似文献   

18.
The k-means algorithm is one of the most common non hierarchical methods of clustering. It aims to construct clusters in order to minimize the within cluster sum of squared distances. However, as most estimators defined in terms of objective functions depending on global sums of squares, the k-means procedure is not robust with respect to atypical observations in the data. Alternative techniques have thus been introduced in the literature, e.g., the k-medoids method. The k-means and k-medoids methodologies are particular cases of the generalized k-means procedure. In this article, focus is on the error rate these clustering procedures achieve when one expects the data to be distributed according to a mixture distribution. Two different definitions of the error rate are under consideration, depending on the data at hand. It is shown that contamination may make one of these two error rates decrease even under optimal models. The consequence of this will be emphasized with the comparison of influence functions and breakdown points of these error rates.  相似文献   

19.
A mixture of order statistics is a random variable whose distribution is a finite mixture of the distributions for order statistics. Such mixtures show up in the literature on ranked-set sampling and related sampling schemes as models for imperfect rankings. In this paper, we derive an algorithm for computing the probability that independent mixtures of order statistics come in a particular order. The algorithm is far faster than previous proposals from the literature. As an application, we show that the algorithm can be used to create Kolmogorov–Smirnov-type confidence bands that adjust for the presence of imperfect rankings.  相似文献   

20.
Robust mixture modelling using the t distribution   总被引:2,自引:0,他引:2  
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号