Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Summary. The importance of incorporating existing biological knowledge, such as gene functional annotations in gene ontology, in analysing high throughput genomic and proteomic data is being increasingly recognized. In the context of detecting differential gene expression, however, the current practice of using gene annotations is limited primarily to validations. Here we take a direct approach to incorporating gene annotations into mixture models for analysis. First, in contrast with a standard mixture model assuming that each gene of the genome has the same distribution, we study stratified mixture models allowing genes with different annotations to have different distributions, such as prior probabilities. Second, rather than treating parameters in stratified mixture models independently, we propose a hierarchical model to take advantage of the hierarchical structure of most gene annotation systems, such as gene ontology. We consider a simplified implementation for the proof of concept. An application to a mouse microarray data set and a simulation study demonstrate the improvement of the two new approaches over the standard mixture model.

2.
We consider Dirichlet process mixture models in which the observed clusters in any particular dataset are not viewed as belonging to a finite set of possible clusters but rather as representatives of a latent structure in which objects belong to one of a potentially infinite number of clusters. As more information is revealed the number of inferred clusters is allowed to grow. The precision parameter of the Dirichlet process is a crucial parameter that controls the number of clusters. We develop a framework for the specification of the hyperparameters associated with the prior for the precision parameter that can be used in either the presence or the absence of subjective prior information about the level of clustering. Our approach is illustrated in an analysis of clustering brands at the magazine Which?. The results are compared with the approach of Dorazio (2009) via a simulation study.

3.
Summary. We discuss a method for combining different but related longitudinal studies to improve predictive precision. The motivation is to borrow strength across clinical studies in which the same measurements are collected at different frequencies. Key features of the data are heterogeneous populations and an unbalanced design across three studies of interest. The first two studies are phase I studies with very detailed observations on a relatively small number of patients. The third study is a large phase III study with over 1500 enrolled patients, but with relatively few measurements on each patient. Patients receive different doses of several drugs in the studies, with the phase III study containing significantly less toxic treatments. Thus, the main challenges for the analysis are to accommodate heterogeneous population distributions and to formalize borrowing strength across the studies and across the various treatment levels. We describe a hierarchical extension over suitable semiparametric longitudinal data models to achieve the inferential goal. A nonparametric random-effects model accommodates the heterogeneity of the population of patients. A hierarchical extension allows borrowing strength across different studies and different levels of treatment by introducing dependence across these nonparametric random-effects distributions. Dependence is introduced by building an analysis of variance (ANOVA) like structure over the random-effects distributions for different studies and treatment combinations. Model structure and parameter interpretation are similar to standard ANOVA models. Instead of the unknown normal means as in standard ANOVA models, however, the basic objects of inference are random distributions, namely the unknown population distributions under each study. The analysis is based on a mixture of Dirichlet processes model as the underlying semiparametric model.

4.
5.
This article designs a Sequential Monte Carlo (SMC) algorithm for estimation of a Bayesian semi-parametric stochastic volatility model for financial data. In particular, it makes use of one of the most recent particle filters, called Particle Learning (PL). SMC methods are especially well suited for state-space models and can be seen as a cost-efficient alternative to Markov Chain Monte Carlo (MCMC), since they allow for online type inference. The posterior distributions are updated as new data are observed, which is exceedingly costly using MCMC. Also, PL allows for consistent online model comparison using sequential predictive log Bayes factors. A simulated data set is used to compare the posterior outputs for the PL and MCMC schemes, which are shown to be almost identical. Finally, a short real data application is included.

6.
A mixture model for random graphs   (total citations: 1; self-citations: 0; citations by others: 1)
The Erdős–Rényi model of a network is simple and possesses many explicit expressions for average and asymptotic properties, but it does not fit real-world networks well. The vertices of those networks are often structured in unknown classes (functionally related proteins or social communities) with different connectivity properties. The stochastic block structures model was proposed for this purpose in the context of social sciences, using a Bayesian approach. We consider the same model in a frequentist statistical framework. We give the degree distribution and the clustering coefficient associated with this model, a variational method to estimate its parameters and a model selection criterion to select the number of classes. This estimation procedure allows us to deal with large networks containing thousands of vertices. The method is used to uncover the modular structure of a network of enzymatic reactions.
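
The class-structured graph model described in this abstract can be illustrated with a small generative sketch. This is not the authors' estimation procedure (no variational method or model selection is shown); it only simulates a graph in which each vertex is assigned to a hidden class and each pair of vertices is connected with a probability that depends on the two classes. The function name `sample_block_model` and its parameters are hypothetical, chosen for illustration.

```python
import random
from itertools import combinations


def sample_block_model(n, class_probs, connect, seed=0):
    """Simulate an undirected graph with hidden vertex classes.

    n           -- number of vertices
    class_probs -- class_probs[q] is the probability a vertex is in class q
    connect     -- connect[q][l] is the edge probability between a vertex
                   of class q and a vertex of class l (assumed symmetric)
    seed        -- seed for the pseudo-random generator (illustrative only)
    """
    rng = random.Random(seed)
    # Draw a hidden class label for every vertex.
    classes = rng.choices(range(len(class_probs)), weights=class_probs, k=n)
    # Connect each unordered pair independently, given the two labels.
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if rng.random() < connect[classes[i]][classes[j]]
    ]
    return classes, edges


if __name__ == "__main__":
    labels, graph = sample_block_model(
        8, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], seed=42
    )
    print(labels)
    print(graph)
```

With a single class, the sketch reduces to the Erdős–Rényi model mentioned at the start of the abstract, since every pair then shares the same connection probability.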

7.
In this paper, we propose a model with a Dirichlet process mixture of gamma densities in the bulk part below threshold and a generalized Pareto density in the tail for extreme value estimation. The proposed model is simple and flexible for posterior density estimation and posterior inference for high quantiles. The model works well even for small sample sizes and in the absence of prior information. We evaluate the performance of the proposed model through a simulation study. Finally, the proposed model is applied to a real environmental data set.

8.
In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G₀. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
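
To make the link between the precision parameter and the level of clustering concrete, the sketch below evaluates a standard property of the Dirichlet process: for n observations and a fixed precision α, the prior expected number of distinct clusters is the sum of α / (α + i) for i = 0, ..., n − 1. This is a textbook identity and not the prior-construction method of the paper; the function name `expected_clusters` is hypothetical.

```python
def expected_clusters(alpha: float, n: int) -> float:
    """Prior expected number of clusters among n draws from a Dirichlet
    process with precision alpha: sum of alpha / (alpha + i), i = 0..n-1."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if n < 0:
        raise ValueError("n must be non-negative")
    # The (i+1)-th observation starts a new cluster with
    # probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, expected_clusters(alpha, 100))
```

Larger values of α give more expected clusters for the same n, which is why inference about clustering can be sensitive to the prior placed on α.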

9.
When the target variable exhibits a semicontinuous behavior (a point mass at a single value and a continuous distribution elsewhere), parametric “two-part models” have been extensively used and investigated. The applications have mainly been related to nonnegative variables with a point mass at zero (zero-inflated data). In this article, a semiparametric Bayesian two-part model for dealing with such variables is proposed. The model allows a semiparametric expression for the two parts of the model by using Dirichlet processes. A motivating example, based on grape wine production in Tuscany (an Italian region), is used to show the capabilities of the model. Finally, two simulation experiments evaluate the model. Results show a satisfactory performance of the suggested approach for modeling and predicting semicontinuous data when parametric assumptions are not reasonable.

10.
Mixture models are used in a large number of applications, yet there remain difficulties with maximum likelihood estimation. For instance, the likelihood surface for finite normal mixtures often has a large number of local maximizers, some of which do not give a good representation of the underlying features of the data. In this paper we present diagnostics that can be used to check the quality of an estimated mixture distribution. Particular attention is given to normal mixture models since they frequently arise in practice. We use the diagnostic tools for finite normal mixture problems and in the nonparametric setting where the difficult problem of determining a scale parameter for a normal mixture density estimate is considered. A large sample justification for the proposed methodology is provided and we illustrate its implementation through several examples.

11.
We propose a Bayesian nonparametric procedure for density estimation, for data in a closed, bounded interval, say [0,1]. To this aim, we use a prior based on Bernstein polynomials. This corresponds to expressing the density of the data as a mixture of given beta densities, with random weights and a random number of components. The density estimate is then obtained as the corresponding predictive density function. Comparison with classical and Bayesian kernel estimates is provided. The proposed procedure is illustrated in an example; an MCMC algorithm for approximating the estimate is also discussed.
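
The "mixture of given beta densities" can be written out for fixed weights. The sketch below assumes the usual Bernstein form, in which a density of order k on [0,1] is the sum over j = 1, ..., k of w_j times the Beta(j, k − j + 1) density; it evaluates that density only and does not implement the prior, the random number of components, or the MCMC algorithm of the paper. The function name `bernstein_density` is hypothetical.

```python
from math import comb


def bernstein_density(x: float, weights: list[float]) -> float:
    """Evaluate a Bernstein polynomial density of order k = len(weights)
    at x in [0, 1]: sum_j weights[j-1] * BetaPDF(x; j, k - j + 1)."""
    k = len(weights)
    if k == 0:
        raise ValueError("weights must be non-empty")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    total = 0.0
    for j, w in enumerate(weights, start=1):
        # For integer parameters, the Beta(j, k - j + 1) density equals
        # k * C(k-1, j-1) * x^(j-1) * (1-x)^(k-j).
        beta_pdf = k * comb(k - 1, j - 1) * x ** (j - 1) * (1.0 - x) ** (k - j)
        total += w * beta_pdf
    return total


if __name__ == "__main__":
    # Weight concentrated on the later components pushes mass toward 1.
    print(bernstein_density(0.8, [0.0, 0.1, 0.2, 0.7]))
```

Equal weights recover the uniform density on [0,1], which gives a quick sanity check on any implementation.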

12.
Summary. In microarray experiments, accurate estimation of the gene variance is a key step in the identification of differentially expressed genes. Variance models range from the too stringent homoscedastic assumption to the overparameterized model assuming a specific variance for each gene. Between these two extremes there is some room for intermediate models. We propose a method that identifies clusters of genes with equal variance. We use a mixture model on the gene variance distribution. A test statistic for ranking and detecting differentially expressed genes is proposed. The method is illustrated with publicly available complementary deoxyribonucleic acid microarray experiments, an unpublished data set and further simulation studies.

13.
We introduce a new class of discrete random probability measures that extend the definition of the Dirichlet process (DP) by explicitly incorporating skewness. The asymmetry is controlled by a single parameter in such a way that symmetric DPs are obtained as a special case of the general construction. We review the main properties of skewed DPs and develop appropriate Pólya urn schemes. We illustrate the modelling in the context of linear regression models of the capital asset pricing model (CAPM) type, where assessing symmetry for the error distribution is important to check validity of the model.

14.
One of the fundamental issues in analyzing microarray data is to determine which genes are expressed and which ones are not for a given group of subjects. In datasets where many genes are expressed and many are not expressed (i.e., underexpressed), a bimodal distribution for the gene expression levels often results, where one mode of the distribution represents the expressed genes and the other mode represents the underexpressed genes. To model this bimodality, we propose a new class of mixture models that utilize a random threshold value for accommodating bimodality in the gene expression distribution. Theoretical properties of the proposed model are carefully examined. We use this new model to examine the problem of differential gene expression between two groups of subjects, develop prior distributions, and derive a new criterion for determining which genes are differentially expressed between the two groups. Prior elicitation is carried out using empirical Bayes methodology in order to estimate the threshold value as well as elicit the hyperparameters for the two-component mixture model. The new gene selection criterion is demonstrated via several simulations to have excellent false positive rate and false negative rate properties. A gastric cancer dataset is used to motivate and illustrate the proposed methodology.

15.
In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies, and flexible modeling that allows for interactions between platforms and the estimated effect, as well as concordant and discordant differential expression across studies. We evaluated the performance of our model in a comprehensive fashion, using both artificial data and a "split-study" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis, but also under a realistic alternative. The simulation results from the artificial data demonstrate the advantages of the Bayesian model. The 1 - AUC values for the Bayesian model are roughly half of the corresponding values for a direct combination of t- and SAM-statistics. Furthermore, the simulations provide guidelines for when the Bayesian model is most likely to be useful. Most noticeably, in small studies the Bayesian model generally outperforms other methods when evaluated by AUC, FDR, and MDR across a range of simulation parameters, and this difference diminishes for larger sample sizes in the individual studies. The split-study validation illustrates appropriate shrinkage of the Bayesian model in the absence of platform-, sample-, and annotation-differences that otherwise complicate experimental data analyses. Finally, we fit our model to four breast cancer studies employing different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor positive tumors versus negative ones. Software and data for reproducing our analysis are publicly available.

16.
In this article, we propose a new distribution by mixing normal and Pareto distributions, and the new distribution provides an unusual hazard function. We model the mean and the variance with covariates to account for heterogeneity. The parameters are estimated by the Bayesian method using Markov Chain Monte Carlo (MCMC) algorithms. The MCMC proposal distribution is constructed using a defined working variable related to the observations. In simulation, the method shows dependable performance. By fitting the model to a real dataset, we demonstrate that the proposed model and method can be more suitable than those of a previous report.

17.
This paper presents a new Bayesian, infinite mixture model based, clustering approach, specifically designed for time-course microarray data. The problem is to group together genes which have “similar” expression profiles, given the set of noisy measurements of their expression levels over a specific time interval. In order to capture temporal variations of each curve, a non-parametric regression approach is used. Each expression profile is expanded over a set of basis functions and the sets of coefficients of each curve are subsequently modeled through a Bayesian infinite mixture of Gaussian distributions. The task of finding clusters of genes with similar expression profiles is thereby reduced to the problem of grouping together genes whose coefficients are sampled from the same distribution in the mixture. A Dirichlet process prior is naturally employed in such models, since it allows one to deal automatically with the uncertainty about the number of clusters. The posterior inference is carried out by a split and merge MCMC sampling scheme which integrates out parameters of the component distributions and updates only the latent vector of the cluster membership. The final configuration is obtained via the maximum a posteriori estimator. The performance of the method is studied using synthetic and real microarray data and is compared with the performances of competitive techniques.

18.
A Bayesian nonparametric model for Taguchi's on-line quality monitoring procedure for attributes is introduced. The proposed model extends the original single-shift setting to the more realistic situation of gradual quality deterioration and allows the incorporation of an expert's opinion on the production process. Based on the number of inspections carried out until a defective item is found, the Bayesian operation is performed for the distribution function that represents the increasing sequence of defective fractions during a cycle, with a mixture of Dirichlet processes as the prior distribution. Bayes estimates for relevant quantities are also obtained.

19.
Our article presents a general treatment of the linear regression model, in which the error distribution is modelled nonparametrically and the error variances may be heteroscedastic, thus eliminating the need to transform the dependent variable in many data sets. The mean and variance components of the model may be either parametric or nonparametric, with parsimony achieved through variable selection and model averaging. A Bayesian approach is used for inference with priors that are data-based so that estimation can be carried out automatically with minimal input by the user. A Dirichlet process mixture prior is used to model the error distribution nonparametrically; when there are no regressors in the model, the method reduces to Bayesian density estimation, and we show that in this case the estimator compares favourably with a well-regarded plug-in density estimator. We also consider a method for checking the fit of the full model. The methodology is applied to a number of simulated and real examples and is shown to work well.

20.
Many existing approaches to analysing interval-censored data lack flexibility or efficiency. In this paper, we propose an efficient, easy to implement approach based on an accelerated failure time model with a logarithm transformation of the failure time and flexible specifications on the error distribution. We use exact inference for the Dirichlet process without approximation in imputation. Our algorithm can be implemented with simple Gibbs sampling which produces exact posterior distributions on the features of interest. Simulation and real data analysis demonstrate the advantage of our method compared to some other methods.
