首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Summary.  As biological knowledge accumulates rapidly, gene networks encoding genomewide gene–gene interactions have been constructed. As an improvement over the standard mixture model that tests all the genes identically and independently distributed a priori , Wei and co-workers have proposed modelling a gene network as a discrete or Gaussian Markov random field (MRF) in a mixture model to analyse genomic data. However, how these methods compare in practical applications is not well understood and this is the aim here. We also propose two novel constraints in prior specifications for the Gaussian MRF model and a fully Bayesian approach to the discrete MRF model. We assess the accuracy of estimating the false discovery rate by posterior probabilities in the context of MRF models. Applications to a chromatin immuno-precipitation–chip data set and simulated data show that the modified Gaussian MRF models have superior performance compared with other models, and both MRF-based mixture models, with reasonable robustness to misspecified gene networks, outperform the standard mixture model.  相似文献   

2.
Identification of different gene expressions of chickpea (Cicer arietinum) plant tissue is needed in order to develop new varieties of chickpea plant which is resistant to disease through the insertion of genes. This plant is the third legume plant of the Leguminosae (Fabaceae) family and is much needed in the world due to its high-protein seeds and roots that contain symbiotic nitrogen-fixing bacteria. This paper has succeeded to demonstrate the work of Bayesian mixture model averaging (BMMA) approach to identify the different gene expressions of chickpea plant tissue in Indonesia. The results show that the best BMMA normal models contain from 727 (73%) up to 939 (94%) models from 1,000 generated mixture normal models. The fitted BMMA models to gene expression differences data on average is 0.2878511 for Kolmogorov–Smirnov (KS) and 0.1278080 for continuous rank probability score (CRPS). Based on these BMMA models, there are three groups of gene IDs: downregulated, regulated, and upregulated. The results of this grouping can be useful to find new varieties of chickpea plants that are more resistant to disease. The BMMA normal models coupled with Occam's window as a data-driven modeling have succeed to demonstrate the work of building the gene expression differences microarray experiments data.  相似文献   

3.
Modelling extreme wind speeds in regions prone to hurricanes   总被引:1,自引:0,他引:1  
Extreme wind speeds can arise as the result of a simple pressure differential, or a complex dynamic system such as a tropical storm. When sets of record values comprise a mixture of two or more different types of event, the standard models for extremes based on a single limiting distribution are not applicable. We develop a mixture model for extreme winds arising from two distinct processes. Working with sequences of annual maximum speeds obtained at hurricane prone locations in the USA, we take a Bayesian approach to inference, which allows the incorporation of prior information obtained from other sites. We model the extremal behaviour for the contrasting wind climates of Boston and Key West, and show that the standard models can give misleading results at such locations.  相似文献   

4.
Generalized linear mixed models are widely used for describing overdispersed and correlated data. Such data arise frequently in studies involving clustered and hierarchical designs. A more flexible class of models has been developed here through the Dirichlet process mixture. An additional advantage of using such mixture models is that the observations can be grouped together on the basis of the overdispersion present in the data. This paper proposes a partial empirical Bayes method for estimating all the model parameters by adopting a version of the EM algorithm. An augmented model that helps to implement an efficient Gibbs sampling scheme, under the non‐conjugate Dirichlet process generalized linear model, generates observations from the conditional predictive distribution of unobserved random effects and provides an estimate of the average number of mixing components in the Dirichlet process mixture. A simulation study has been carried out to demonstrate the consistency of the proposed method. The approach is also applied to a study on outdoor bacteria concentration in the air and to data from 14 retrospective lung‐cancer studies.  相似文献   

5.
Summary.  We discuss a method for combining different but related longitudinal studies to improve predictive precision. The motivation is to borrow strength across clinical studies in which the same measurements are collected at different frequencies. Key features of the data are heterogeneous populations and an unbalanced design across three studies of interest. The first two studies are phase I studies with very detailed observations on a relatively small number of patients. The third study is a large phase III study with over 1500 enrolled patients, but with relatively few measurements on each patient. Patients receive different doses of several drugs in the studies, with the phase III study containing significantly less toxic treatments. Thus, the main challenges for the analysis are to accommodate heterogeneous population distributions and to formalize borrowing strength across the studies and across the various treatment levels. We describe a hierarchical extension over suitable semiparametric longitudinal data models to achieve the inferential goal. A nonparametric random-effects model accommodates the heterogeneity of the population of patients. A hierarchical extension allows borrowing strength across different studies and different levels of treatment by introducing dependence across these nonparametric random-effects distributions. Dependence is introduced by building an analysis of variance (ANOVA) like structure over the random-effects distributions for different studies and treatment combinations. Model structure and parameter interpretation are similar to standard ANOVA models. Instead of the unknown normal means as in standard ANOVA models, however, the basic objects of inference are random distributions, namely the unknown population distributions under each study. The analysis is based on a mixture of Dirichlet processes model as the underlying semiparametric model.  相似文献   

6.
Bayesian methods have been extensively used in small area estimation. A linear model incorporating autocorrelated random effects and sampling errors was previously proposed in small area estimation using both cross-sectional and time-series data in the Bayesian paradigm. There are, however, many situations that we have time-related counts or proportions in small area estimation; for example, monthly dataset on the number of incidence in small areas. This article considers hierarchical Bayes generalized linear models for a unified analysis of both discrete and continuous data with incorporating cross-sectional and time-series data. The performance of the proposed approach is evaluated through several simulation studies and also by a real dataset.  相似文献   

7.
In this paper, we consider a new mixture of varying coefficient models, in which each mixture component follows a varying coefficient model and the mixing proportions and dispersion parameters are also allowed to be unknown smooth functions. We systematically study the identifiability, estimation and inference for the new mixture model. The proposed new mixture model is rather general, encompassing many mixture models as its special cases such as mixtures of linear regression models, mixtures of generalized linear models, mixtures of partially linear models and mixtures of generalized additive models, some of which are new mixture models by themselves and have not been investigated before. The new mixture of varying coefficient model is shown to be identifiable under mild conditions. We develop a local likelihood procedure and a modified expectation–maximization algorithm for the estimation of the unknown non‐parametric functions. Asymptotic normality is established for the proposed estimator. A generalized likelihood ratio test is further developed for testing whether some of the unknown functions are constants. We derive the asymptotic distribution of the proposed generalized likelihood ratio test statistics and prove that the Wilks phenomenon holds. The proposed methodology is illustrated by Monte Carlo simulations and an analysis of a CO2‐GDP data set.  相似文献   

8.
One of the fundamental issues in analyzing microarray data is to determine which genes are expressed and which ones are not for a given group of subjects. In datasets where many genes are expressed and many are not expressed (i.e., underexpressed), a bimodal distribution for the gene expression levels often results, where one mode of the distribution represents the expressed genes and the other mode represents the underexpressed genes. To model this bimodality, we propose a new class of mixture models that utilize a random threshold value for accommodating bimodality in the gene expression distribution. Theoretical properties of the proposed model are carefully examined. We use this new model to examine the problem of differential gene expression between two groups of subjects, develop prior distributions, and derive a new criterion for determining which genes are differentially expressed between the two groups. Prior elicitation is carried out using empirical Bayes methodology in order to estimate the threshold value as well as elicit the hyperparameters for the two component mixture model. The new gene selection criterion is demonstrated via several simulations to have excellent false positive rate and false negative rate properties. A gastric cancer dataset is used to motivate and illustrate the proposed methodology.  相似文献   

9.
In modern quality engineering, dual response surface methodology is a powerful tool to model an industrial process by using both the mean and the standard deviation of the measurements as the responses. The least squares method in regression is often used to estimate the coefficients in the mean and standard deviation models, and various decision criteria are proposed by researchers to find the optimal conditions. Based on the inherent hierarchical structure of the dual response problems, we propose a Bayesian hierarchical approach to model dual response surfaces. Such an approach is compared with two frequentist least squares methods by using two real data sets and simulated data.  相似文献   

10.
Clustered interval‐censored survival data are often encountered in clinical and epidemiological studies due to geographic exposures and periodic visits of patients. When a nonnegligible cured proportion exists in the population, several authors in recent years have proposed to use mixture cure models incorporating random effects or frailties to analyze such complex data. However, the implementation of the mixture cure modeling approaches may be cumbersome. Interest then lies in determining whether or not it is necessary to adjust the cured proportion prior to the mixture cure analysis. This paper mainly focuses on the development of a score for testing the presence of cured subjects in clustered and interval‐censored survival data. Through simulation, we evaluate the sampling distribution and power behaviour of the score test. A bootstrap approach is further developed, leading to more accurate significance levels and greater power in small sample situations. We illustrate applications of the test using data sets from a smoking cessation study and a retrospective study of early breast cancer patients.  相似文献   

11.
This paper proposes a hierarchical probabilistic model for ordinal matrix factorization. Unlike previous approaches, we model the ordinal nature of the data and take a principled approach to incorporating priors for the hidden variables. Two algorithms are presented for inference, one based on Gibbs sampling and one based on variational Bayes. Importantly, these algorithms may be implemented in the factorization of very large matrices with missing entries.  相似文献   

12.
This article focuses on the location, time, and spatio-temporal components associated with suitably aggregated data to improve prediction of individual asset values. Such effects are introduced in the context of hierarchical models, which we find more natural than attempting to model covariance structure. Indeed, our cross-sectional database, a sample of 7,936 transactions for 49 subdivisions over a 10-year period in Baton Rouge, Louisiana, precludes covariance modeling. A wide range of models arises, each fitted using sampling-based methods because likelihood-based fitting may not be possible. Choosing among an array of nonnested models is carried out using a posterior predictive criterion. In addition, one year of data is held out for model validation. A thorough analysis of the data incorporating all of the aforementioned issues is presented.  相似文献   

13.
Abstract.  The present work focuses on extensions of the posterior predictive p -value (ppp-value) for models with hierarchical structure, designed for testing assumptions made on underlying processes. The ppp-values are popular as tools for model criticism, yet their lack of a common interpretation limit their practical use. We discuss different extensions of ppp-values to hierarchical models, allowing for discrepancy measures that can be used for checking properties of the model at all stages. Through analytical derivations and simulation studies on simple models, we show that similar to the standard ppp-values, these extensions are typically far from uniformly distributed under the model assumptions and can give poor power in a hypothesis testing framework. We propose a calibration of the p -values, making the resulting calibrated p -values uniformly distributed under the model conditions. Illustrations are made through a real example of multinomial regression to age distributions of fish.  相似文献   

14.
Mixture separation for mixed-mode data   总被引:3,自引:0,他引:3  
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.  相似文献   

15.
The problem of modelling football data has become increasingly popular in the last few years and many different models have been proposed with the aim of estimating the characteristics that bring a team to lose or win a game, or to predict the score of a particular match. We propose a Bayesian hierarchical model to fulfil both these aims and test its predictive strength based on data about the Italian Serie A 1991–1992 championship. To overcome the issue of overshrinkage produced by the Bayesian hierarchical model, we specify a more complex mixture model that results in a better fit to the observed data. We test its performance using an example of the Italian Serie A 2007–2008 championship.  相似文献   

16.
Multivariate data with a sequential or temporal structure occur in various fields of study. The hidden Markov model (HMM) provides an attractive framework for modeling long-term persistence in areas of pattern recognition through the extension of independent and identically distributed mixture models. Unlike in typical mixture models, the heterogeneity of data is represented by hidden Markov states. This article extends the HMM to a multi-site or multivariate case by taking a hierarchical Bayesian approach. This extension has many advantages over a single-site HMM. For example, it can provide more information for identifying the structure of the HMM than a single-site analysis. We evaluate the proposed approach by exploiting a spatial correlation that depends on the distance between sites.  相似文献   

17.
Graphs and networks are common ways of depicting information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This kind of a priori use of graphs is a useful supplement to the standard numerical data such as microarray gene expression data. In this paper, we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. We study a graph-constrained regularization procedure and its theoretical properties for regression analysis to take into account the neighborhood information of the variables measured on a graph, where a smoothness penalty on the coefficients is defined as a quadratic form of the Laplacian matrix associated with the graph. We establish estimation and model selection consistency results and provide estimation bounds for both fixed and diverging numbers of parameters in regression models. We demonstrate by simulations and a real dataset that the proposed procedure can lead to better variable selection and prediction than existing methods that ignore the graph information associated with the covariates.  相似文献   

18.
In studies that involve censored time-to-event data, stratification is frequently encountered due to different reasons, such as stratified sampling or model adjustment due to violation of model assumptions. Often, the main interest is not in the clustering variables, and the cluster-related parameters are treated as nuisance. When inference is about a parameter of interest in presence of many nuisance parameters, standard likelihood methods often perform very poorly and may lead to severe bias. This problem is particularly evident in models for clustered data with cluster-specific nuisance parameters, when the number of clusters is relatively high with respect to the within-cluster size. However, it is still unclear how the presence of censoring would affect this issue. We consider clustered failure time data with independent censoring, and propose frequentist inference based on an integrated likelihood. We then apply the proposed approach to a stratified Weibull model. Simulation studies show that appropriately defined integrated likelihoods provide very accurate inferential results in all circumstances, such as for highly clustered data or heavy censoring, even in extreme settings where standard likelihood procedures lead to strongly misleading results. We show that the proposed method performs generally as well as the frailty model, but it is superior when the frailty distribution is seriously misspecified. An application, which concerns treatments for a frequent disease in late-stage HIV-infected people, illustrates the proposed inferential method in Weibull regression models, and compares different inferential conclusions from alternative methods.  相似文献   

19.
ABSTRACT

This paper derives models to analyse Cannabis offences count series from New South Wales, Australia. The data display substantial overdispersion as well as underdispersion for a subset, trend movement and population heterogeneity. To describe the trend dynamic in the data, the Poisson geometric process model is first adopted and is extended to the generalized Poisson geometric process model to capture both over- and underdispersion. By further incorporating mixture effect, the model accommodates population heterogeneity and enables classification of homogeneous units. The model is implemented using Markov chain Monte Carlo algorithms via the user-friendly WinBUGS software and its performance is evaluated through a simulation study.  相似文献   

20.
Summary.  In microarray experiments, accurate estimation of the gene variance is a key step in the identification of differentially expressed genes. Variance models go from the too stringent homoscedastic assumption to the overparameterized model assuming a specific variance for each gene. Between these two extremes there is some room for intermediate models. We propose a method that identifies clusters of genes with equal variance. We use a mixture model on the gene variance distribution. A test statistic for ranking and detecting differentially expressed genes is proposed. The method is illustrated with publicly available complementary deoxyribonucleic acid microarray experiments, an unpublished data set and further simulation studies.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号