Similar Literature
20 similar documents found.
1.
The development of new technologies to measure gene expression has called for statistical methods to integrate findings across multiple-platform studies. A common goal of microarray analysis is to identify genes with differential expression between two conditions, such as treatment versus control. Here, we introduce a hierarchical Bayesian meta-analysis model to pool gene expression studies from different microarray platforms: spotted DNA arrays and short oligonucleotide arrays. The studies have different array design layouts, each with multiple sources of data replication, including repeated experiments, slides and probes. Our model produces the gene-specific posterior probability of differential expression, which is the basis for inference. In simulations combining two and five independent studies, our meta-analysis model outperformed separate analyses for three commonly used comparison measures; it also showed improved receiver operating characteristic curves. When combining spotted DNA and CombiMatrix short oligonucleotide array studies of Geobacter sulfurreducens, our meta-analysis model discovered more genes for fixed thresholds of posterior probability of differential expression and Bayesian false discovery rate than individual study analyses. We also examine an alternative model and compare models using the deviance information criterion.
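The pooling idea behind the gene-specific posterior probability of differential expression can be illustrated with a much-simplified sketch: each study reports an effect estimate with its sampling variance, the studies share a common effect under the alternative, and Bayes' rule combines the marginal likelihoods under the two hypotheses. This is not the paper's full hierarchical model (which handles platform-specific replication layers); the mixture weight `w` and slab variance `tau2` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior_prob_de(d, v, w=0.1, tau2=1.0):
    """Posterior probability that a gene is differentially expressed,
    pooling effect estimates from several studies.

    d    : per-study effect estimates for one gene (length S)
    v    : per-study sampling variances (length S)
    w    : prior probability of differential expression (assumed)
    tau2 : prior variance of the common effect under the alternative
    """
    d = np.asarray(d, dtype=float)
    S = len(d)
    # Under H0 the common effect is 0: d ~ N(0, diag(v)).
    m0 = multivariate_normal.pdf(d, mean=np.zeros(S), cov=np.diag(v))
    # Under H1 the studies share one effect delta ~ N(0, tau2),
    # so marginally d ~ N(0, diag(v) + tau2 * J).
    m1 = multivariate_normal.pdf(d, mean=np.zeros(S),
                                 cov=np.diag(v) + tau2 * np.ones((S, S)))
    return w * m1 / (w * m1 + (1 - w) * m0)

# Two studies report log-ratios 1.1 and 0.9 with variances 0.04 and 0.09.
print(posterior_prob_de([1.1, 0.9], [0.04, 0.09]))
```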

2.
Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector with a few nonzero components, those covariates that are most important. This article extends the “global‐local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. Here, we have developed a variable selection method for a K‐outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for all regression coefficients is a mean-zero normal with a coefficient‐specific variance term that consists of a predictor‐specific factor (shared local shrinkage parameter) and a model‐specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example.
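A minimal sketch of the prior structure just described, assuming (hypothetically) exponential priors on the scale factors: each coefficient's variance is the product of a predictor-specific local term, shared across all K outcomes, and an outcome-specific global term.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 6, 3                                # predictors, outcomes

# Shared local shrinkage: one scale per predictor, common to all K models.
lam2 = rng.exponential(scale=1.0, size=p)  # assumed local prior for the sketch
# Global shrinkage: one scale per outcome/model.
tau2 = rng.exponential(scale=0.1, size=K)

# Coefficient beta[j, k] ~ N(0, lam2[j] * tau2[k]):
beta = rng.normal(0.0, np.sqrt(np.outer(lam2, tau2)))
# A predictor with small lam2[j] is shrunk toward zero in *every* outcome,
# which is what lets the model find covariates important across all K responses.
print(beta.round(3))
```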

3.
Summary. A typical microarray experiment attempts to ascertain which genes display differential expression in different samples. We model the data by using a two-component mixture model and develop an empirical Bayesian thresholding procedure, which was originally introduced for thresholding wavelet coefficients, as an alternative to the existing methods for determining differential expression across thousands of genes. The method is built on sound theoretical properties and has easy computer implementation in the R statistical package. Furthermore, we consider improvements to the standard empirical Bayesian procedure when replication is present, to increase the robustness and reliability of the method. We provide an introduction to microarrays for those who are unfamiliar with the field, and the proposed procedure is demonstrated with applications to two-channel complementary DNA microarray experiments.
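As a rough illustration of empirical Bayes thresholding, the sketch below uses a simplified spike-and-slab prior (point mass at zero plus a normal slab) rather than the Laplace-tailed prior of the wavelet-based procedure; zeroing scores whose posterior odds favour the null mimics the posterior-median rule. The weight `w` and slab variance `tau2` would normally be estimated by marginal maximum likelihood and are fixed here for brevity.

```python
import numpy as np
from scipy.stats import norm

def eb_threshold(z, w=0.2, tau2=4.0):
    """Shrink/threshold standardized gene scores z ~ N(mu, 1) under a
    spike-and-slab prior mu ~ (1-w) delta_0 + w N(0, tau2)."""
    z = np.asarray(z, dtype=float)
    f0 = norm.pdf(z, 0.0, 1.0)                  # marginal under the spike
    f1 = norm.pdf(z, 0.0, np.sqrt(1.0 + tau2))  # marginal under the slab
    post_nonzero = w * f1 / (w * f1 + (1 - w) * f0)
    shrunk = (tau2 / (1.0 + tau2)) * z          # posterior mean within the slab
    # Zero out scores whose posterior odds favour the null -- the analogue
    # of the posterior-median thresholding rule.
    return np.where(post_nonzero > 0.5, shrunk, 0.0), post_nonzero

est, prob = eb_threshold([0.3, -1.0, 4.2])
print(est, prob.round(3))
```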

4.
A Bayesian discovery procedure
Summary. We discuss a Bayesian discovery procedure for multiple-comparison problems. We show that, under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule that is based on a threshold of the posterior probability of the alternative. Under a semiparametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure, which was recently introduced by Storey. Improving the approximation leads us to a Bayesian discovery procedure, which exploits the multiple shrinkage in clusters that are implied by the assumed non-parametric model. We compare the Bayesian discovery procedure and the optimal discovery procedure estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumour samples. We extend the setting of the optimal discovery procedure by discussing modifications of the loss function that lead to different single-thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
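The thresholding rule can be made concrete in a few lines. Under a loss of the form FP - lambda*TP, flagging unit i is worthwhile exactly when its posterior probability of the alternative exceeds 1/(1 + lambda); this sketch assumes those posterior probabilities have already been computed.

```python
import numpy as np

def bayes_discovery_rule(v, lam=4.0):
    """Flag hypothesis i when the posterior probability of the alternative
    v[i] exceeds the loss-based threshold.

    With loss  FP - lam * TP,  the expected loss of flagging unit i is
    (1 - v[i]) - lam * v[i], which is negative exactly when
    v[i] > 1 / (1 + lam)."""
    v = np.asarray(v, dtype=float)
    return v > 1.0 / (1.0 + lam)

v = np.array([0.95, 0.40, 0.81, 0.10])
print(bayes_discovery_rule(v))   # threshold is 0.2 here, so the first three flag
```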

5.
Survival data involving silent events are often subject to interval censoring (the event is known to occur within a time interval) and to classification errors when a test without perfect sensitivity and specificity is applied. Accounting for the nature of these data plays an important role in estimating the distribution of the time until the occurrence of the event. In this context, we incorporate validation subsets into the parametric proportional hazards model, and show that this additional data, combined with Bayesian inference, compensates for the lack of knowledge about test sensitivity and specificity and improves the parameter estimates. The proposed model is evaluated through simulation studies, and Bayesian analysis is conducted within a Gibbs sampling procedure. The posterior estimates obtained under the validation subset models present lower bias and standard deviation compared with the scenario with no validation subset or the model that assumes perfect sensitivity and specificity. Finally, we illustrate the usefulness of the new methodology with an analysis of real data on HIV acquisition in female sex workers that have been discussed in the literature.
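A hedged sketch of how sensitivity and specificity enter the likelihood, reduced to a single test per subject (current-status data) with a Weibull event-time model; the paper's proportional hazards model with validation subsets and Gibbs sampling is considerably richer. All parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import weibull_min

def loglik(params, t, y, se, sp):
    """Log-likelihood for a single imperfect test applied at time t[i]:
    y[i] = 1 if the test is positive.  Event times are Weibull(shape, scale).
    An infected subject tests positive with probability se, an uninfected
    one with probability 1 - sp."""
    shape, scale = np.exp(params)                 # keep both parameters positive
    F = weibull_min.cdf(t, shape, scale=scale)    # P(event by time t)
    p_pos = se * F + (1.0 - sp) * (1.0 - F)
    return np.sum(y * np.log(p_pos) + (1 - y) * np.log1p(-p_pos))

t = np.array([0.5, 1.0, 2.0, 3.0])
y = np.array([0, 1, 0, 1])
print(loglik(np.log([1.5, 2.0]), t, y, se=0.9, sp=0.95))
```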

6.
In recent years, Bayesian statistical methods in neuroscience have shown important advances. In particular, detection of brain signals for studying the complexity of the brain is an active area of research. Functional magnetic resonance imaging (fMRI) is an important tool to determine which parts of the brain are activated by different types of physical behavior. According to recent results, there is evidence that the values of the connectivity brain signal parameters are close to zero, and given the nature of time series fMRI data with high-frequency behavior, Bayesian dynamic models for identifying sparsity are indeed far-reaching. We propose a multivariate Bayesian dynamic approach for model selection and shrinkage estimation of the connectivity parameters. We describe the coupling or lead-lag between any pair of regions by using mixture priors for the connectivity parameters and propose a new weakly informative default prior for the state variances. This framework produces one-step-ahead proper posterior predictive results and induces shrinkage and robustness suitable for fMRI data in the presence of sparsity. To explore the performance of the proposed methodology, we present simulation studies and an application to functional magnetic resonance imaging data.

7.
This paper presents a Bayesian-hypothesis-testing-based methodology for model validation and confidence extrapolation under uncertainty, using limited test data. An explicit expression of the Bayes factor is derived for interval hypothesis testing. The interval method is compared with the Bayesian point null hypothesis testing approach. A Bayesian network with Markov chain Monte Carlo simulation and Gibbs sampling is explored for extrapolating the inference from the validated domain at the component level to the untested domain at the system level. The effect of the number of experiments on the confidence in the model validation decision is investigated. The probabilities of Type I and Type II errors in decision-making during model validation and confidence extrapolation are quantified. The proposed methodologies are applied to a structural mechanics problem. Numerical results demonstrate that the Bayesian methodology provides a quantitative approach to facilitate rational decisions in model validation and confidence extrapolation under uncertainty.
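For the univariate normal case with known noise, an interval-hypothesis Bayes factor can be computed by direct numerical integration, as in this sketch; the truncated normal prior and tolerance `eps` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy import integrate, stats

def interval_bayes_factor(ybar, n, sigma, eps, prior_sd=1.0):
    """Bayes factor for the interval hypotheses
    H0: |delta| <= eps  vs  H1: |delta| > eps,
    where ybar is the mean of n observations with known noise sd sigma
    and delta has a N(0, prior_sd^2) prior truncated to each region."""
    se = sigma / np.sqrt(n)
    lik = lambda d: stats.norm.pdf(ybar, d, se)
    pri = lambda d: stats.norm.pdf(d, 0.0, prior_sd)
    num, _ = integrate.quad(lambda d: lik(d) * pri(d), -eps, eps)
    den1, _ = integrate.quad(lambda d: lik(d) * pri(d), eps, np.inf)
    den2, _ = integrate.quad(lambda d: lik(d) * pri(d), -np.inf, -eps)
    # Normalize the prior within each region so both hypotheses use proper priors.
    p0 = stats.norm.cdf(eps, 0, prior_sd) - stats.norm.cdf(-eps, 0, prior_sd)
    return (num / p0) / ((den1 + den2) / (1.0 - p0))

# Model-vs-data discrepancy mean 0.15 from 10 runs, noise sd 0.5, tolerance 0.25:
print(interval_bayes_factor(0.15, 10, 0.5, eps=0.25))
```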

8.
Basket trials evaluate a single drug targeting a single genetic variant in multiple cancer cohorts. Empirical findings suggest that treatment efficacy across baskets may be heterogeneous. Most modern basket trial designs use Bayesian methods, which require the prior specification of at least one parameter that permits information sharing across baskets. In this study, we provide recommendations for selecting a prior for scale parameters in adaptive basket trials that use Bayesian hierarchical modeling. Heterogeneity among baskets attracts much attention in basket trial research, and substantial heterogeneity challenges the basic exchangeability assumption of the Bayesian hierarchical approach. Thus, we also allow each stratum-specific parameter to be exchangeable or nonexchangeable with similar strata, based on data observed at an interim analysis. Through a simulation study, we evaluate the overall performance of our design in terms of statistical power and type I error rates. Our research contributes to the understanding of the properties of Bayesian basket trial designs.
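A stripped-down sketch of the exchangeable Bayesian hierarchical model at the core of such designs: basket-level response rates are modelled on the logit scale with a common mean and a scale parameter, here given an assumed half-normal prior and handled by a grid posterior rather than MCMC. The interim exchangeable/nonexchangeable switching of the actual design is omitted, and all counts are invented.

```python
import numpy as np
from scipy.stats import norm, halfnorm

# Responses per basket (hypothetical interim data).
r = np.array([5, 7, 1, 2])
n = np.array([20, 19, 18, 21])

# Normal approximation on the logit scale: theta_hat_k ~ N(theta_k, se_k^2).
p_hat = (r + 0.5) / (n + 1.0)
theta_hat = np.log(p_hat / (1 - p_hat))
se2 = 1.0 / (r + 0.5) + 1.0 / (n - r + 0.5)

# Hierarchical model theta_k ~ N(mu, tau^2); grid posterior over (mu, tau).
mu_g = np.linspace(-4, 1, 200)
tau_g = np.linspace(1e-3, 2, 150)
M, T = np.meshgrid(mu_g, tau_g)
logpost = norm.logpdf(M, 0, 2) + halfnorm.logpdf(T, scale=1)  # assumed priors
for th, v in zip(theta_hat, se2):
    logpost += norm.logpdf(th, M, np.sqrt(T**2 + v))  # theta_k integrated out
w = np.exp(logpost - logpost.max())
w /= w.sum()

# Posterior-mean shrinkage estimate for each basket's log-odds:
for th, v in zip(theta_hat, se2):
    shrunk = (th / v + M / T**2) / (1 / v + 1 / T**2)  # E[theta_k | mu, tau, data]
    print(round(float((w * shrunk).sum()), 3))
```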

9.
We propose a fully Bayesian model with a non-informative prior for analyzing misclassified binary data with a validation substudy. In addition, we derive a closed-form algorithm for drawing all parameters from the posterior distribution and making statistical inference on odds ratios. Our algorithm draws each parameter from a beta distribution, avoids the specification of initial values, and does not have convergence issues. We apply the algorithm to a data set and compare the results with those obtained by other methods. Finally, the performance of our algorithm is assessed using simulation studies.
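The flavour of a closed-form beta-draw algorithm can be conveyed with a simplified sketch: the validation subset yields beta posteriors for the misclassification probabilities, the main data yield a beta posterior for the observed-positive probability, and the true prevalence is recovered by inverting the misclassification equation. This is a schematic in the spirit of the method, not the paper's exact sampler; the counts are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def or_draws(main, val, n_draw=5000):
    """Posterior draws of the odds ratio from misclassified binary data.
    main[g] = (a, b): observed-positive / observed-negative counts in group g.
    val[g]  = 2x2 array val[g][t, o]: validation counts with true status t
              and observed status o.  Uniform Beta(1, 1) priors throughout."""
    p_true = []
    for (a, b), v in zip(main, val):
        # Misclassification probabilities from the validation subset:
        se = rng.beta(1 + v[1, 1], 1 + v[1, 0], n_draw)    # P(obs+ | true+)
        fp = rng.beta(1 + v[0, 1], 1 + v[0, 0], n_draw)    # P(obs+ | true-)
        q = rng.beta(1 + a, 1 + b, n_draw)                 # P(obs+)
        p = np.clip((q - fp) / (se - fp), 1e-6, 1 - 1e-6)  # corrected P(true+)
        p_true.append(p)
    odds = [p / (1 - p) for p in p_true]
    return odds[0] / odds[1]

main = [(60, 140), (35, 165)]
val = [np.array([[40, 5], [4, 51]]), np.array([[45, 6], [3, 46]])]
d = or_draws(main, val)
print(np.percentile(d, [2.5, 50, 97.5]).round(2))
```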

10.
We propose a Bayesian implementation of the lasso regression that accomplishes both shrinkage and variable selection. We focus on the appropriate specification for the shrinkage parameter λ through Bayes factors that evaluate the inclusion of each covariate in the model formulation. We associate this parameter with the values of Pearson and partial correlation at the limits between significance and insignificance as defined by Bayes factors. In this way, a meaningful interpretation of λ is achieved that leads to a simple specification of this parameter. Moreover, we use these values to specify the parameters of a gamma hyperprior for λ. The parameters of the hyperprior are elicited such that appropriate levels of practical significance of the Pearson correlation are achieved and, at the same time, the prior support of λ values that activate the Lindley-Bartlett paradox or lead to over-shrinkage of model coefficients is avoided. The proposed method is illustrated using two simulation studies and a real dataset. For the first simulation study, results for different prior values of λ are presented as well as a detailed robustness analysis concerning the parameters of the hyperprior of λ. In all examples, detailed comparisons with a variety of ordinary and Bayesian lasso methods are presented.
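The paper's contribution is the Bayes-factor-based specification of λ; the underlying machinery is the standard Bayesian lasso Gibbs sampler of Park and Casella, sketched below with λ held fixed (a gamma hyperprior for λ², as in the article, would add one more conjugate update). The conditional for each 1/τ²j is inverse-Gaussian, which scipy samples directly.

```python
import numpy as np
from scipy.stats import invgauss

def bayesian_lasso(X, y, lam2=1.0, iters=2000, seed=0):
    """Gibbs sampler for the Bayesian lasso (Park & Casella), fixed lambda.
    Prior: beta_j | sigma2, tau2_j ~ N(0, sigma2*tau2_j), tau2_j ~ Exp(lam2/2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, sigma2, tau2 = np.zeros(p), 1.0, np.ones(p)
    XtX, Xty = X.T @ X, X.T @ y
    draws = np.empty((iters, p))
    for it in range(iters):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}),  A = X'X + diag(1/tau2)
        A = XtX + np.diag(1.0 / tau2)
        mean = np.linalg.solve(A, Xty)
        L = np.linalg.cholesky(np.linalg.inv(A))
        beta = mean + np.sqrt(sigma2) * (L @ rng.standard_normal(p))
        # 1/tau2_j | rest ~ InvGaussian(mean=sqrt(lam2*sigma2/beta_j^2), shape=lam2)
        m = np.sqrt(lam2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = invgauss.rvs(m / lam2, scale=lam2, random_state=rng)
        tau2 = 1.0 / inv_tau2
        # sigma2 | rest ~ InverseGamma
        resid = y - X @ beta
        shape = (n - 1 + p) / 2.0
        rate = (resid @ resid + np.sum(beta**2 / tau2)) / 2.0
        sigma2 = rate / rng.gamma(shape, 1.0)
        draws[it] = beta
    return draws

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + rng.standard_normal(100)
print(bayesian_lasso(X, y)[1000:].mean(axis=0).round(2))
```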

11.
We consider inference in randomized longitudinal studies with missing data generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full-data estimands are not identified unless unverified assumptions are imposed. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. We posit an exponential tilt model that links the non-identifiable distributions to distributions identified under partial ignorability. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution elicited from subject-matter experts. Under this model, full-data estimands are shown to be expressed as functionals of the distribution of the observed data. To avoid the curse of dimensionality, we model the distribution of the observed data using a Bayesian shrinkage model. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.
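The exponential tilt can be written down in a few lines: a non-identified sensitivity parameter reweights the identified distribution. The discrete support and the value of `alpha` below are purely illustrative; in the paper, the analogous parameter carries an informative expert-elicited prior.

```python
import numpy as np

def exponential_tilt(p0, y, alpha):
    """Tilt an identified distribution p0 over outcomes y into the
    non-identified one:  p1(y) proportional to exp(alpha * y) * p0(y).
    alpha is a sensitivity parameter not identified by the data."""
    w = p0 * np.exp(alpha * y)
    return w / w.sum()

y = np.array([0.0, 1.0, 2.0, 3.0])       # hypothetical outcome support
p0 = np.array([0.4, 0.3, 0.2, 0.1])      # distribution among observed cases
# alpha < 0: drop-outs tend to have lower outcomes than completers.
print(exponential_tilt(p0, y, alpha=-0.5).round(3))
```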

12.
Summary. Precise classification of tumours is critical for the diagnosis and treatment of cancer. Diagnostic pathology has traditionally relied on macroscopic and microscopic histology and tumour morphology as the basis for the classification of tumours. Current classification frameworks, however, cannot discriminate between tumours with similar histopathologic features, which vary in clinical course and in response to treatment. In recent years, there has been a move towards the use of complementary deoxyribonucleic acid microarrays for the classification of tumours. These high throughput assays provide relative messenger ribonucleic acid expression measurements simultaneously for thousands of genes. A key statistical task is to perform classification via different expression patterns. Gene expression profiles may offer more information than classical morphology and may provide an alternative to classical tumour diagnosis schemes. The paper considers several Bayesian classification methods based on reproducing kernel Hilbert spaces for the analysis of microarray data. We consider the logistic likelihood as well as likelihoods related to support vector machine models. It is shown through simulation and examples that support vector machine models with multiple shrinkage parameters produce fewer misclassification errors than several existing classical methods as well as Bayesian methods based on the logistic likelihood or those involving only one shrinkage parameter.

13.
Multivariate model validation is a complex decision-making problem involving comparison of multiple correlated quantities, based upon the available information and prior knowledge. This paper presents a Bayesian risk-based decision method for validation assessment of multivariate predictive models under uncertainty. A generalized likelihood ratio is derived as a quantitative validation metric based on Bayes’ theorem and a Gaussian distribution assumption for the errors between validation data and model prediction. The multivariate model is then assessed based on the comparison of the likelihood ratio with a Bayesian decision threshold, a function of the decision costs and the prior probability of each hypothesis. The probability density function of the likelihood ratio is constructed using the statistics of multiple response quantities and Monte Carlo simulation. The proposed methodology is implemented in the validation of a transient heat conduction model, using a multivariate data set from experiments. The Bayesian methodology provides a quantitative approach to facilitate rational decisions in multivariate model assessment under uncertainty.
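A minimal sketch of the decision rule: with Gaussian errors, the likelihood ratio of the validation errors under the model-valid and model-biased hypotheses is compared with a threshold built from the prior and the decision costs. The alternative mean (a one-standard-deviation bias per response) is an assumption made only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def validation_decision(e, Sigma, pi0=0.5, c_accept_bad=1.0, c_reject_good=1.0):
    """Likelihood-ratio check for a vector of prediction errors e.
    H0 (model valid): e ~ N(0, Sigma).  H1 (model biased): e ~ N(delta, Sigma),
    where delta is a hypothesized bias, here one error sd per response
    (an assumption for illustration only).
    Accept the model when the likelihood ratio exceeds the Bayesian
    threshold formed from the prior pi0 and the two decision costs."""
    delta = np.sqrt(np.diag(Sigma))
    lr = (multivariate_normal.pdf(e, mean=np.zeros(len(e)), cov=Sigma)
          / multivariate_normal.pdf(e, mean=delta, cov=Sigma))
    threshold = ((1 - pi0) * c_accept_bad) / (pi0 * c_reject_good)
    return lr, lr > threshold

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
print(validation_decision(np.array([0.05, -0.10]), Sigma))
```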

14.
15.
In most practical applications, the quality of count data is often compromised by errors-in-variables (EIVs). In this paper, we apply a Bayesian approach to reduce bias in estimating the parameters of count data regression models that have mismeasured independent variables. Furthermore, the exposure model is specified with a flexible distribution, so our approach remains robust against any departures from normality in the true underlying exposure distribution. The proposed method is also useful in realistic situations because the variance of the EIVs is estimated rather than assumed known, in contrast with other bias-correction methods for count data EIV regression models. We conduct simulation studies on synthetic data sets using Markov chain Monte Carlo simulation techniques to investigate the performance of our approach. Our findings show that the flexible Bayesian approach is able to estimate the values of the true regression parameters consistently and accurately.

16.
Handling data with a nonignorable missingness mechanism is still a challenging problem in statistics. In this paper, we develop a fully Bayesian adaptive lasso approach for quantile regression models with nonignorably missing response data, where the nonignorable missingness mechanism is specified by a logistic regression model. The proposed method extends the Bayesian lasso by allowing different penalization parameters for different regression coefficients. Furthermore, a hybrid algorithm combining the Gibbs sampler and the Metropolis-Hastings algorithm is implemented to simulate the parameters from their posterior distributions; these include the regression coefficients, the shrinkage coefficients, and the parameters of the nonignorable missingness model. Finally, simulation studies and a real example are used to illustrate the proposed methodology.
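The computational core of Bayesian quantile regression is the asymmetric Laplace working likelihood, whose log-density is, up to constants, the negative check loss; a minimal sketch follows, with the adaptive lasso penalties and the logistic missingness model omitted.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def ald_loglik(y, X, beta, tau, sigma=1.0):
    """Log-likelihood under the asymmetric Laplace working model that makes
    Bayesian quantile regression tractable: maximizing it minimizes the
    check loss at quantile level tau."""
    u = y - X @ beta
    n = len(y)
    return n * np.log(tau * (1 - tau) / sigma) - np.sum(check_loss(u, tau)) / sigma

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.standard_normal(50)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(50)
print(ald_loglik(y, X, np.array([1.0, 2.0]), tau=0.5))
```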

17.
This paper studies penalized quantile regression for dynamic panel data with fixed effects, where the penalty involves ℓ1 shrinkage of the fixed effects. Using extensive Monte Carlo simulations, we present evidence that the penalty term reduces the dynamic panel bias and increases the efficiency of the estimators. The underlying intuition is that there is no need to use instrumental variables for the lagged dependent variable in the dynamic panel data model without fixed effects. This provides an additional use for shrinkage models, beyond model selection and efficiency gains. We propose a Bayesian information criterion-based estimator for the parameter that controls the degree of shrinkage. We illustrate the usefulness of the novel econometric technique by estimating a “target leverage” model that includes a speed of capital structure adjustment. Using the proposed penalized quantile regression model, the estimates of the adjustment speeds lie between 3% and 44% across the quantiles, showing strong evidence of substantial heterogeneity in the speed of adjustment among firms.
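The penalized estimator minimizes the check loss plus an ℓ1 penalty on the fixed effects; the sketch below simply evaluates that objective for illustrative data, with the actual minimization (and the BIC-based choice of the shrinkage parameter) left out.

```python
import numpy as np

def penalized_qr_objective(beta, alpha, y, X, ids, tau, lam):
    """Objective for penalized quantile regression with fixed effects:
    sum_i rho_tau(y_i - alpha_{id(i)} - x_i' beta) + lam * sum_j |alpha_j|.
    The l1 penalty shrinks the fixed effects toward zero, which is what
    reduces the dynamic panel bias."""
    u = y - alpha[ids] - X @ beta
    rho = u * (tau - (u < 0))                # quantile check function
    return rho.sum() + lam * np.abs(alpha).sum()

rng = np.random.default_rng(4)
n_firms, T = 5, 10
ids = np.repeat(np.arange(n_firms), T)
X = rng.standard_normal((n_firms * T, 2))
alpha_true = rng.normal(0, 0.5, n_firms)
y = alpha_true[ids] + X @ np.array([1.0, -0.5]) + rng.standard_normal(n_firms * T)
print(penalized_qr_objective(np.array([1.0, -0.5]), np.zeros(n_firms),
                             y, X, ids, tau=0.5, lam=1.0))
```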

18.
Quantitative model validation plays an increasingly important role in performance and reliability assessment of complex systems whenever computer modelling and simulation are involved. The foci of this paper are to pursue a Bayesian probabilistic approach to quantitative model validation with non-normal data, accounting for data uncertainty, and to investigate the impact of the normality assumption on validation accuracy. The Box–Cox transformation method is employed to convert the non-normal data, with the purpose of facilitating the overall validation assessment of computational models with higher accuracy. Explicit expressions for the interval-hypothesis-testing-based Bayes factor are derived for the transformed data in the univariate and multivariate cases. A Bayesian confidence measure is presented based on the Bayes factor metric. A generalized procedure is proposed to implement the probabilistic methodology for model validation of complicated systems. A classical hypothesis testing method is employed to conduct a comparison study. The impact of the data normality assumption and of decision threshold variation on model assessment accuracy is investigated using both classical and Bayesian approaches. The proposed methodology and procedure are demonstrated with a univariate stochastic damage accumulation model, a multivariate heat conduction problem and a multivariate dynamic system.
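A hedged sketch of the two-step idea: transform skewed discrepancy data toward normality with the Box–Cox method, then compute a Bayes factor on the transformed scale. A simple point-null normal Bayes factor stands in for the paper's interval-hypothesis version, and the data and prior scale are invented for illustration.

```python
import numpy as np
from scipy import stats

# Positive, right-skewed model-vs-experiment discrepancy ratios (hypothetical).
rng = np.random.default_rng(7)
d = rng.lognormal(mean=0.1, sigma=0.4, size=30)

# Box-Cox transform toward normality; lambda chosen by maximum likelihood.
d_t, lam = stats.boxcox(d)
print(f"Box-Cox lambda = {lam:.2f}")

# Point-null Bayes factor on the transformed scale:
# H0: mean transformed discrepancy = mu0  vs  H1: mean ~ N(mu0, prior_sd^2).
mu0 = 0.0                        # Box-Cox maps a discrepancy ratio of 1 to 0
se = d_t.std(ddof=1) / np.sqrt(len(d_t))
prior_sd = 1.0                   # assumed prior scale
ybar = d_t.mean()
bf01 = (stats.norm.pdf(ybar, mu0, se)
        / stats.norm.pdf(ybar, mu0, np.sqrt(se**2 + prior_sd**2)))
print(f"BF01 = {bf01:.2f}")      # > 1 favours the model-valid hypothesis
```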

19.
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. In this paper, we propose a flexible rank-based nonparametric procedure for gene selection from microarray data. In the method, we propose a statistic for testing whether the area under the receiver operating characteristic curve (AUC) for each gene is equal to 0.5, allowing a different variance for each gene. The contribution of this “single gene” statistic is the studentization of the empirical AUC, which takes into account the variances associated with each gene in the experiment. DeLong et al. proposed a nonparametric procedure for calculating a consistent variance estimator of the AUC. We use their variance estimation technique to obtain a test statistic, and we focus on the primary step in the gene selection process, namely, the ranking of genes with respect to a statistical measure of differential expression. Two real datasets are analyzed to illustrate the methods, and a simulation study is carried out to assess the relative performance of different statistical gene ranking measures. The work includes how to use the variance information to produce a list of significant targets and to assess differential gene expression under two conditions. The proposed method does not involve complicated formulas and does not require advanced programming skills. We conclude that the proposed methods offer useful analytical tools for identifying differentially expressed genes for further biological and clinical analysis.
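The studentized AUC statistic is straightforward to compute: the empirical AUC is the mean of the pairwise comparison kernel, and DeLong's variance estimator comes from the empirical variances of the row and column means of that kernel. A self-contained sketch on simulated expression values:

```python
import numpy as np

def studentized_auc(x, y):
    """AUC for one gene plus its DeLong variance estimate and the
    z-statistic for H0: AUC = 0.5.
    x: expression values in condition 1, y: in condition 2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, n = len(x), len(y)
    # Pairwise comparison kernel: 1 if y > x, 0.5 if tied, 0 otherwise.
    psi = (y[None, :] > x[:, None]) + 0.5 * (y[None, :] == x[:, None])
    auc = psi.mean()
    v10 = psi.mean(axis=1)          # structural components over x
    v01 = psi.mean(axis=0)          # structural components over y
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    z = (auc - 0.5) / np.sqrt(var)
    return auc, var, z

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 30)        # gene expression, condition 1
y = rng.normal(1.0, 1.0, 25)        # condition 2, shifted up
print(studentized_auc(x, y))
```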

20.
We develop a Bayesian estimation method for non-parametric mixed-effects models under shape constraints. The approach uses a hierarchical Bayesian framework and characterizations of shape-constrained Bernstein polynomials (BPs). We employ Markov chain Monte Carlo methods for model fitting, using a truncated normal distribution as the prior for the coefficients of the BPs to ensure the desired shape constraints. The small-sample properties of the Bayesian shape-constrained estimators across a range of functions are provided via simulation studies. Two real data analyses are given to illustrate the application of the proposed method.
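A small sketch of how a truncated normal prior can encode a shape constraint: a Bernstein polynomial is nondecreasing whenever its coefficient sequence is nondecreasing, so drawing nonnegative increments from a normal truncated at zero yields monotone curves by construction. The prior location and scale below are illustrative, and the full hierarchical mixed-effects sampler is omitted.

```python
import numpy as np
from scipy.stats import binom, truncnorm

def bernstein(x, coef):
    """Evaluate f(x) = sum_k coef[k] * C(n,k) x^k (1-x)^(n-k) on [0, 1]."""
    n = len(coef) - 1
    basis = binom.pmf(np.arange(n + 1)[:, None], n, x[None, :])
    return coef @ basis

# A Bernstein polynomial is nondecreasing whenever its coefficients are
# nondecreasing, so truncated normal increments enforce monotonicity.
rng = np.random.default_rng(2)
loc, scale = 0.5, 0.3
a = (0.0 - loc) / scale                     # truncate increments at zero
incr = truncnorm.rvs(a, np.inf, loc=loc, scale=scale, size=8, random_state=rng)
coef = np.concatenate([[0.0], np.cumsum(incr)])

x = np.linspace(0, 1, 5)
print(bernstein(x, coef).round(3))          # monotone nondecreasing values
```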
