Similar Articles
20 similar articles found (search time: 46 ms).
1.
In proteomics, identification of proteins from complex mixtures of proteins extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular. Protein identification from MS data typically relies on a ‘two-step’ procedure: the peptides are identified first, followed by a separate protein identification step. In this setup, the interdependence of peptides and proteins is neglected, resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo based Bayesian hierarchical model, a first of its kind in protein identification, which integrates the two steps and performs joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence of proteins by assigning clustering group priors to the proteins, based on the assumption that proteins sharing the same biological pathway are likely to be present or absent together and are correlated. Since the complete conditionals of the proposed joint model are tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides estimates and statistical uncertainties of all relevant parameters. The model has better operational characteristics than two existing ‘one-step’ procedures across a range of simulation settings as well as on two well-studied datasets.
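Tractable complete conditionals are exactly what a Gibbs sampler needs: each parameter is redrawn in turn from its full conditional. A minimal illustration of the mechanics, using a toy bivariate normal target rather than the paper's protein-peptide model:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is tractable -- x | y ~ N(rho*y, 1 - rho**2) --
    which is the property that makes Gibbs sampling available; this is a
    toy target, not the protein-peptide model itself."""
    rng = np.random.default_rng(seed)
    x = y = 0.0
    s = np.sqrt(1.0 - rho ** 2)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        x = rng.normal(rho * y, s)  # redraw x from its full conditional
        y = rng.normal(rho * x, s)  # redraw y from its full conditional
        draws[i] = (x, y)
    return draws

draws = gibbs_bivariate_normal(0.8, 20000)
est_corr = np.corrcoef(draws[2000:].T)[0, 1]  # close to 0.8 after burn-in
```

The same alternating-update pattern scales to the paper's setting, with protein and peptide indicators redrawn from their respective conditionals.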

2.
A Bayesian discovery procedure
Summary.  We discuss a Bayesian discovery procedure for multiple-comparison problems. We show that, under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule that is based on a threshold of the posterior probability of the alternative. Under a semiparametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure, which was recently introduced by Storey. Improving the approximation leads us to a Bayesian discovery procedure, which exploits the multiple shrinkage in clusters that are implied by the assumed non-parametric model. We compare the Bayesian discovery procedure and the optimal discovery procedure estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumour samples. We extend the setting of the optimal discovery procedure by discussing modifications of the loss function that lead to different single-thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
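The thresholding rule described follows from a simple expected-loss comparison; a sketch under an assumed loss of (false positives) + fn_penalty × (false negatives), which is one instance of the family of losses the abstract discusses:

```python
import numpy as np

def bayes_discovery(post_alt, fn_penalty=1.0):
    """Threshold rule implied by the loss FP + fn_penalty * FN:
    rejecting test i costs (1 - p_i) in expectation (a possible false
    positive); not rejecting costs fn_penalty * p_i (a possible false
    negative). So reject exactly when p_i > 1 / (1 + fn_penalty)."""
    p = np.asarray(post_alt, dtype=float)
    return p > 1.0 / (1.0 + fn_penalty)

flags = bayes_discovery([0.95, 0.40, 0.70], fn_penalty=1.0)  # threshold 0.5
```

Raising fn_penalty lowers the threshold, declaring more discoveries; this is the single-thresholding structure the modified loss functions preserve.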

3.
The joinpoint regression model identifies significant changes in the trends of incidence, mortality, and survival of a specific disease in a given population. The purpose of the present study is to develop an age-stratified Bayesian joinpoint regression model to describe mortality trends, assuming that the observed counts are probabilistically characterized by the Poisson distribution. The proposed model is based on Bayesian model selection criteria, with the smallest number of joinpoints that is sufficient to explain the Annual Percentage Change. The prior probability distributions are chosen in such a way that they are automatically derived from the model index contained in the model space. The proposed model and methodology estimate the age-adjusted mortality rates in different epidemiological studies to compare trends while accounting for the confounding effects of age. In developing the methods, we use the cancer mortality counts of adult lung and bronchus cancer, and brain and other central nervous system cancer, patients obtained from the Surveillance, Epidemiology, and End Results database of the National Cancer Institute.
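A sketch of the two ingredients named above: the annual percentage change implied by a log-linear slope, and a one-joinpoint piecewise log-linear rate (parameter names here are illustrative, not the paper's):

```python
import numpy as np

def apc(slope):
    """Annual percentage change implied by a log-linear slope."""
    return 100.0 * (np.exp(slope) - 1.0)

def joinpoint_log_rate(t, intercept, slope, slope_change, knot):
    """Log rate under a single-joinpoint Poisson log-linear model: the
    slope changes from `slope` to `slope + slope_change` at the knot,
    keeping the fitted log rate continuous there."""
    t = np.asarray(t, dtype=float)
    return intercept + slope * t + slope_change * np.maximum(t - knot, 0.0)
```

Each segment's slope converts to an APC via `apc`; model selection over the number of knots is the Bayesian part of the paper's procedure.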

4.
Due to the escalating growth of big data sets in recent years, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been developed. These methods partition large data sets by observations into subsets. However, for Bayesian nested hierarchical models, typically only a few parameters are common for the full data set, with most parameters being group specific. Thus, parallel Bayesian MCMC methods that take into account the structure of the model and split the full data set by groups rather than by observations are a more natural approach for analysis. Here, we adapt and extend a recently introduced two-stage Bayesian hierarchical modeling approach, and we partition complete data sets by groups. In stage 1, the group-specific parameters are estimated independently in parallel. The stage 1 posteriors are used as proposal distributions in stage 2, where the target distribution is the full model. Using three-level and four-level models, we show in both simulation and real data studies that results of our method agree closely with the full data analysis, with greatly increased MCMC efficiency and greatly reduced computation times. The advantages of our method versus existing parallel MCMC computing methods are also described.
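Stage 1 fits each group independently, which is what makes partitioning by groups embarrassingly parallel; a minimal conjugate-normal sketch of that stage (not the authors' implementation):

```python
import numpy as np

def group_posterior(y, prior_mean, prior_var, sigma2):
    """Stage 1: conjugate normal posterior for one group's mean, with
    known observation variance sigma2. Groups are processed
    independently, so these calls parallelize trivially; in stage 2 the
    resulting posteriors would serve as proposal distributions for the
    full model (a minimal sketch, not the authors' implementation)."""
    post_prec = 1.0 / prior_var + len(y) / sigma2
    post_mean = (prior_mean / prior_var + np.sum(y) / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

# Illustrative data: two groups analysed independently, as in stage 1.
groups = [np.array([1.0, 1.2, 0.8]), np.array([2.1, 1.9])]
stage1 = [group_posterior(y, 0.0, 10.0, 1.0) for y in groups]
```

In the actual two-stage scheme, a stage-2 Metropolis-Hastings step then targets the full hierarchical model, proposing group parameters from these stage-1 posteriors.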

5.
Summary.  A typical microarray experiment attempts to ascertain which genes display differential expression in different samples. We model the data using a two-component mixture model and develop an empirical Bayesian thresholding procedure, originally introduced for thresholding wavelet coefficients, as an alternative to existing methods for determining differential expression across thousands of genes. The method is built on sound theoretical properties and is easy to implement in the R statistical package. Furthermore, we consider improvements to the standard empirical Bayesian procedure when replication is present, to increase the robustness and reliability of the method. We provide an introduction to microarrays for those who are unfamiliar with the field, and the proposed procedure is demonstrated with applications to two-channel complementary DNA microarray experiments.
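A two-component mixture converts naturally into a per-gene posterior probability of differential expression; a hedged sketch with an assumed N(0, 1) null and a wider normal alternative (in the empirical Bayes procedure, the mixture weight and alternative scale are estimated from the data; the defaults below are purely illustrative):

```python
import numpy as np

def posterior_nonnull(z, pi0=0.9, tau=3.0):
    """Posterior probability that a gene is differentially expressed
    under a two-component mixture for its standardized statistic z:
    N(0, 1) null with weight pi0, N(0, 1 + tau**2) alternative.
    pi0 and tau are illustrative stand-ins for empirically
    estimated quantities."""
    f0 = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    v1 = 1.0 + tau ** 2
    f1 = np.exp(-0.5 * z ** 2 / v1) / np.sqrt(2.0 * np.pi * v1)
    return (1.0 - pi0) * f1 / (pi0 * f0 + (1.0 - pi0) * f1)
```

Thresholding this posterior probability (or its median-based analogue, as in the wavelet literature) yields the gene list.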

6.
In this study, an evaluation of Bayesian hierarchical models is made based on simulation scenarios to compare single-stage and multi-stage Bayesian estimations. Simulated datasets of lung cancer disease counts for men aged 65 and older across 44 wards in the London Health Authority were analysed using a range of spatially structured random effect components. The goals of this study are to determine which of these single-stage models perform best given a certain simulating model, how estimation methods (single- vs. multi-stage) compare in yielding posterior estimates of fixed effects in the presence of spatially structured random effects, and finally which of two spatial prior models – the Leroux or ICAR model, perform best in a multi-stage context under different assumptions concerning spatial correlation. Among the fitted single-stage models without covariates, we found that when there is low amount of variability in the distribution of disease counts, the BYM model is relatively robust to misspecification in terms of DIC, while the Leroux model is the least robust to misspecification. When these models were fit to data generated from models with covariates, we found that when there was one set of covariates – either spatially correlated or non-spatially correlated, changing the values of the fixed coefficients affected the ability of either the Leroux or ICAR model to fit the data well in terms of DIC. When there were multiple sets of spatially correlated covariates in the simulating model, however, we could not distinguish the goodness of fit to the data between these single-stage models. We found that the multi-stage modelling process via the Leroux and ICAR models generally reduced the variance of the posterior estimated fixed effects for data generated from models with covariates and a UH term compared to analogous single-stage models. Finally, we found the multi-stage Leroux model compares favourably to the multi-stage ICAR model in terms of DIC. 
We conclude that the multi-stage Leroux model should be seriously considered in applications of Bayesian disease mapping when an investigator desires to fit a model with both fixed effects and spatially structured random effects to Poisson count data.
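For reference, the ICAR prior compared above has a simple full-conditional form: each area's effect is normal around the average of its neighbours, with variance inversely proportional to the number of neighbours. A sketch (illustrative helper, not code from the study):

```python
import numpy as np

def icar_full_conditional(theta, neighbors, i, tau2):
    """Full conditional of area i's spatial random effect under the
    ICAR prior: N(mean of neighbouring effects, tau2 / n_i), where
    `neighbors` maps an area index to its adjacent area indices."""
    nb = neighbors[i]
    return float(np.mean(theta[nb])), tau2 / len(nb)

# Three areas in a line; area 1 borders areas 0 and 2.
theta = np.array([1.0, 2.0, 4.0])
mean_i, var_i = icar_full_conditional(theta, {1: [0, 2]}, 1, 0.5)
```

The Leroux prior generalizes this by mixing the ICAR precision with an independent (unstructured) component via a spatial-dependence parameter.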

7.
In this paper we present Bayesian analysis of finite mixtures of multivariate Poisson distributions with an unknown number of components. The multivariate Poisson distribution can be regarded as the discrete counterpart of the multivariate normal distribution, which is suitable for modelling multivariate count data. Mixtures of multivariate Poisson distributions allow for overdispersion and for negative correlations between variables. To perform Bayesian analysis of these models we adopt a reversible jump Markov chain Monte Carlo (MCMC) algorithm with birth and death moves for updating the number of components. We present results obtained from applying our modelling approach to simulated and real data. Furthermore, we apply our approach to a problem in multivariate disease mapping, namely joint modelling of diseases with correlated counts.
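One standard construction of the multivariate Poisson, and the reason its correlations are non-negative before mixing, uses a shared latent count; a bivariate sampling sketch:

```python
import numpy as np

def rbipoisson(n, lam1, lam2, lam0, rng):
    """Draws from the bivariate Poisson built from a shared latent count:
    Y1 = X1 + X0 and Y2 = X2 + X0 with independent Poisson components,
    so E[Y1] = lam1 + lam0, E[Y2] = lam2 + lam0 and Cov(Y1, Y2) = lam0.
    The covariance is necessarily non-negative, which is why mixing over
    components is needed to obtain negative correlation (sketch only)."""
    x0 = rng.poisson(lam0, n)
    y1 = rng.poisson(lam1, n) + x0
    y2 = rng.poisson(lam2, n) + x0
    return np.column_stack([y1, y2])

y = rbipoisson(200_000, 2.0, 3.0, 1.0, np.random.default_rng(5))
```

Mixtures of such components also induce overdispersion relative to a single Poisson, matching the abstract's motivation.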

8.
Gastric emptying studies are frequently used in medical research, both human and animal, when evaluating the effectiveness and determining the unintended side-effects of new and existing medications, diets, and procedures or interventions. It is essential that gastric emptying data be appropriately summarized before comparisons are made between study groups of interest. Since gastric emptying data follow a nonlinear emptying curve and are longitudinal, nonlinear mixed effects (NLME) models can accommodate both the variation among measurements within individuals and the individual-to-individual variation. However, the NLME model requires strong assumptions that are often not satisfied in real applications that involve a relatively small number of subjects, have heterogeneous measurement errors, or have large variation among subjects. Therefore, we propose three semiparametric Bayesian NLMEs constructed with Dirichlet process priors, which automatically cluster sub-populations and estimate heterogeneous measurement errors. To compare the three semiparametric models with the parametric model, we propose a penalized posterior Bayes factor. We compare the performance of our semiparametric hierarchical Bayesian approaches with that of the parametric Bayesian hierarchical approach. Simulation results suggest that our semiparametric approaches are more robust and flexible. Gastric emptying studies from equine medicine are used to demonstrate the advantages of our approaches.

9.
In recent years, there has been considerable interest in regression models based on zero-inflated distributions. These models are commonly encountered in many disciplines, such as medicine, public health, and environmental sciences, among others. The zero-inflated Poisson (ZIP) model has typically been considered for these types of problems. However, the ZIP model can fail if the non-zero counts are overdispersed relative to the Poisson distribution, in which case the zero-inflated negative binomial (ZINB) model may be more appropriate. In this paper, we present a Bayesian approach for fitting the ZINB regression model. This model assumes that an observed zero may come from a point mass distribution at zero or from the negative binomial component. The likelihood function is used not only to compute Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. The approach can be easily implemented using standard Bayesian software, such as WinBUGS. The performance of the proposed method is evaluated in a simulation study. Further, a real data set is analyzed, where we show that ZINB regression models seem to fit the data better than their Poisson counterparts.
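The ZINB density described, a point mass at zero mixed with a negative binomial, can be written directly; a sketch of the density (not the full regression model) using scipy's (n, p) parameterization:

```python
import numpy as np
from scipy.stats import nbinom

def zinb_pmf(y, pi, mu, size):
    """Zero-inflated negative binomial pmf: a structural zero with
    probability pi, otherwise negative binomial with mean mu and
    dispersion `size`, mapped to scipy's (n, p) parameterization via
    p = size / (size + mu)."""
    y = np.asarray(y)
    base = nbinom.pmf(y, size, size / (size + mu))
    return np.where(y == 0, pi + (1.0 - pi) * base, (1.0 - pi) * base)

probs = zinb_pmf(np.arange(500), 0.3, 4.0, 2.0)
```

In the regression model, both pi and mu would be linked to covariates; the sketch above is only the observation-level likelihood contribution.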

10.
Quantile regression (QR) allows one to model the effect of covariates across the entire response distribution, rather than only at the mean, but QR methods have been almost exclusively applied to continuous response variables and without considering spatial effects. Of the few studies that have performed QR on count data, none have included random spatial effects, which is an integral facet of the Bayesian spatial QR model for areal counts that we propose. Additionally, we introduce a simplifying alternative to the response variable transformation currently employed in the QR for counts literature. The efficacy of the proposed model is demonstrated via simulation study and on a real data application from the Texas Department of Family and Protective Services (TDFPS). Our model outperforms a comparable non-spatial model in both instances, as evidenced by the deviance information criterion (DIC) and coverage probabilities. With the TDFPS data, we identify one of four covariates, along with the intercept, as having a nonconstant effect across the response distribution.
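For context, two standard building blocks of QR for counts: the check (pinball) loss underlying quantile regression, and the jittering transformation of Machado and Santos Silva, which is presumably the transformation the abstract's alternative simplifies (a sketch, not the proposed method):

```python
import numpy as np

def check_loss(u, tau):
    """Pinball (check) loss; its expectation is minimized by the
    tau-th quantile of the residual distribution."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def jitter(counts, rng):
    """Jittering: add U(0, 1) noise so a discrete count response can be
    treated as continuous in quantile regression."""
    counts = np.asarray(counts, dtype=float)
    return counts + rng.uniform(0.0, 1.0, size=counts.shape)
```

Jittered responses are modelled with continuous QR and the noise is averaged out; the proposed model additionally carries random spatial effects for the areal units.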

11.
This paper describes a Bayesian approach to modelling carcinogenicity in animal studies where the data consist of counts of the number of tumours present over time. It compares two autoregressive hidden Markov models. One of them models the transitions between three latent states: an inactive transient state, a multiplying state for increasing counts and a reducing state for decreasing counts. The second model introduces a fourth tied state to describe non-zero observations that are neither increasing nor decreasing. Both models can capture the length of stay upon entry into a state. A discrete constant-hazards waiting time distribution is used to model the time to onset of tumour growth. Our models describe between-animal variability by a single hierarchy of random effects and within-animal variation by first-order serial dependence. They can be extended to higher-order serial dependence and multi-level hierarchies. Analysis of data from animal experiments comparing the influence of two genes leads to conclusions that differ from those of Dunson (2000). The observed data likelihood defines an information criterion for assessing the predictive properties of the three- and four-state models. The deviance information criterion is appropriately defined for discrete parameters.

12.
In this paper, we develop a matching prior for the product of means in several normal distributions with unrestricted means and unknown variances. Properly assigning priors for the product of normal means has been an issue in this problem because of the presence of nuisance parameters. Matching priors, which are priors that match the posterior probabilities of certain regions with their frequentist coverage probabilities, are commonly used but difficult to derive here. We developed first-order probability matching priors for this problem; however, they turn out to be improper. Thus, we apply an alternative method and derive a matching prior based on a modification of the profile likelihood. Simulation studies show that the derived matching prior performs better than the uniform prior and Jeffreys' prior in meeting the target coverage probabilities, and that it meets the target coverage probabilities well even for small sample sizes. In addition, to evaluate the validity of the proposed matching prior, the Bayesian credible interval for the product of normal means using the matching prior is compared to Bayesian credible intervals using the uniform prior and Jeffreys' prior, and to the confidence interval using the method of Yfantis and Flatman.
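Frequentist coverage, the criterion by which matching priors are judged, is straightforward to estimate by simulation; a generic sketch using a normal-mean example rather than the paper's product-of-means setup:

```python
import numpy as np

def coverage(interval_fn, theta_true, sampler, n_rep=2000, seed=1):
    """Monte Carlo estimate of the frequentist coverage of an interval
    procedure: simulate data, build the interval, count hits."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        lo, hi = interval_fn(sampler(rng))
        hits += (lo <= theta_true <= hi)
    return hits / n_rep

# Example: nominal 95% interval for a normal mean with known sd = 1.
n = 20
cov = coverage(
    interval_fn=lambda y: (y.mean() - 1.96 / np.sqrt(n),
                           y.mean() + 1.96 / np.sqrt(n)),
    theta_true=0.0,
    sampler=lambda rng: rng.normal(0.0, 1.0, size=n),
)
```

A matching prior is one whose credible intervals, run through exactly this kind of check, hit the nominal level to the stated order.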

13.
ABSTRACT

Given a sample from a finite population, we provide a nonparametric Bayesian prediction interval for a finite population mean when a standard normal assumption may be tenuous. We will do so using a Dirichlet process (DP), a nonparametric Bayesian procedure which is currently receiving much attention. An asymptotic Bayesian prediction interval is well known but it does not incorporate all the features of the DP. We show how to compute the exact prediction interval under the full Bayesian DP model. However, under the DP, when the population size is much larger than the sample size, the computational task becomes expensive. Therefore, for simplicity one might still want to consider useful and accurate approximations to the prediction interval. For this purpose, we provide a Bayesian procedure which approximates the distribution using the exchangeability property (correlation) of the DP together with normality. We compare the exact interval and our approximate interval with three standard intervals, namely the design-based interval under simple random sampling, an empirical Bayes interval and a moment-based interval which uses the mean and variance under the DP. However, these latter three intervals do not fully utilize the posterior distribution of the finite population mean under the DP. Using several numerical examples and a simulation study we show that our approximate Bayesian interval is a good competitor to the exact Bayesian interval for different combinations of sample sizes and population sizes.
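The exchangeability the approximation exploits comes from the DP's Polya-urn predictive: a new draw resamples an observed value with probability n/(n + alpha) and otherwise comes from the base measure. A sketch (illustrative values, not the paper's data):

```python
import numpy as np

def dp_predictive_draw(sample, alpha, base_draw, rng):
    """One draw from the Dirichlet process posterior predictive (Polya
    urn): with probability n / (n + alpha) resample an observed value
    uniformly, otherwise draw afresh from the base measure."""
    n = len(sample)
    if rng.uniform() < n / (n + alpha):
        return float(sample[rng.integers(n)])
    return float(base_draw(rng))

rng = np.random.default_rng(2)
sample = np.array([4.1, 5.0, 3.7, 4.6])
draws = [dp_predictive_draw(sample, 1.0, lambda r: r.normal(4.0, 1.0), rng)
         for _ in range(5000)]
```

Simulating the unsampled units this way, draw by draw, is exactly what becomes expensive when the population dwarfs the sample, motivating the approximate interval.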

14.
Bayesian methods have been used extensively in small area estimation. A linear model incorporating autocorrelated random effects and sampling errors was previously proposed for small area estimation using both cross-sectional and time-series data in the Bayesian paradigm. There are, however, many situations in which we have time-related counts or proportions in small area estimation; for example, monthly data on the number of incident cases in small areas. This article considers hierarchical Bayes generalized linear models for a unified analysis of both discrete and continuous data, incorporating cross-sectional and time-series data. The performance of the proposed approach is evaluated through several simulation studies and also on a real dataset.

15.
ABSTRACT

In this article, Bayesian estimation of the expected cell counts for log-linear models is considered. The prior specified for the log-linear parameters is used to determine a prior for the expected cell counts, by means of the family and parameters of the prior distributions. This approach is more cost-effective than working directly with cell counts, because converting prior information into a prior distribution on the log-linear parameters is easier than doing so on the expected cell counts. In proceeding from the prior on the log-linear parameters to the prior on the expected cell counts, we were faced with a singularity problem in the variance matrix of the prior distribution, and we added a new precision parameter to solve the problem. A numerical example is given to illustrate the use of the new parameter.

16.
Summary.  We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.

17.
In phase II single-arm studies, the response rate of the experimental treatment is typically compared with a fixed target value that should ideally represent the true response rate for the standard-of-care therapy. Generally, this target value is estimated from previous data, but the inherent variability in the historical response rate is not taken into account. In this paper, we present a Bayesian procedure for constructing single-arm two-stage designs that allows one to incorporate uncertainty in the response rate of the standard treatment. In both stages, the sample size determination criterion is based on the concepts of conditional and predictive Bayesian power functions. Different kinds of prior distributions, which play different roles in the designs, are introduced, and some guidelines for their elicitation are described. Finally, some numerical results on the performance of the designs are provided and a real data example is illustrated. Copyright © 2016 John Wiley & Sons, Ltd.
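The basic posterior calculation behind single-arm monitoring, with a conjugate beta prior, looks like this (a sketch with an assumed flat prior; the paper's designs go further, replacing the fixed target with a prior on the standard-of-care rate and building two-stage power criteria on top):

```python
from scipy.stats import beta

def prob_exceeds_target(x, n, p0, a=1.0, b=1.0):
    """Posterior probability that the experimental response rate exceeds
    a fixed target p0, given x responses among n patients and a
    Beta(a, b) prior: P(p > p0 | data) = 1 - F_Beta(a+x, b+n-x)(p0)."""
    return beta.sf(p0, a + x, b + n - x)
```

Conditional and predictive power functions are then built by evaluating such probabilities over future (stage-2) outcomes.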

18.
Bayesian model building techniques are developed for data with a strong time series structure and possibly exogenous explanatory variables that have strong explanatory and predictive power. The emphasis is on determining whether, given that the data have a strong time series structure, there are any explanatory variables that should also be included in the model. We use a time series model that is linear in past observations and that can capture stochastic and deterministic trend, seasonality and serial correlation. We propose plotting absolute predictive error against predictive standard deviation. A series of such plots is used to determine which of several nested and non-nested models is optimal in terms of minimizing the dispersion of the predictive distribution and restricting predictive outliers. We apply the techniques to modelling monthly counts of fatal road crashes in Australia, where economic, consumption and weather variables are available, and we find that three such variables should be included in addition to the time series filter. The approach leads to graphical techniques for determining the strength of relationships between the dependent variable and covariates and for detecting model inadequacy, as well as providing useful numerical summaries.

19.
Generalized linear models (GLMs) with errors in covariates are useful in epidemiological research because of the ubiquity of non-normal response variables and inaccurate measurements. The link function in a GLM is chosen by the user depending on the type of response variable, frequently the canonical link. When covariates are measured with error, incorrect inference can be made, compounded by an incorrect choice of link function. In this article we propose three flexible approaches for handling errors in covariates while estimating an unknown link simultaneously. The first uses a fully Bayesian (FB) hierarchical framework, treating the unobserved covariate as a latent variable to be integrated over. The second and third are approximate Bayesian approaches which use a Laplace approximation to marginalize the variables measured with error out of the likelihood. Our simulation results suggest that the FB approach is often a better choice than the approximate Bayesian approaches for adjusting for measurement error, particularly when the measurement error distribution is misspecified. These approaches are demonstrated on an application with a binary response.
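The Laplace approximation used by the second and third approaches replaces an intractable integral with a second-order expansion around its mode; a generic one-dimensional numerical sketch (not the article's specific marginalization):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_log_integral(log_f):
    """Laplace approximation to log(integral of exp(log_f(x)) dx):
    locate the mode numerically, then apply the second-order expansion
    log_f(xhat) + 0.5 * log(2*pi / -log_f''(xhat)), with the second
    derivative taken by central finite differences."""
    xhat = minimize_scalar(lambda x: -log_f(x)).x
    eps = 1e-5
    d2 = (log_f(xhat + eps) - 2.0 * log_f(xhat) + log_f(xhat - eps)) / eps ** 2
    return log_f(xhat) + 0.5 * np.log(2.0 * np.pi / -d2)

# The approximation is exact for a Gaussian integrand:
# integral of exp(-x**2 / 2) dx equals sqrt(2 * pi).
approx = laplace_log_integral(lambda x: -0.5 * x ** 2)
```

In the article's setting, the integrand would be the likelihood contribution with the error-prone covariate as the integration variable, one such approximation per observation.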

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号