Similar Documents
20 similar documents retrieved.
1.
This paper describes a Bayesian approach to modelling carcinogenicity in animal studies where the data consist of counts of the number of tumours present over time. It compares two autoregressive hidden Markov models. One of them models the transitions between three latent states: an inactive transient state, a multiplying state for increasing counts and a reducing state for decreasing counts. The second model introduces a fourth tied state to describe non-zero observations that are neither increasing nor decreasing. Both models can describe the length of stay upon entry into a state. A discrete constant-hazard waiting time distribution is used to model the time to onset of tumour growth. Our models describe between-animal variability by a single hierarchy of random effects and within-animal variation by first-order serial dependence. They can be extended to higher-order serial dependence and multi-level hierarchies. Analysis of data from animal experiments comparing the influence of two genes leads to conclusions that differ from those of Dunson (2000). The observed data likelihood defines an information criterion to assess the predictive properties of the three- and four-state models. The deviance information criterion is appropriately defined for discrete parameters.
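As background for the waiting-time component, a discrete distribution with constant hazard h per time step is the geometric law (a generic illustration, not necessarily the paper's exact parameterisation):

    P(T = t) = h\,(1 - h)^{t-1}, \qquad t = 1, 2, \dots,

so that P(T > t) = (1 - h)^t and the hazard P(T = t \mid T \ge t) = h does not change over time.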

2.
The hidden Markov model (HMM) provides an attractive framework for modeling long-term persistence in a variety of applications including pattern recognition. Unlike in typical mixture models, hidden Markov states can represent the heterogeneity in the data, and the model can be extended to the multivariate case using a hierarchical Bayesian approach. This article provides a nonparametric Bayesian modeling approach to the multi-site HMM by considering stick-breaking priors for each row of an infinite state transition matrix. This extension has many advantages over a parametric HMM. For example, it can provide more flexible information for identifying the structure of the HMM, such as the number of states, than a parametric HMM analysis. We use a simulation example and a real dataset to evaluate the proposed approach.
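As a hedged sketch of the stick-breaking construction mentioned above (not the authors' code; the function name, concentration parameter and truncation level are illustrative assumptions), the following draws one row of an "infinite" transition matrix truncated at a finite number of components:

    import numpy as np

    def stick_breaking_row(alpha=1.0, truncation=20, rng=None):
        """Approximate weights for one row of an infinite transition matrix
        drawn from a truncated stick-breaking (GEM(alpha)) prior."""
        rng = np.random.default_rng() if rng is None else rng
        v = rng.beta(1.0, alpha, size=truncation)                # stick proportions
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
        weights = v * remaining                                  # w_k = v_k * prod_{l<k}(1 - v_l)
        weights[-1] += 1.0 - weights.sum()                       # fold leftover mass into last slot
        return weights

    row = stick_breaking_row(alpha=2.0, truncation=15)
    print(row.round(3), row.sum())

Smaller alpha concentrates the mass on a few states, which is how such a prior lets the data suggest an effective number of hidden states.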

3.
Variational Bayes (VB) estimation is a fast alternative to Markov chain Monte Carlo for performing approximate Bayesian inference. This procedure can be an efficient and effective means of analyzing large datasets. However, VB estimation is often criticised, typically on empirical grounds, for being unable to produce valid statistical inferences. In this article we refute this criticism for one of the simplest models where Bayesian inference is not analytically tractable, that is, the Bayesian linear model (for a particular choice of priors). We prove that under mild regularity conditions, VB-based estimators enjoy some desirable frequentist properties such as consistency and can be used to obtain asymptotically valid standard errors. In addition to these results we introduce two VB information criteria: the variational Akaike information criterion and the variational Bayesian information criterion. We show that the variational Akaike information criterion is asymptotically equivalent to the frequentist Akaike information criterion and that the variational Bayesian information criterion is first-order equivalent to the Bayesian information criterion in linear regression. These results motivate the potential use of the variational information criteria for more complex models. We support our theoretical results with numerical examples.
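A minimal sketch of mean-field VB for a Bayesian linear model, assuming a Gaussian prior on the coefficients and an inverse-gamma prior on the noise variance; this is a textbook-style specification for illustration (the function name and prior values are hypothetical), not necessarily the particular prior choice analysed in the article:

    import numpy as np

    def vb_linear_model(X, y, prior_var_beta=100.0, A=0.01, B=0.01, iters=100):
        """Coordinate-ascent VB for y = X b + e, e ~ N(0, s2 I), with
        b ~ N(0, prior_var_beta * I) and s2 ~ Inverse-Gamma(A, B).
        Returns q(b) = N(mu, Sigma) and the parameters of q(s2)."""
        n, p = X.shape
        XtX, Xty = X.T @ X, X.T @ y
        E_inv_s2 = 1.0                                       # initial guess for E[1/s2]
        for _ in range(iters):
            Sigma = np.linalg.inv(E_inv_s2 * XtX + np.eye(p) / prior_var_beta)
            mu = E_inv_s2 * Sigma @ Xty                      # update q(b)
            resid = y - X @ mu
            B_q = B + 0.5 * (resid @ resid + np.trace(XtX @ Sigma))
            E_inv_s2 = (A + n / 2) / B_q                     # update q(s2); E[1/s2] = shape/rate
        return mu, Sigma, A + n / 2, B_q

The variational posterior mean mu and covariance Sigma are the kind of quantities whose frequentist behaviour (consistency, validity of standard errors) the article studies.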

4.
Bayesian measures of model complexity and fit
Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the 'hat' matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
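As a hedged sketch of how pD and the DIC are typically computed from MCMC output (generic code, not taken from the paper; the function names are hypothetical), assuming a user-supplied deviance function D(theta) = -2 log p(y | theta) and an array of posterior draws:

    import numpy as np

    def dic(deviance_fn, posterior_draws):
        """pD  = mean of D(theta) over draws minus D(posterior mean of theta);
        DIC = posterior mean deviance + pD.
        `posterior_draws` is an (n_draws, n_params) array of MCMC samples."""
        deviances = np.apply_along_axis(deviance_fn, 1, posterior_draws)
        d_bar = deviances.mean()                               # posterior mean deviance
        d_hat = deviance_fn(posterior_draws.mean(axis=0))      # deviance at posterior mean
        p_d = d_bar - d_hat                                    # effective number of parameters
        return p_d, d_bar + p_d

Both ingredients can be accumulated during an MCMC run, which is the practical point emphasized in the abstract.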

5.
Copula, marginal distributions and model selection: a Bayesian note
Copula functions and marginal distributions are combined to produce multivariate distributions. We show the advantages of estimating all parameters of these models using the Bayesian approach, which can be done with standard Markov chain Monte Carlo algorithms. Deviance-based model selection criteria are also discussed when applied to copula models, since they are invariant under monotone increasing transformations of the marginals. We focus on the deviance information criterion. The joint estimation takes into account the full dependence structure of the parameters' posterior distributions in our chosen model selection criteria. Two Monte Carlo studies are conducted to show that model identification improves when the model parameters are jointly estimated. We study the Bayesian estimation of all unknown quantities at once, considering bivariate copula functions and three known marginal distributions.
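For reference, the construction described above combines the pieces via Sklar's theorem: with marginal distribution functions F_1, F_2, marginal densities f_1, f_2 and a copula density c(., .; \theta), the bivariate density is

    f(x_1, x_2) = c\big(F_1(x_1), F_2(x_2); \theta\big)\, f_1(x_1)\, f_2(x_2),

so the copula parameter \theta carries the dependence structure separately from the marginal parameters, and all of them can be sampled jointly in one MCMC scheme.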

6.
Monte Carlo experiments are conducted to compare the Bayesian and sample theory model selection criteria in choosing between the univariate probit and logit models. We use five criteria: the deviance information criterion (DIC), the predictive deviance information criterion (PDIC), the Akaike information criterion (AIC), and the weighted and unweighted sums of squared errors. The first two criteria are Bayesian while the others are sample theory criteria. The results show that if the data are balanced, none of the model selection criteria considered in this article can distinguish between the probit and logit models. If the data are unbalanced and the sample size is large, the DIC and AIC choose the correct models better than the other criteria. We show that if unbalanced binary data are generated by a leptokurtic distribution, the logit model is preferred over the probit model. The probit model is preferred if unbalanced data are generated by a platykurtic distribution. We apply the model selection criteria to the probit and logit models that link the ups and downs of the returns on the S&P 500 to the crude oil price.
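As a small illustration of the sample-theory side of such a comparison (the Bayesian criteria require MCMC output and are not shown), the following uses statsmodels on synthetic unbalanced data; the data-generating step and coefficient values are placeholders, not the paper's design:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 2000
    X = sm.add_constant(rng.normal(size=(n, 1)))
    # placeholder unbalanced binary response generated from a logistic link
    p = 1.0 / (1.0 + np.exp(-(X @ np.array([-2.0, 1.0]))))
    y = (rng.uniform(size=n) < p).astype(int)

    probit_fit = sm.Probit(y, X).fit(disp=0)
    logit_fit = sm.Logit(y, X).fit(disp=0)
    print("probit AIC:", round(probit_fit.aic, 1), " logit AIC:", round(logit_fit.aic, 1))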

7.
Hidden Markov random field models provide an appealing representation of images and other spatial problems. The drawback is that inference is not straightforward for these models as the normalisation constant for the likelihood is generally intractable except for very small observation sets. Variational methods are an emerging tool for Bayesian inference and they have already been successfully applied in other contexts. Focusing on the particular case of a hidden Potts model with Gaussian noise, we show how variational Bayesian methods can be applied to hidden Markov random field inference. To tackle the obstacle of the intractable normalising constant for the likelihood, we explore alternative estimation approaches for incorporation into the variational Bayes algorithm. We consider a pseudo-likelihood approach as well as the more recent reduced dependence approximation of the normalisation constant. To illustrate the effectiveness of these approaches we present empirical results from the analysis of simulated datasets. We also analyse a real dataset and compare results with those of previous analyses as well as those obtained from the recently developed auxiliary variable MCMC method and the recursive MCMC method. Our results show that the variational Bayesian analyses can be carried out much faster than the MCMC analyses and produce good estimates of model parameters. We also found that the reduced dependence approximation of the normalisation constant outperformed the pseudo-likelihood approximation in our analysis of real and synthetic datasets.
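As a hedged sketch (not the authors' implementation; the function name and lattice setup are illustrative) of the pseudo-likelihood approximation mentioned above, for a K-state Potts model on a rectangular lattice with a first-order (4-neighbour) neighbourhood and inverse temperature beta:

    import numpy as np

    def potts_log_pseudolikelihood(labels, beta):
        """Sum over sites of log p(z_i | neighbours), where
        p(z_i = k | neighbours) is proportional to exp(beta * n_i(k)) and
        n_i(k) counts the neighbours of site i currently in state k."""
        K = labels.max() + 1
        padded = np.pad(labels, 1, constant_values=-1)          # -1 marks off-lattice cells
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],  # up, down
                          padded[1:-1, :-2], padded[1:-1, 2:]]) # left, right
        counts = np.stack([(neigh == k).sum(axis=0) for k in range(K)])
        own = np.take_along_axis(counts, labels[None], axis=0)[0]   # n_i(z_i)
        log_local_norm = np.log(np.exp(beta * counts).sum(axis=0))  # local normalisers
        return float((beta * own - log_local_norm).sum())

    z = np.random.default_rng(0).integers(0, 3, size=(32, 32))
    print(potts_log_pseudolikelihood(z, beta=0.8))

Replacing the intractable joint normalising constant by these local (per-site) normalisers is what makes the quantity cheap enough to embed in a variational Bayes update.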

8.
Multivariate data with a sequential or temporal structure occur in various fields of study. The hidden Markov model (HMM) provides an attractive framework for modeling long-term persistence in areas of pattern recognition through the extension of independent and identically distributed mixture models. Unlike in typical mixture models, the heterogeneity of data is represented by hidden Markov states. This article extends the HMM to a multi-site or multivariate case by taking a hierarchical Bayesian approach. This extension has many advantages over a single-site HMM. For example, it can provide more information for identifying the structure of the HMM than a single-site analysis. We evaluate the proposed approach by exploiting a spatial correlation that depends on the distance between sites.

9.
Biclustering is the simultaneous clustering of two related dimensions, for example, of individuals and features, or genes and experimental conditions. Very few statistical models for biclustering have been proposed in the literature. Instead, most of the research has focused on algorithms to find biclusters. The models underlying them have not received much attention. Hence, very little is known about the adequacy and limitations of the models and the efficiency of the algorithms. In this work, we shed light on the statistical models underlying the algorithms. This allows us to generalize most of the known popular biclustering techniques, and to justify, and often improve on, the algorithms used to find the biclusters. It turns out that most of the known techniques have a hidden Bayesian flavor. Therefore, we adopt a Bayesian framework to model biclustering. We propose a measure of biclustering complexity (number of biclusters and overlapping) through a penalized plaid model, and present a suitable version of the deviance information criterion to choose the number of biclusters, a problem that has not been adequately addressed yet. Our ideas are motivated by the analysis of gene expression data.
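For context, the plaid model referred to above is commonly written (in the usual Lazzeroni and Owen form, given here as background rather than the authors' exact specification) as

    Y_{ij} = \mu_0 + \sum_{k=1}^{K} (\mu_k + \alpha_{ik} + \beta_{jk})\, \rho_{ik}\, \kappa_{jk} + \varepsilon_{ij},

where the binary indicators \rho_{ik} and \kappa_{jk} record whether row i and column j belong to bicluster k; the number of layers K and the overlap among these indicator sets are the quantities through which the abstract's complexity measure is defined.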

10.
11.
Hidden Markov models (HMMs) have been shown to be a flexible tool for modelling complex biological processes. However, choosing the number of hidden states remains an open question and the inclusion of random effects also deserves more research, as it is a recent addition to the fixed-effect HMM in many application fields. We present a Bayesian mixed HMM with an unknown number of hidden states and fixed covariates. The model is fitted using reversible-jump Markov chain Monte Carlo, avoiding the need to select the number of hidden states. We show through simulations that the estimates produced are more precise than those from a fixed-effect HMM and illustrate its practical application to the analysis of DNA copy number data, a field where HMMs are widely used.

12.
We introduce a multivariate heteroscedastic measurement error model for replications under scale mixtures of normal distributions. The model can provide a robust analysis and can be viewed as a generalization of multiple linear regression in terms of both model structure and distributional assumptions. An efficient method based on Markov chain Monte Carlo is developed for parameter estimation. The deviance information criterion and the conditional predictive ordinates are used as model selection criteria. Simulation studies show the robust inference behaviour of the model against both misspecification of distributions and outliers. We work out an illustrative example with a real data set on measurements of plant root decomposition.

13.
On the use of corrections for overdispersion
In studying fluctuations in the size of a black grouse (Tetrao tetrix) population, an autoregressive model using climatic conditions appears to follow the change quite well. However, the deviance of the model is considerably larger than its number of degrees of freedom. A widely used statistical rule of thumb holds that overdispersion is present in such situations, but model selection based on a direct likelihood approach can produce opposing results. Two further examples, of binomial and of Poisson data, have models with deviances that are almost twice the degrees of freedom, and yet various overdispersion models do not fit better than the standard model for independent data. This can arise because the rule of thumb only considers a point estimate of dispersion, without regard for any measure of its precision. A reasonable criterion for detecting overdispersion is that the deviance be at least twice the number of degrees of freedom, a threshold corresponding to the familiar Akaike information criterion, but the actual presence of overdispersion should then be checked by some appropriate modelling procedure.
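A minimal sketch of the rule-of-thumb check discussed above, using a Poisson GLM in statsmodels; the simulated counts and coefficients are placeholders, not the black grouse series:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    y = rng.poisson(np.exp(0.3 + 0.5 * x))                # placeholder count data

    fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
    ratio = fit.deviance / fit.df_resid
    print(f"deviance = {fit.deviance:.1f}, df = {fit.df_resid}, ratio = {ratio:.2f}")
    # The rule of thumb flags overdispersion whenever the ratio exceeds 1; the abstract
    # argues that a point estimate alone is not enough and that any apparent
    # overdispersion should be confirmed by fitting an explicit overdispersion model.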

14.
Structural change in any time series is practically unavoidable, and thus correctly detecting breakpoints plays a pivotal role in statistical modelling. This research considers segmented autoregressive models with exogenous variables and asymmetric GARCH errors, GJR-GARCH and exponential-GARCH specifications, which use the leverage phenomenon to capture asymmetry in response to positive and negative shocks. The proposed models incorporate the skew Student-t distribution and demonstrate the advantages of the fat-tailed skew Student-t distribution over other distributions when structural changes appear in financial time series. We employ Bayesian Markov chain Monte Carlo methods to make inferences about the locations of structural change points and the model parameters, and utilize the deviance information criterion to determine the optimal number of breakpoints via a sequential approach. Our models can accurately detect the number and locations of structural change points in simulation studies. For real data analysis, we examine the impacts of daily gold returns and the VIX on S&P 500 returns during 2007–2019. The proposed methods are able to integrate structural changes through the model parameters and to capture the variability of a financial market more efficiently.
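For reference, the GJR-GARCH(1,1) conditional variance used to capture the leverage effect mentioned above takes the standard form (background notation, not necessarily the paper's exact specification):

    \sigma_t^2 = \omega + \big(\alpha + \gamma\, 1\{\varepsilon_{t-1} < 0\}\big)\, \varepsilon_{t-1}^2 + \beta\, \sigma_{t-1}^2,

so a negative shock raises next-period variance by the additional amount \gamma\, \varepsilon_{t-1}^2 relative to a positive shock of the same size.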

15.
Nowadays, Bayesian methods are routinely used for estimating parameters of item response theory (IRT) models. However, the marginal likelihoods are still rarely used for comparing IRT models due to their complexity and the relatively high dimension of the model parameters. In this paper, we review Monte Carlo (MC) methods developed in the literature in recent years and provide a detailed development of how these methods are applied to IRT models. In particular, we focus on the “best possible” implementation of these MC methods for IRT models. These MC methods are used to compute the marginal likelihoods under the one-parameter IRT model with the logistic link (1PL model) and the two-parameter logistic IRT model (2PL model) for a real English Examination dataset. We further use the widely applicable information criterion (WAIC) and the deviance information criterion (DIC) to compare the 1PL model and the 2PL model. The 2PL model is favored by all three Bayesian model comparison criteria for the English Examination data.
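As a hedged sketch of how the WAIC is typically computed from MCMC output (generic code, not the authors'; the DIC analogue appears under item 4 above), assuming a matrix of pointwise log-likelihood evaluations:

    import numpy as np

    def waic(log_lik):
        """WAIC = -2 * (lppd - p_waic), with the variance-based penalty.
        `log_lik[s, i]` is log p(y_i | theta_s) for draw s and observation i."""
        n_draws = log_lik.shape[0]
        # log pointwise predictive density, computed stably via log-sum-exp
        lppd = (np.logaddexp.reduce(log_lik, axis=0) - np.log(n_draws)).sum()
        p_waic = log_lik.var(axis=0, ddof=1).sum()        # effective number of parameters
        return -2.0 * (lppd - p_waic)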

16.
In this article, we propose a new empirical information criterion (EIC) for model selection which penalizes the likelihood of the data by a non-linear function of the number of parameters in the model. It is designed to be used where there are a large number of time series to be forecast. However, a bootstrap version of the EIC can be used where there is a single time series to be forecast. The EIC provides a data-driven model selection tool that can be tuned to the particular forecasting task.

We compare the EIC with other model selection criteria including Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC). The comparisons show that for the M3 forecasting competition data, the EIC outperforms both the AIC and BIC, particularly for longer forecast horizons. We also compare the criteria on simulated data and find that the EIC does better than existing criteria in that case also.

17.
We compare Bayesian and sample theory model specification criteria. For the Bayesian criteria we use the deviance information criterion (DIC) and the cumulative density of the mean squared errors of forecast. For the sample theory criterion we use the conditional Kolmogorov test (CKT). We use Markov chain Monte Carlo methods to obtain the Bayesian criteria and bootstrap sampling to obtain the conditional Kolmogorov test. The two non-nested models we consider are the CIR and Vasicek models for spot asset prices. Monte Carlo experiments show that the DIC performs better than the cumulative density of the mean squared errors of forecast and the CKT. According to the DIC and the mean squared errors of forecast, the CIR model explains the daily data on the uncollateralized Japanese call rate from January 1, 1990 to April 18, 1996; but according to the CKT, neither the CIR nor the Vasicek model explains the daily data.

18.
When the unobservable Markov chain in a hidden Markov model is stationary, the marginal distribution of the observations is a finite mixture with the number of terms equal to the number of states of the Markov chain. This suggests that the number of states of the unobservable Markov chain can be estimated by determining the number of mixture components in the marginal distribution. This paper presents new methods for estimating the number of states in a hidden Markov model, and coincidentally the unknown number of components in a finite mixture, based on penalized quasi-likelihood and generalized quasi-likelihood ratio methods constructed from the marginal distribution. The procedures advocated are simple to calculate, and results obtained in empirical applications indicate that they are as effective as currently available methods based on the full likelihood. Under fairly general regularity conditions, the methods proposed generate strongly consistent estimates of the unknown number of states or components.
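The fact exploited above can be written explicitly: if the hidden chain has m states with stationary distribution \pi = (\pi_1, \dots, \pi_m) and state-dependent densities f(\,\cdot\,; \theta_k), then each observation has marginal density

    p(y_t) = \sum_{k=1}^{m} \pi_k\, f(y_t; \theta_k),

a finite mixture whose number of components equals the number of hidden states, which is why estimating the order of the mixture also estimates the number of states.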

19.
Summary. We describe a model-based approach to analyse space–time surveillance data on meningococcal disease. Such data typically comprise a number of time series of disease counts, each representing a specific geographical area. We propose a hierarchical formulation, where latent parameters capture temporal, seasonal and spatial trends in disease incidence. We then add, for each area, a hidden Markov model to describe potential additional (autoregressive) effects of the number of cases at the previous time point. Different specifications for the functional form of this autoregressive term, involving the number of cases in the same or in neighbouring areas, are compared. The two states of the Markov chain can be interpreted as representing an 'endemic' and a 'hyperendemic' state. The methodology is applied to a data set of monthly counts of the incidence of meningococcal disease in the 94 départements of France from 1985 to 1997. Inference is carried out by using Markov chain Monte Carlo simulation techniques in a fully Bayesian framework. We emphasize that a central feature of our model is the possibility of calculating, for each region and each time point, the posterior probability of being in a hyperendemic state, adjusted for global spatial and temporal trends, which we believe is of particular public health interest.

20.
Summary. We present a Bayesian evidence synthesis model combining data on seroprevalence, seroconversion and tests of recent infection, to produce estimates of current incidence of toxoplasmosis in the UK. The motivation for the study was the need for an estimate of current average incidence in the UK, with a realistic assessment of its uncertainty, to inform a decision model for a national screening programme to prevent congenital toxoplasmosis. The model has a hierarchical structure over geographic region, a random-walk model for temporal effects and a fixed age effect, with one or more types of data informing the regional estimates of incidence. Inference is obtained by using Markov chain Monte Carlo simulations. A key issue in the synthesis of evidence from multiple sources is model selection and the consistency of different types of evidence. Alternative models of incidence are compared by using the deviance information criterion, and we find that temporal effects are region specific. We assess the consistency of the various forms of evidence by using cross-validation where practical, and posterior and mixed prediction otherwise, and we discuss how these measures can be used to assess different aspects of consistency in a complex evidence structure. We discuss the contribution of the various forms of evidence to estimated current average incidence.
