Similar articles
20 similar articles found (search time: 93 ms)
1.
Bayesian calibration of computer models (total citations: 5, self-citations: 0, cited by others: 5)
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
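In the formulation usually associated with this approach (a sketch; the notation is our assumption), a field observation z_i at input x_i is linked to the computer-code output \eta and a model-inadequacy term \delta by

    z_i = \eta(x_i, \theta) + \delta(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),

where \theta denotes the unknown calibration parameters and \delta(\cdot) is given a Gaussian-process prior, so that predictions propagate uncertainty about \theta, \delta and \varepsilon jointly.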

2.
We consider fitting Emax models to the primary endpoint for a parallel-group dose–response clinical trial. Such models can be difficult to fit using maximum likelihood if the data give little information about the maximum possible response. Consequently, we consider alternative models, derived as limiting cases, which can usually be fitted. Furthermore, we propose two model selection procedures for choosing between the different models, and compare them with two model selection procedures that have previously been used. In a simulation study we find that the procedure that performs best depends on the underlying true situation; one of the new procedures may be regarded as the most robust overall.
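For reference, the standard (hyperbolic) Emax model for the mean response at dose d is

    E[Y \mid d] = E_0 + \frac{E_{\max}\, d}{ED_{50} + d}.

When the data carry little information about the plateau, E_{\max} and ED_{50} are poorly identified; for instance, letting ED_{50} \to \infty with the ratio E_{\max}/ED_{50} held fixed yields a model linear in d, one example of the kind of limiting case referred to above (our illustration, not necessarily the authors' exact choice).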

3.
The simplification of complex models originally envisaged to explain some data is considered as a discrete form of smoothing. In this sense, data-based model selection techniques lead to a minimal and unavoidable initial smoothing. The same techniques may also be used for further smoothing if this seems necessary. For deterministic data, parametric models, which are usually used for stochastic data, also provide convenient notches in the process of smoothing. The usual discrepancies can be used to measure the degree of smoothing. The methods for tables of means and tables of frequencies are described in more detail, and examples of applications are given.

4.
We construct a new bivariate mixture of negative binomial distributions which represents over-dispersed data more effectively. This is an extension of a univariate mixture of beta and negative binomial distributions. Characteristics of this joint distribution are studied, including its conditional distributions, and some properties of the correlation coefficient are explored. We demonstrate the applicability of the proposed model by fitting it to three real data sets with correlated count data, and compare it with some previously used models to show the effectiveness of the new model.
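The univariate construction being extended is standard (a sketch in our notation): if Y \mid p \sim \mathrm{NB}(r, p), so that P(Y = k \mid p) = \binom{r+k-1}{k} p^r (1-p)^k, and p \sim \mathrm{Beta}(a, b), then marginally

    P(Y = k) = \binom{r+k-1}{k} \frac{B(a+r,\, b+k)}{B(a,\, b)}, \qquad k = 0, 1, 2, \ldots,

a beta–negative binomial distribution, which is over-dispersed relative to the negative binomial itself.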

5.
The Integrated Nested Laplace Approximation (INLA) has established itself as a widely used method for approximate inference on Bayesian hierarchical models which can be represented as a latent Gaussian model (LGM). INLA is based on producing an accurate approximation to the posterior marginal distributions of the parameters in the model and some other quantities of interest by using repeated approximations to intermediate distributions and integrals that appear in the computation of the posterior marginals. INLA focuses on models whose latent effects are a Gaussian Markov random field. For this reason, we have explored alternative ways of expanding the number of possible models that can be fitted using the INLA methodology. In this paper, we present a novel approach that combines INLA and Markov chain Monte Carlo (MCMC). The aim is to consider a wider range of models that can be fitted with INLA only when some of the parameters of the model have been fixed. We show how new values of these parameters can be drawn from their posterior by using conditional models fitted with INLA and standard MCMC algorithms, such as Metropolis–Hastings. Hence, this will extend the use of INLA to fit models that can be expressed as a conditional LGM. Also, this new approach can be used to build simpler MCMC samplers for complex models as it allows sampling only on a limited number of parameters in the model. We will demonstrate how our approach can extend the class of models that could benefit from INLA, and how the R-INLA package will ease its implementation. We will go through simple examples of this new approach before we discuss more advanced applications with datasets taken from the relevant literature. In particular, INLA within MCMC will be used to fit models with Laplace priors in a Bayesian Lasso model, imputation of missing covariates in linear models, fitting spatial econometrics models with complex nonlinear terms in the linear predictor and classification of data with mixture models. Furthermore, in some of the examples we could exploit INLA within MCMC to make joint inference on an ensemble of model parameters.
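The scheme amounts to a Metropolis–Hastings sampler in which the conditional log marginal likelihood is supplied by an INLA fit with the sampled parameters held fixed. A minimal sketch in Python, where inla_fit is a hypothetical stand-in for a call out to R-INLA:

    import numpy as np

    def inla_within_mcmc(z0, log_prior, inla_fit, n_iter=1000, step=0.1):
        """Random-walk Metropolis-Hastings over the parameters z that
        cannot be handled by INLA directly. inla_fit(z) is assumed to
        return log pi(y | z), the log marginal likelihood of the
        conditional latent Gaussian model fitted with z held fixed."""
        rng = np.random.default_rng(0)
        z, ll = np.asarray(z0, dtype=float), inla_fit(z0)
        samples = []
        for _ in range(n_iter):
            z_new = z + step * rng.standard_normal(z.shape)  # symmetric proposal
            ll_new = inla_fit(z_new)
            # usual Metropolis-Hastings acceptance ratio on the log scale
            if np.log(rng.uniform()) < (ll_new + log_prior(z_new)) - (ll + log_prior(z)):
                z, ll = z_new, ll_new
            samples.append(z.copy())
        return np.asarray(samples)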

6.
With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians, and new approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards what are arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement; such approaches, e.g. those based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of being overly complex and being overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, grounded in well-developed statistical practice and efficient to fit to large data sets.
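As an illustration of the first of these classes, the likelihood of a two-state hidden Markov model for observed step lengths can be evaluated with the scaled forward algorithm; the gamma step-length distributions below are an illustrative assumption, not a prescription from the paper:

    import numpy as np
    from scipy.stats import gamma

    def hmm_loglik(steps, trans, shapes, scales, init):
        """Log-likelihood of a 2-state HMM for step lengths, with states
        loosely interpretable as 'encamped' vs 'exploratory' movement.
        trans is the 2x2 transition matrix, init the initial distribution."""
        # state-wise emission densities for every observed step
        dens = np.column_stack([gamma.pdf(steps, a=shapes[k], scale=scales[k])
                                for k in range(2)])
        alpha = init * dens[0]                      # forward variables at t = 0
        loglik = np.log(alpha.sum()); alpha /= alpha.sum()
        for t in range(1, len(steps)):
            alpha = (alpha @ trans) * dens[t]       # propagate and weight
            loglik += np.log(alpha.sum()); alpha /= alpha.sum()  # rescale
        return loglik

Maximizing this over the transition and gamma parameters (e.g. with scipy.optimize) gives the usual maximum likelihood fit.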

7.
Multilevel modelling of the geographical distributions of diseases (total citations: 4, self-citations: 0, cited by others: 4)
Multilevel modelling is used on problems arising from the analysis of spatially distributed health data. We use three applications to demonstrate the use of multilevel modelling in this area. The first concerns small-area all-cause mortality rates from Glasgow, where spatial autocorrelation between residuals is examined. The second analysis is of prostate cancer cases in Scottish counties, where we use a range of models to examine whether the incidence is higher in more rural areas. The third develops a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model. We discuss some of the issues surrounding the use of complex spatial models and the potential for future developments.

8.
In many studies a large number of variables is measured, and the identification of relevant variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects the fact that there usually exist other, equally appropriate models. Bayesian or frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam’s window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure: based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. Backward elimination can be used as a simple alternative screening procedure. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, respectively, of which seven have an influence on the outcome. The screening step eliminates most of the uninfluential variables, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.
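A minimal sketch of the screening idea, on our reading (variables are kept only if selected sufficiently often across bootstrap samples; sklearn's RFE stands in here for whatever selection procedure is applied within each sample):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.feature_selection import RFE

    def screen_variables(X, y, n_boot=100, cutoff=0.3, n_keep=5, seed=0):
        """Bootstrap inclusion frequencies: variables selected in fewer
        than a fraction `cutoff` of bootstrap samples are screened out
        before the model averaging step."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        counts = np.zeros(p)
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)                  # bootstrap resample
            selector = RFE(LinearRegression(), n_features_to_select=n_keep)
            selector.fit(X[idx], y[idx])
            counts += selector.support_                  # record selections
        freq = counts / n_boot
        return np.flatnonzero(freq >= cutoff), freq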

9.
The results of analyzing experimental data using a parametric model may depend heavily on the chosen model for the regression and variance functions, and also on any preliminary transformation of the variables. In this paper we propose and discuss a procedure which consists of the simultaneous selection of parametric regression and variance models from a relatively rich model class, together with Box–Cox variable transformations, by minimization of a cross-validation criterion. For this it is essential to introduce modifications of the standard cross-validation criterion adapted to each of the following objectives: (1) estimation of the unknown regression function, (2) prediction of future values of the response variable, (3) calibration, or (4) estimation of some parameter with a particular meaning in the corresponding field of application. Our idea of a criterion-oriented combination of procedures (which, if applied at all, are usually applied independently or sequentially) is expected to lead to more accurate results. We show how the accuracy of the parameter estimators can be assessed by a "moment-oriented bootstrap procedure", which is an essential modification of the "wild bootstrap" of Härdle and Mammen using more accurate variance estimates. This new procedure, and its refinement by a bootstrap-based pivot ("double bootstrap"), is also used for the construction of confidence, prediction and calibration intervals. Programs written in Splus which realize our strategy for nonlinear regression modelling and parameter estimation are described as well. The performance of the selected model is discussed, and the behaviour of the procedures is illustrated by, e.g., an application in radioimmunological assay.
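For reference, the Box–Cox family over which the transformation is selected is

    y^{(\lambda)} = \begin{cases} (y^{\lambda} - 1)/\lambda, & \lambda \neq 0, \\ \log y, & \lambda = 0, \end{cases}

and the procedure minimizes the objective-specific cross-validation criterion jointly over the regression model, the variance model and \lambda.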

10.
The analysis of incomplete contingency tables is a practical and interesting problem. In this paper, we provide characterizations of the various missing mechanisms of a variable in terms of response and non-response odds for two- and three-dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for these tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from the fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed by simulation studies.
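The three mechanisms follow Rubin's standard taxonomy: writing R for the response indicators and Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}}),

    \text{MCAR: } P(R \mid Y) = P(R), \qquad \text{MAR: } P(R \mid Y) = P(R \mid Y_{\mathrm{obs}}), \qquad \text{NMAR: } P(R \mid Y) \text{ depends on } Y_{\mathrm{mis}},

and the contribution here is to re-express these conditions as odds-based constraints that can be checked from the observed table.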

11.
Bayesian models for relative archaeological chronology building (total citations: 1, self-citations: 0, cited by others: 1)
For many years, archaeologists have postulated that the numbers of various artefact types found within excavated features should give insight about their relative dates of deposition even when stratigraphic information is not present. A typical data set used in such studies can be reported as a cross-classification table (often called an abundance matrix or, equivalently, a contingency table) of excavated features against artefact types. Each entry of the table represents the number of a particular artefact type found in a particular archaeological feature. Methodologies for attempting to identify temporal sequences on the basis of such data are commonly referred to as seriation techniques. Several different procedures for seriation, including both parametric and non-parametric statistics, have been used in attempts to reconstruct relative chronological orders on the basis of such contingency tables. We develop some possible model-based approaches that might be used to aid in relative archaeological chronology building. We use the recently developed Markov chain Monte Carlo method based on Langevin diffusions to fit some of the models proposed. Predictive Bayesian model choice techniques are then employed to ascertain which of the models that we develop are most plausible. We analyse two data sets taken from the literature on archaeological seriation.
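The Langevin-diffusion-based MCMC referred to is typically implemented as the Metropolis-adjusted Langevin algorithm (MALA), whose proposal from the current state \theta is

    \theta' = \theta + \frac{\varepsilon^2}{2} \nabla \log \pi(\theta) + \varepsilon\, \xi, \qquad \xi \sim N(0, I),

accepted or rejected with the usual Metropolis–Hastings ratio, taking the asymmetric Gaussian proposal density into account.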

12.
Stepwise variable selection procedures are computationally inexpensive methods for constructing useful regression models for a single dependent variable. At each step a variable is entered into or deleted from the current model, based on the criterion of minimizing the error sum of squares (SSE). When there is more than one dependent variable, the situation is more complex. In this article we propose variable selection criteria for multivariate regression which generalize the univariate SSE criterion. Specifically, we suggest minimizing some function of the estimated error covariance matrix: the trace, the determinant, or the largest eigenvalue. The computations associated with these criteria may be burdensome. We develop a computational framework based on the use of the SWEEP operator which greatly reduces these calculations for stepwise variable selection in multivariate regression.
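With residual matrix E = Y - \hat{Y} from a candidate multivariate fit, the three proposed criteria are \mathrm{tr}(E'E), \det(E'E) and the largest eigenvalue of E'E. A sketch in Python, with plain least squares standing in for the SWEEP-based incremental updating described in the article:

    import numpy as np

    def subset_criteria(X, Y, cols):
        """Trace, determinant and largest-eigenvalue criteria for the
        multivariate regression of Y on the columns `cols` of X."""
        Xs = X[:, cols]
        B, *_ = np.linalg.lstsq(Xs, Y, rcond=None)   # least-squares coefficients
        E = Y - Xs @ B                               # residual matrix
        S = E.T @ E                                  # error SSCP matrix
        return np.trace(S), np.linalg.det(S), np.linalg.eigvalsh(S)[-1]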

13.
We investigate the impacts of complex sampling on point and standard error estimates in latent growth curve modelling of survey data. Methodological issues are illustrated with empirical evidence from the analysis of longitudinal data on life satisfaction trajectories, using data from the British Household Panel Survey, a nationally representative survey in Great Britain. A multi-process second-order latent growth curve model with conditional linear growth is used to study variation in the two perceived life satisfaction latent factors considered. The benefits of accounting for the complex survey design are considered, including obtaining unbiased point and standard error estimates, and therefore correctly specified confidence intervals and statistical tests. We conclude that, even for the rather elaborate longitudinal data models considered here, estimation procedures are affected by the variance-inflating impacts of complex sampling.

14.
The number of variables in a regression model is often too large, and a more parsimonious model may be preferred. Selection strategies (e.g. all-subset selection with various penalties for model complexity, or stepwise procedures) are widely used, but there are few analytical results about their properties. The problems of replication stability, model complexity, selection bias and an over-optimistic estimate of the predictive value of a model are discussed, together with several proposals based on resampling methods. The methods are applied to data from a case–control study on atopic dermatitis and a clinical trial comparing two chemotherapy regimes, using a logistic regression and a Cox model. A recent proposal to use shrinkage factors to reduce the bias of parameter estimates caused by model building is extended to parameterwise shrinkage factors and discussed as a further way to illustrate the problems of overly complex models. The results from the resampling approaches favour greater simplicity of the final regression model.
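In symbols (our transcription of the idea), a global shrinkage factor \hat{c} replaces every selected coefficient \hat\beta_j by \hat{c}\,\hat\beta_j, while the parameterwise extension estimates one factor per coefficient,

    \tilde\beta_j = \hat{c}_j\, \hat\beta_j,

with the \hat{c}_j obtained by cross-validation or resampling; values of \hat{c}_j well below 1 flag coefficients whose magnitude is largely an artefact of the selection process.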

15.
Classical inferential procedures induce conclusions from a set of data to a population of interest, accounting for the imprecision resulting from the stochastic component of the model. Less attention is devoted to the uncertainty arising from (unplanned) incompleteness in the data. Through the choice of an identifiable model for non-ignorable non-response, one narrows the possible data-generating mechanisms to the point where inference only suffers from imprecision. Some proposals have been made for assessing the sensitivity to these modelling assumptions; many are based on fitting several plausible but competing models. For example, we could assume that the missing data are missing at random in one model, and then fit an additional model where non-random missingness is assumed. On the basis of data from a Slovenian plebiscite conducted in 1991 to prepare for independence, it is shown that such an ad hoc procedure may be misleading. We propose an approach which identifies and incorporates both sources of uncertainty in inference: imprecision due to finite sampling and ignorance due to incompleteness. A simple sensitivity analysis considers a finite set of plausible models; we take this idea one step further by considering more degrees of freedom than the data support. This produces sets of estimates (regions of ignorance) and sets of confidence regions (combined into regions of uncertainty).

16.
The concept of degrees of freedom plays an important role in statistical modeling and is commonly used to measure model complexity. The number of unknown parameters, which is typically used as the degrees of freedom in linear regression models, may fail to work in some modeling procedures, in particular for linear mixed effects models. In this article, we propose a new definition of generalized degrees of freedom in linear mixed effects models, derived as the sum of the sensitivities of the expected fitted values with respect to their underlying true means. We explore and compare data perturbation and the residual bootstrap for empirically estimating model complexity. We also show that this empirical generalized degrees of freedom measure satisfies some desirable properties and is useful for the selection of linear mixed effects models.
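In symbols, the verbal definition above reads (our transcription, in the spirit of Ye's generalized degrees of freedom)

    \mathrm{GDF} = \sum_i \frac{\partial\, E[\hat{y}_i]}{\partial \mu_i},

the sum of sensitivities of the expected fitted values \hat{y}_i to their underlying true means \mu_i; in ordinary linear regression with hat matrix H this reduces to \mathrm{tr}(H), the usual parameter count.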

17.
We propose a four-parameter extended generalized gamma model, which includes some important distributions as special cases and is very useful for modeling lifetime data. An advantage is that it can represent the error distribution for a new heteroscedastic log-odd log-logistic generalized gamma regression model. The proposed heteroscedastic regression model can be used effectively in the analysis of survival data since it includes several widely known regression models as special cases. Further, various simulations are performed for different parameter settings, sample sizes and censoring percentages. Overall, the new regression model is very useful for the analysis of real data.

18.
Data from complex surveys are being used increasingly to build the same sort of explanatory and predictive models as those used in the rest of statistics. Unfortunately, the assumptions underlying standard statistical methods are not even approximately valid for most survey data. The problem of parameter estimation has been largely solved, at least for routine data analysis, through the use of weighted estimating equations, and software for most standard analytical procedures is now available in the major statistical packages. One notable omission from standard software is an analogue of the likelihood ratio test; an exception is the Rao–Scott test for loglinear models in contingency tables. In this paper we show how the Rao–Scott test can be extended to handle arbitrary regression models. We illustrate the process of fitting a model to survey data with an example from NHANES.
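The first-order Rao–Scott idea is to rescale the usual chi-squared statistic by an estimated mean generalized design effect,

    X^2_{RS} = X^2 / \hat{\bar{\delta}},

where \hat{\bar{\delta}} averages the estimated eigenvalues of the design-effect matrix; the extension described here constructs the analogous correction for working likelihood ratio statistics in arbitrary regression models fitted by weighted estimating equations (our summary of the construction).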

19.
Model-based clustering for social networks (total citations: 5, self-citations: 0, cited by others: 5)
Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity (two actors that have ties to a third actor are more likely to be tied than actors that do not), homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean 'social space', and the actors' locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously, and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data using the model.
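In outline (a sketch of the model as described), the tie probabilities and cluster structure are

    \mathrm{logit}\, P(y_{ij} = 1 \mid z_i, z_j, x_{ij}) = \beta' x_{ij} - \lVert z_i - z_j \rVert, \qquad z_i \sim \sum_g \lambda_g\, \mathrm{MVN}(\mu_g, \sigma_g^2 I),

so ties become less likely with latent distance, and the finite mixture over positions induces the clusters.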

20.
Stochastic kinetic models are often used to describe complex biological processes. Typically these models are analytically intractable and have unknown parameters which need to be estimated from observed data. Ideally we would have measurements on all interacting chemical species in the process, observed continuously in time. In practice, however, measurements are taken at only relatively few time-points. In some situations, only very limited observation of the process is available, for example settings in which experimenters can only observe noisy observations on the proportion of cells that are alive. This makes the inference task even more problematic. We consider a range of data-poor scenarios and investigate the performance of various computationally intensive Bayesian algorithms in determining the posterior distribution using data on proportions from a simple birth-death process.
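For concreteness, realizations of the simple birth-death process mentioned can be simulated exactly with the Gillespie algorithm; a minimal sketch (parameter names are ours):

    import numpy as np

    def gillespie_birth_death(x0, birth, death, t_max, seed=0):
        """Exact simulation of a linear birth-death process with
        per-capita birth rate `birth` and death rate `death`."""
        rng = np.random.default_rng(seed)
        t, x = 0.0, x0
        times, states = [t], [x]
        while t < t_max and x > 0:
            total_rate = (birth + death) * x         # rate of the next event
            t += rng.exponential(1.0 / total_rate)   # exponential waiting time
            # birth with probability birth/(birth+death), otherwise death
            x += 1 if rng.uniform() < birth / (birth + death) else -1
            times.append(t); states.append(x)
        return np.array(times), np.array(states)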
