Similar Articles
20 similar articles found.
1.
Complex computer codes are widely used in science to model physical systems. Sensitivity analysis aims to measure the contributions of the inputs to the code output variability. An efficient class of tools for such analysis is the variance-based methods, which have recently been investigated in the framework of dependent inputs. One of their issues is that they require a large number of runs of the complex simulators. To address this, a Gaussian process (GP) regression model may be used to approximate the complex code. In this work, we propose to decompose a GP into a high-dimensional representation. This leads to the definition of a variance-based sensitivity measure well tailored for non-independent inputs. We give a methodology to estimate these indices and to quantify their uncertainty. Finally, the approach is illustrated on toy functions and on a river flood model.
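As a rough illustration of the kind of variance-based index discussed here, the sketch below estimates the first-order index Var(E[Y|Xi])/Var(Y) by conditioning on bins of Xi, a construction that stays well defined when the inputs are dependent. The correlated-Gaussian toy model, bin count, and sample size are illustrative assumptions; the paper's GP decomposition and its river flood application are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an expensive code (or for its GP approximation).
def model(x):
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 1]

# Correlated Gaussian inputs: the index below remains meaningful even
# though X1 and X2 are not independent.
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=100_000)
y = model(x)

def first_order_index(xi, y, n_bins=50):
    """Estimate Var(E[Y | Xi]) / Var(Y) by averaging Y within bins of Xi."""
    edges = np.quantile(xi, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, xi, side="right") - 1, 0, n_bins - 1)
    bin_means = np.array([y[bins == b].mean() for b in range(n_bins)])
    bin_weights = np.array([(bins == b).mean() for b in range(n_bins)])
    return float(np.sum(bin_weights * (bin_means - y.mean()) ** 2) / y.var())

for i in range(2):
    print(f"first-order index of X{i + 1} ≈ {first_order_index(x[:, i], y):.3f}")
```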

2.
Probabilistic sensitivity analysis of complex models: a Bayesian approach
Summary.  In many areas of science and technology, mathematical models are built to simulate complex real world phenomena. Such models are typically implemented in large computer programs and are also very complex, such that the way that the model responds to changes in its inputs is not transparent. Sensitivity analysis is concerned with understanding how changes in the model inputs influence the outputs. This may be motivated simply by a wish to understand the implications of a complex model but often arises because there is uncertainty about the true values of the inputs that should be used for a particular application. A broad range of measures have been advocated in the literature to quantify and describe the sensitivity of a model's output to variation in its inputs. In practice the most commonly used measures are those that are based on formulating uncertainty in the model inputs by a joint probability distribution and then analysing the induced uncertainty in outputs, an approach which is known as probabilistic sensitivity analysis. We present a Bayesian framework which unifies the various tools of probabilistic sensitivity analysis. The Bayesian approach is computationally highly efficient. It allows effective sensitivity analysis to be achieved by using far smaller numbers of model runs than standard Monte Carlo methods. Furthermore, all measures of interest may be computed from a single set of runs.
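The practical gain described here, replacing expensive model runs by a cheap emulator when computing sensitivity measures, can be sketched roughly as follows. This is a plain plug-in Monte Carlo calculation on a scikit-learn GP fitted to a 40-run design of an invented test function, not the analytical Bayesian computations developed in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

rng = np.random.default_rng(1)

def code(x):                          # stand-in for the expensive simulator
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

# A small design of real code runs: far fewer than plain Monte Carlo needs.
x_train = rng.uniform(-np.pi, np.pi, size=(40, 2))
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([1.0, 1.0]),
                              normalize_y=True).fit(x_train, code(x_train))

# All further evaluations hit the emulator, not the code.
base = rng.uniform(-np.pi, np.pi, size=(5_000, 2))
total_var = np.var(gp.predict(base))
grid = np.linspace(-np.pi, np.pi, 40)
for i in range(2):
    cond_means = []
    for g in grid:
        pts = base.copy()
        pts[:, i] = g                 # condition on X_i = g (inputs independent)
        cond_means.append(gp.predict(pts).mean())
    print(f"main-effect index of x{i + 1} ≈ {np.var(cond_means) / total_var:.2f}")
```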

3.
We consider the use of emulator technology as an alternative method to second-order Monte Carlo (2DMC) in the uncertainty analysis for a percentile from the output of a stochastic model. 2DMC is a technique that uses repeated sampling in order to make inferences on the uncertainty and variability in a model output. The conventional 2DMC approach can often be highly computationally demanding, making methods for uncertainty and sensitivity analysis infeasible. We explore the adequacy and efficiency of the emulation approach, and we find that emulation provides a viable alternative in this situation. We demonstrate these methods using two examples with different input dimensions, including an application that considers contamination in pre-pasteurised milk.
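For orientation, a bare-bones version of the 2DMC scheme that the emulator is meant to replace: an outer loop over the uncertain parameter and an inner loop over unit-to-unit variability, with the percentile of interest computed in each inner sample. The lognormal toy model and all distributions are invented for illustration; in the paper's approach an emulator would stand in for the costly inner sampling.

```python
import numpy as np

rng = np.random.default_rng(2)

def stochastic_model(theta, n_inner):
    """Toy stochastic model: output varies from unit to unit for a fixed theta."""
    return rng.lognormal(mean=theta, sigma=0.4, size=n_inner)

n_outer, n_inner = 500, 2_000         # outer = uncertainty, inner = variability
p95 = np.empty(n_outer)
for j in range(n_outer):
    theta = rng.normal(1.0, 0.2)              # epistemic uncertainty about theta
    sample = stochastic_model(theta, n_inner) # aleatory variability given theta
    p95[j] = np.percentile(sample, 95)        # the percentile of interest

# Uncertainty about the 95th percentile induced by the uncertainty in theta.
print("median estimate:", np.round(np.median(p95), 2))
print("90% uncertainty interval:", np.round(np.percentile(p95, [5, 95]), 2))
```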

4.
Quantifying uncertainty in the biospheric carbon flux for England and Wales
Summary.  A crucial issue in the current global warming debate is the effect of vegetation and soils on carbon dioxide (CO2) concentrations in the atmosphere. Vegetation can extract CO2 through photosynthesis, but respiration, decay of soil organic matter and disturbance effects such as fire return it to the atmosphere. The balance of these processes is the net carbon flux. To estimate the biospheric carbon flux for England and Wales, we address the statistical problem of inference for the sum of multiple outputs from a complex deterministic computer code whose input parameters are uncertain. The code is a process model which simulates the carbon dynamics of vegetation and soils, including the amount of carbon that is stored as a result of photosynthesis and the amount that is returned to the atmosphere through respiration. The aggregation of outputs corresponding to multiple sites and types of vegetation in a region gives an estimate of the total carbon flux for that region over a period of time. Expert prior opinions are elicited for marginal uncertainty about the relevant input parameters and for correlations of inputs between sites. A Gaussian process model is used to build emulators of the multiple code outputs and Bayesian uncertainty analysis is then used to propagate uncertainty in the input parameters through to uncertainty on the aggregated output. Numerical results are presented for England and Wales in the year 2000. It is estimated that vegetation and soils in England and Wales constituted a net sink of 7.55 Mt C (1 Mt C = 10^12 g of carbon) in 2000, with standard deviation 0.56 Mt C resulting from the sources of uncertainty that are considered.

5.
Bayesian calibration of computer models
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
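A much-reduced sketch of the two ingredients highlighted here, a calibration parameter and a model-discrepancy term: field data are modelled as simulator output plus a zero-mean GP discrepancy plus noise, and a flat-prior posterior over the calibration parameter is evaluated on a grid. The toy simulator, the data, and the fixed discrepancy hyperparameters are all assumptions; the paper's full treatment of these quantities is not reproduced.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

def simulator(x, theta):                 # cheap stand-in for the computer code
    return theta * np.sin(x)

# Field observations from a "true" process the code cannot match exactly.
x_obs = np.linspace(0.0, 3.0, 15)
z_obs = 1.3 * np.sin(x_obs) + 0.15 * x_obs + rng.normal(0.0, 0.05, x_obs.size)

def sq_exp(x, length=1.0, var=0.05):     # covariance of the discrepancy term
    return var * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length ** 2)

# z = simulator(x, theta) + delta(x) + noise, with a GP prior on delta.
cov = sq_exp(x_obs) + 0.05 ** 2 * np.eye(x_obs.size)
thetas = np.linspace(0.5, 2.0, 151)
log_lik = np.array([multivariate_normal.logpdf(z_obs, simulator(x_obs, t), cov)
                    for t in thetas])
post = np.exp(log_lik - log_lik.max())   # flat prior: posterior ∝ likelihood
post /= post.sum()
print("posterior mean of theta ≈", round(float((thetas * post).sum()), 3))
```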

6.
Running complex computer models can be expensive in computer time, while learning about the relationships between input and output variables can be difficult. An emulator is a fast approximation to a computationally expensive model that can be used as a surrogate for the model, to quantify uncertainty or to improve process understanding. Here, we examine emulators based on singular value decompositions (SVDs) and use them to emulate global climate and vegetation fields, examining how these fields are affected by changes in the Earth's orbit. The vegetation field may be emulated directly from the orbital variables, but an appealing alternative is to relate it to emulations of the climate fields, which involves high-dimensional input and output. The SVDs radically reduce the dimensionality of the input and output spaces and are shown to clarify the relationships between them. The method could potentially be useful for any complex process with correlated, high-dimensional inputs and/or outputs.
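A toy version of the SVD step: the ensemble of output fields is decomposed into a few spatial patterns, a small GP is fitted from the scalar input to each pattern amplitude, and new fields are reconstructed from the predicted amplitudes. The one-dimensional "orbital" input, the fake field simulator, and the choice of three components are assumptions made for the sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)

grid = np.linspace(0.0, 1.0, 200)
def field_simulator(t):                  # scalar input -> 200-point output field
    return np.sin(2 * np.pi * (grid - 0.1 * t)) * (1.0 + 0.5 * t)

t_train = rng.uniform(0.0, 1.0, 30)
Y = np.vstack([field_simulator(t) for t in t_train])     # 30 runs x 200 cells

# SVD of the centred ensemble; keep the leading k spatial patterns.
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
k = 3
scores = U[:, :k] * s[:k]                # per-run amplitudes of the k patterns

# One small GP per retained amplitude, driven by the scalar input.
gps = [GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
       .fit(t_train[:, None], scores[:, j]) for j in range(k)]

def emulate_field(t_new):
    amps = np.array([gp.predict([[t_new]])[0] for gp in gps])
    return Y_mean + amps @ Vt[:k]

err = np.abs(emulate_field(0.37) - field_simulator(0.37)).max()
print("max abs emulation error on a held-out input:", round(float(err), 4))
```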

7.
In this paper, we propose a new methodology for solving stochastic inversion problems through computer experiments, the stochasticity being driven by a functional random variable. This study is motivated by an automotive application. In this context, the simulator code takes a double set of simulation inputs: deterministic control variables and functional uncertain variables. This framework is characterized by two features. The first one is the high computational cost of simulations. The second is that the probability distribution of the functional input is only known through a finite set of realizations. In our context, the inversion problem is formulated by considering the expectation over the functional random variable. We aim at solving this problem by evaluating the model on a design whose adaptive construction combines the so-called stepwise uncertainty reduction methodology with a strategy for efficient expectation estimation. Two greedy strategies are introduced to sequentially estimate the expectation over the functional uncertain variable by adaptively selecting curves from the initial set of realizations. Both of these strategies consider functional principal component analysis as a dimensionality reduction technique, assuming that the realizations of the functional input are independent realizations of the same continuous stochastic process. The first strategy is based on a greedy approach for functional data-driven quantization, while the second one is linked to the notion of space-filling design. Functional PCA is used as an intermediate step. For each point of the design built in the reduced space, we select the corresponding curve from the sample of available curves, thus guaranteeing the robustness of the procedure to dimension reduction. The whole methodology is illustrated and calibrated on an analytical example. It is then applied to the automotive industrial test case, where we aim at identifying the set of control parameters that allow the vehicle to meet pollutant emission standards.

8.
Computer models with functional output are omnipresent throughout science and engineering. Most often the computer model is treated as a black-box and information about the underlying mathematical model is not exploited in statistical analyses. Consequently, general-purpose bases such as wavelets are typically used to describe the main characteristics of the functional output. In this article we advocate for using information about the underlying mathematical model in order to choose a better basis for the functional output. To validate this choice, a simulation study is presented in the context of uncertainty analysis for a computer model from inverse Sturm-Liouville problems.

9.
Deterministic computer simulations are often used as replacements for complex physical experiments. Although less expensive than physical experimentation, computer codes can still be time-consuming to run. An effective strategy for exploring the response surface of the deterministic simulator is the use of an approximation to the computer code, such as a Gaussian process (GP) model, coupled with a sequential sampling strategy for choosing design points that can be used to build the GP model. The ultimate goal of such studies is often the estimation of specific features of interest of the simulator output, such as the maximum, minimum, or a level set (contour). Before approximating such features with the GP model, sufficient runs of the computer simulator must be completed. Sequential designs with an expected improvement (EI) design criterion can yield good estimates of the features with a minimal number of runs. The challenge is that the expected improvement function itself is often multimodal and difficult to maximize. We develop branch and bound algorithms for efficiently maximizing the EI function in specific problems, including the simultaneous estimation of a global maximum and minimum, and the estimation of a contour. These branch and bound algorithms outperform other optimization strategies such as genetic algorithms, and can lead to significantly more accurate estimation of the features of interest.
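The EI criterion itself is easy to state; the hard part addressed by the paper is maximising it. The sketch below computes EI for estimating a global minimum from a scikit-learn GP and, for simplicity, maximises it over a fine one-dimensional grid rather than by branch and bound; the test function and design are arbitrary.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(5)

def code(x):                                  # deterministic simulator stand-in
    return (x - 0.3) ** 2 + 0.2 * np.sin(12.0 * x)

x_train = rng.uniform(0.0, 1.0, 6)[:, None]
y_train = code(x_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(x_train, y_train)

def expected_improvement(x_cand, f_best):
    """EI for minimisation: (f_best - mu) Phi(z) + sd phi(z), z = (f_best - mu)/sd."""
    mu, sd = gp.predict(x_cand, return_std=True)
    sd = np.maximum(sd, 1e-12)
    z = (f_best - mu) / sd
    return (f_best - mu) * norm.cdf(z) + sd * norm.pdf(z)

# EI is typically multimodal; here it is simply maximised over a dense grid.
cand = np.linspace(0.0, 1.0, 1001)[:, None]
ei = expected_improvement(cand, y_train.min())
print("next simulator run at x =", round(float(cand[np.argmax(ei)][0]), 3))
```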

10.
Many experiments in the physical and engineering sciences study complex processes in which bias due to model inadequacy dominates random error. A noteworthy example of this situation is the use of computer experiments, in which scientists simulate the phenomenon being studied by a computer code. Computer experiments are deterministic: replicate observations from running the code with the same inputs will be identical. Such high-bias settings demand different techniques for design and prediction. This paper focuses on the experimental design problem, introducing a new class of designs called rotation designs. Rotation designs are found by taking an orthogonal starting design D and rotating it to obtain a new design matrix D_R = DR, where R is any orthonormal matrix. The new design is still orthogonal for a first-order model. In this paper, we study some of the properties of rotation designs and we present a method to generate rotation designs that have some appealing symmetry properties.
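The basic construction D_R = DR can be checked in a few lines: start from an orthogonal two-level factorial, rotate it by a random orthonormal matrix, and verify that the information matrix for a first-order model is unchanged. The 2^3 factorial and the QR-based choice of R are assumptions for the sketch; the paper's method for generating rotations with special symmetry properties is not reproduced.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)

# Orthogonal starting design D: a 2^3 full factorial in coded +/-1 units.
D = np.array(list(product([-1.0, 1.0], repeat=3)))

# A random orthonormal matrix R (the Q factor of a Gaussian matrix's QR).
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))

D_rot = D @ R                     # the rotation design D_R = D R

# Both designs have the same information matrix 8 * I for a first-order model.
print(np.round(D.T @ D, 8))
print(np.round(D_rot.T @ D_rot, 8))
```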

11.
The global sensitivity analysis method used to quantify the influence of uncertain input variables on the variability in numerical model responses has already been applied to deterministic computer codes; deterministic means here that the same set of input variables always gives the same output value. This paper proposes a global sensitivity analysis methodology for stochastic computer codes, for which the result of each code run is itself random. The framework of the joint modeling of the mean and dispersion of heteroscedastic data is used. To deal with the complexity of computer experiment outputs, nonparametric joint models are discussed and a new Gaussian process-based joint model is proposed. The relevance of these models is analyzed based upon two case studies. Results show that the joint modeling approach yields accurate sensitivity index estimators even when heteroscedasticity is strong.
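One crude way to see the mean/dispersion idea: run the stochastic code with replicates at each design point, then fit one GP to the replicate means and another to the log replicate variances. This deliberately simplified stand-in, with an invented heteroscedastic toy code, is not the joint GP model actually proposed in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(7)

def stochastic_code(x, n_rep):
    # Heteroscedastic toy code: both the mean and the noise level depend on x.
    return np.sin(3.0 * x) + rng.normal(0.0, 0.05 + 0.3 * x, size=n_rep)

x_design = np.linspace(0.0, 1.0, 15)
reps = np.array([stochastic_code(x, 30) for x in x_design])   # 15 x 30 runs

# Separate GPs for the mean response and for the (log) dispersion.
gp_mean = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
gp_mean.fit(x_design[:, None], reps.mean(axis=1))
gp_disp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
gp_disp.fit(x_design[:, None], np.log(reps.var(axis=1, ddof=1)))

x_new = np.array([[0.8]])
print("predicted mean at 0.8      :", round(float(gp_mean.predict(x_new)[0]), 3))
print("predicted dispersion at 0.8:", round(float(np.exp(gp_disp.predict(x_new)[0])), 3))
```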

12.
This paper focuses on unsupervised curve classification in the context of the nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the ‘true’ ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem.
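The clustering step can be imitated on synthetic curves: project each output curve onto a small near-orthonormal basis and run k-means on the coefficients. The two-regime toy curves and the fixed cosine basis are assumptions; the paper's selection of the best basis from a library of orthonormal bases is not implemented here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)

# Synthetic time-evolution curves from two regimes, plus noise.
t = np.linspace(0.0, 1.0, 100)
curves = np.vstack(
    [np.exp(-3 * t) + 0.05 * rng.normal(size=t.size) for _ in range(40)] +
    [np.exp(-3 * t) * np.cos(6 * np.pi * t) + 0.05 * rng.normal(size=t.size)
     for _ in range(40)])

# Project each curve onto a small cosine basis (normalised on the time grid).
K = 8
basis = np.array([np.cos(np.pi * k * t) for k in range(K)])
basis /= np.linalg.norm(basis, axis=1, keepdims=True)
coeffs = curves @ basis.T                    # 80 curves -> 80 x 8 coefficients

# k-means in the reduced coefficient space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coeffs)
print("cluster sizes:", np.bincount(labels))
```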

13.
Kriging models have been widely used in computer experiments for the analysis of time-consuming computer codes. Based on kernels, they are flexible and can be tuned to many situations. In this paper, we construct kernels that reproduce the computer code complexity by mimicking its interaction structure. While the standard tensor-product kernel implicitly assumes that all interactions are active, the new kernels are suited for a general interaction structure, and will take advantage of the absence of interaction between some inputs. The methodology is twofold. First, the interaction structure is estimated from the data, using an initial standard Kriging model, and represented by a so-called FANOVA graph. New FANOVA-based sensitivity indices are introduced to detect active interactions. Then this graph is used to derive the form of the kernel, and the corresponding Kriging model is estimated by maximum likelihood. The performance of the overall procedure is illustrated by several 3-dimensional and 6-dimensional simulated and real examples. A substantial improvement is observed when the computer code has a relatively high level of complexity.
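The contrast between a full tensor-product kernel and a kernel matching a sparser interaction structure can be illustrated by hand. Below, a sum of two squared-exponential blocks encodes the assumption that x1 and x2 interact while x3 only enters additively; the hyperparameters are fixed rather than estimated by maximum likelihood, and no FANOVA graph is actually estimated from data, so this is only a sketch of the idea.

```python
import numpy as np

rng = np.random.default_rng(9)

def rbf(a, b, ls):
    """Squared-exponential kernel on the given columns of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1) / ls ** 2
    return np.exp(-0.5 * d2)

# Structured kernel: an (x1, x2) block plus an x3 block, instead of one
# tensor-product kernel over all three inputs.
def kernel(a, b):
    return rbf(a[:, :2], b[:, :2], ls=0.4) + rbf(a[:, 2:], b[:, 2:], ls=0.4)

def code(x):                       # toy code with exactly that structure
    return np.sin(x[:, 0] * x[:, 1]) + x[:, 2] ** 2

x_train = rng.uniform(0.0, 1.0, size=(40, 3))
y_train = code(x_train)

# Plain GP prediction with the structured kernel and a small jitter.
K = kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))
alpha = np.linalg.solve(K, y_train - y_train.mean())

x_test = rng.uniform(0.0, 1.0, size=(5, 3))
pred = y_train.mean() + kernel(x_test, x_train) @ alpha
print(np.round(np.c_[pred, code(x_test)], 3))   # prediction vs. true value
```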

14.
We consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data are unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate data consistent with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.

15.
The importance of individual inputs of a computer model is sometimes assessed using indices that reflect the amount of output variation that can be attributed to random variation in each input. We review two such indices, and consider input sampling plans that support estimation of one of them, the variance of conditional expectation or VCE (McKay, 1995, Los Alamos National Laboratory Report NUREG/CR-6311, LA-12915-MS). Sampling plans suggested by Sobol’, Saltelli, and McKay are examined and compared with a new sampling plan based on balanced incomplete block designs. The new design offers better sampling efficiency for the VCE than those of Sobol’ and Saltelli, and supports unbiased estimation of the index associated with each input.
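For reference, the baseline replicated sampling plan behind the VCE: r distinct values of the input of interest, each crossed with m fresh draws of the remaining inputs, after which the VCE is the variance of the conditional means. The toy function, the independence of the inputs, and the values of r and m are assumptions; the balanced-incomplete-block plan proposed in the paper is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(10)

def code(x):
    return x[:, 0] ** 2 + 0.5 * x[:, 1] + 0.1 * x[:, 2]

r, m = 50, 100                      # r levels of X1, m draws of the other inputs each
x1_levels = rng.uniform(0.0, 1.0, r)

cond_means = np.empty(r)
all_y = []
for j, v in enumerate(x1_levels):
    others = rng.uniform(0.0, 1.0, size=(m, 2))
    y = code(np.column_stack([np.full(m, v), others]))
    cond_means[j] = y.mean()        # estimate of E[Y | X1 = v]
    all_y.append(y)

vce = cond_means.var(ddof=1)        # variance of conditional expectation for X1
total = np.concatenate(all_y).var(ddof=1)
print("VCE(X1) =", round(vce, 4), "  share of total variance =", round(vce / total, 3))
```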

16.
A popular account for the demise of the U.K.’s monetary targeting regime in the 1980s blames the fluctuating predictive relationships between broad money and inflation and real output growth. Yet ex post policy analysis based on heavily revised data suggests no fluctuations in the predictive content of money. In this paper, we investigate the predictive relationships for inflation and output growth using both real-time and heavily revised data. We consider a large set of recursively estimated vector autoregressive (VAR) and vector error correction models (VECM). These models differ in terms of lag length and the number of cointegrating relationships. We use Bayesian model averaging (BMA) to demonstrate that real-time monetary policymakers faced considerable model uncertainty. The in-sample predictive content of money fluctuated during the 1980s as a result of data revisions in the presence of model uncertainty. This feature is only apparent with real-time data, as heavily revised data obscure these fluctuations. Out-of-sample predictive evaluations rarely suggest that money matters for either inflation or real output. We conclude that both data revisions and model uncertainty contributed to the demise of the U.K.’s monetary targeting regime.

17.
We propose a method that uses a sequential design instead of a space-filling design for estimating tuning parameters of a complex computer model. The goal is to bring the computer model output closer to the real system output. The method fits separate Gaussian process (GP) models to the available data from the physical experiment and the computer experiment and minimizes the discrepancy between the predictions from the GP models to obtain estimates of the tuning parameters. A criterion based on the discrepancy between the predictions from the two GP models and the standard error of prediction for the computer experiment output is then used to obtain a design point for the next run of the computer experiment. The tuning parameters are re-estimated using the augmented data set. The steps are repeated until the budget for the computer experiment data is exhausted. Simulation studies show that the proposed method performs better in bringing a computer model closer to the real system than methods that use a space-filling design.
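A pared-down, one-shot version of the estimation step: fit one GP to the physical data and one to the computer runs, then choose the tuning parameter that minimises the discrepancy between their predictions on a reference grid (here by a simple grid search). The toy physical system, the design sizes, and the grid search are assumptions; the sequential criterion for choosing the next computer run is not reproduced.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(11)

def computer_model(x, t):             # code with an unknown tuning parameter t
    return np.sin(x) * t

# Physical experiment (generated with t = 1.4 plus observation noise).
x_phys = np.linspace(0.0, 3.0, 12)
y_phys = np.sin(x_phys) * 1.4 + rng.normal(0.0, 0.05, x_phys.size)

# Computer experiment on a small (x, t) design.
xt = np.column_stack([rng.uniform(0.0, 3.0, 40), rng.uniform(0.5, 2.5, 40)])
y_code = computer_model(xt[:, 0], xt[:, 1])

gp_phys = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-2),
                                    normalize_y=True).fit(x_phys[:, None], y_phys)
gp_code = GaussianProcessRegressor(kernel=RBF([1.0, 1.0]),
                                    normalize_y=True).fit(xt, y_code)

# Tuning estimate: minimise the discrepancy between the two GP predictions.
x_ref = np.linspace(0.0, 3.0, 50)
def discrepancy(t):
    pred_code = gp_code.predict(np.column_stack([x_ref, np.full(x_ref.size, t)]))
    return np.mean((gp_phys.predict(x_ref[:, None]) - pred_code) ** 2)

t_grid = np.linspace(0.5, 2.5, 201)
t_hat = t_grid[np.argmin([discrepancy(t) for t in t_grid])]
print("estimated tuning parameter:", round(float(t_hat), 2))
```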

18.
Computer models simulating a physical process are used in many areas of science. Due to the complex nature of these codes, it is often necessary to approximate the code, which is typically done using a Gaussian process. In many situations the number of code runs available to build the Gaussian process approximation is limited. When the initial design is small or the underlying response surface is complicated, this can lead to poor approximations of the code output. In order to improve the fit of the model, sequential design strategies must be employed. In this paper we introduce two simple distance-based metrics that can be used to augment an initial design in a batch sequential manner. In addition, we propose a sequential updating strategy for an orthogonal array-based Latin hypercube sample. We show via various real and simulated examples that the distance metrics and the extension of the orthogonal array-based Latin hypercubes work well in practice.
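As an illustration, one of the simplest distance-based augmentation rules: greedily add, from a candidate pool, the point whose minimum distance to the current design is largest. The candidate pool, dimensions, and batch size are arbitrary, and this maximin rule is only one representative of such distance metrics; the orthogonal array-based Latin hypercube update is not shown.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(12)

design = rng.uniform(0.0, 1.0, size=(10, 3))          # small initial design
candidates = rng.uniform(0.0, 1.0, size=(5_000, 3))   # candidate input points

def augment(design, candidates, n_new):
    """Greedy batch augmentation: each new point maximises its minimum
    distance to the design built so far."""
    design = design.copy()
    for _ in range(n_new):
        d_min = cdist(candidates, design).min(axis=1)
        design = np.vstack([design, candidates[np.argmax(d_min)]])
    return design

new_design = augment(design, candidates, n_new=5)
print(new_design.shape)             # (15, 3): a 5-run batch has been added
```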

19.
The evaluation of hazards from complex, large-scale, technologically advanced systems often requires the construction of computer-implemented mathematical models. These models are used to evaluate the safety of the systems and to evaluate the consequences of modifications to the systems. These evaluations, however, are normally surrounded by significant uncertainties, related both to the uncertainty inherent in natural phenomena such as the weather and to uncertainties in the parameters and models used in the evaluation.

Another use of these models is to evaluate strategies for improving the information used in the modeling process itself. While sensitivity analysis is useful in identifying the variables in the model that are important, uncertainty analysis provides a tool for assessing the importance of uncertainty about these variables. A third, complementary technique is decision analysis. It provides a methodology for explicitly evaluating and ranking potential improvements to the model. Its use in the development of information-gathering strategies for a nuclear waste repository is discussed in this paper.

20.
Many mathematical models involve input parameters that are not precisely known. Global sensitivity analysis aims to identify the parameters whose uncertainty has the largest impact on the variability of a quantity of interest (output of the model). One of the statistical tools used to quantify the influence of each input variable on the output is the Sobol sensitivity index. We consider the statistical estimation of this index from a finite sample of model outputs. We study asymptotic and non-asymptotic properties of two estimators of Sobol indices. These properties are applied to significance tests and estimation by confidence intervals.
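A standard way to estimate a first-order Sobol index from a finite sample is the pick-freeze construction; the sketch below adds a bootstrap confidence interval in place of the asymptotic and non-asymptotic results derived in the paper. The test function, sample size, and number of bootstrap replicates are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(13)

def model(x):
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 1] ** 2

n, d = 10_000, 2
A = rng.uniform(0.0, 1.0, size=(n, d))
B = rng.uniform(0.0, 1.0, size=(n, d))
y_A = model(A)

def sobol(ya, yc):
    """Pick-freeze estimator of a first-order Sobol index."""
    return (np.mean(ya * yc) - np.mean(ya) * np.mean(yc)) / np.var(ya)

for i in range(d):
    C = B.copy()
    C[:, i] = A[:, i]               # input i is "frozen", the others resampled
    y_C = model(C)
    s_hat = sobol(y_A, y_C)
    # Bootstrap over the n input draws for a rough confidence interval.
    boot = [sobol(y_A[idx], y_C[idx])
            for idx in (rng.integers(0, n, n) for _ in range(500))]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"S_{i + 1} = {s_hat:.3f}   95% CI [{lo:.3f}, {hi:.3f}]")
```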
