Similar Articles
20 similar articles retrieved.
1.
As modeling efforts expand to a broader spectrum of areas, the amount of computer time required to exercise the corresponding computer codes has become quite costly (several hours for a single run is not uncommon). This costly process can be directly tied to the complexity of the modeling and to the large number of input variables (often numbering in the hundreds). Further, the complexity of the modeling (usually involving systems of differential equations) makes the relationships among the input variables not mathematically tractable. In this setting it is desired to perform sensitivity studies of the input-output relationships. Hence, a judicious selection procedure for the choice of values of input variables is required. Latin hypercube sampling has been shown to work well on this type of problem.

However, a variety of situations require that decisions and judgments be made in the face of uncertainty. The source of this uncertainty may be lack of knowledge about probability distributions associated with input variables, or about different hypothesized future conditions, or may be present as a result of different strategies associated with a decision making process. In this paper a generalization of Latin hypercube sampling is given that allows these areas to be investigated without making additional computer runs. In particular it is shown how weights associated with Latin hypercube input vectors may be changed to reflect different probability distribution assumptions on key input variables and yet provide an unbiased estimate of the cumulative distribution function of the output variable. This allows for different distribution assumptions on input variables to be studied without additional computer runs and without fitting a response surface. In addition these same weights can be used in a modified nonparametric Friedman test to compare treatments. Sample size requirements needed to apply the results of the work are also considered. The procedures presented in this paper are illustrated using a model associated with the risk assessment of geologic disposal of radioactive waste.
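A minimal sketch of the reweighting idea, assuming a toy two-input model, a Uniform(0, 1) baseline on the inputs and a Beta(2, 5) alternative on the first input (all illustrative, not from the paper): the existing Latin hypercube runs are reused with likelihood-ratio weights to estimate the output CDF under the alternative distribution, with no new model runs. The paper's exact weighting scheme and its Friedman-test extension are not reproduced here.

```python
import numpy as np
from scipy import stats
from scipy.stats import qmc

# Toy stand-in for an expensive simulator with two inputs.
def model(x):
    return x[:, 0] ** 2 + np.sin(3 * x[:, 1])

# Latin hypercube sample of the inputs under the original assumption
# that both inputs are Uniform(0, 1).
n = 200
sampler = qmc.LatinHypercube(d=2, seed=0)
x = sampler.random(n)
y = model(x)                      # the only "computer runs" we pay for

# New distributional assumption on input 1: Beta(2, 5) instead of Uniform(0, 1).
# Reweight the existing runs by the likelihood ratio (importance weights).
w = stats.beta(2, 5).pdf(x[:, 0]) / stats.uniform(0, 1).pdf(x[:, 0])
w /= w.sum()

# Weighted empirical CDF of the output under the new assumption,
# estimated without any additional model runs.
def weighted_cdf(y, w, t):
    return np.sum(w[y <= t])

print("P(Y <= 0.5) under Beta(2,5) input:", weighted_cdf(y, w, 0.5))
print("P(Y <= 0.5) under original inputs:", np.mean(y <= 0.5))
```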

2.
Summary.  A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must consider our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest itself. An example is given involving a model that is used in sewer design.
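A rough plug-in version of this workflow with scikit-learn, under assumed ingredients (a toy code function, eight initial runs, a Uniform(0, 1) input distribution): a Gaussian process is fitted to the initial runs, input uncertainty is pushed through the emulator by sampling from the GP posterior (so uncertainty about the code at untested inputs is carried along), and the percentile is read off the simulated outputs. The paper's sequential choice of further design points is omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def code(x):                       # placeholder for the expensive simulator
    return np.sin(4 * x) + 0.5 * x

# A handful of initial runs of the "expensive" code.
x_design = np.linspace(0, 1, 8).reshape(-1, 1)
y_design = code(x_design).ravel()

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(x_design, y_design)

# Uncertain input: X ~ Uniform(0, 1).  Propagate it through the emulator,
# drawing from the GP posterior to reflect code uncertainty at untested inputs.
x_mc = rng.uniform(0, 1, size=(2000, 1))
y_draws = gp.sample_y(x_mc, n_samples=1, random_state=1).ravel()

# Plug-in estimate of the 95th percentile of the uncertain output.
print("Estimated 95th percentile:", np.quantile(y_draws, 0.95))
```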

3.
Global sensitivity analysis (GSA) can help practitioners focus on the inputs whose uncertainties have an impact on the model output, which allows reducing the complexity of the model. Screening, the qualitative form of GSA, aims to identify and exclude non-influential or less-influential input variables in high-dimensional models. However, for non-parametric problems, there remains the challenging problem of finding an efficient screening procedure, as one needs to properly handle the non-parametric high-order interactions among input variables and keep the size of the screening experiment economically feasible. In this study, we design a novel screening approach based on analysis of variance decomposition of the model. This approach combines the virtues of run-size economy and model independence. The core idea is to choose a low-level complete orthogonal array to derive the sensitivity estimates for all input factors and their interactions at low cost, and then develop a statistical process to screen out the non-influential ones without assuming the effect-sparsity of the model. Simulation studies show that the proposed approach performs well in various settings.
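For flavour, a small sketch of the ANOVA-decomposition idea on an assumed toy model, using a three-level full factorial as a stand-in for the paper's low-level complete orthogonal array: main-effect sums of squares are computed from level means, so a nearly inert factor shows up with a negligible contribution. The paper's interaction estimates and formal screening rule are not reproduced.

```python
import numpy as np
from itertools import product

# Toy model with three inputs; x3 is (nearly) inert.
def model(x1, x2, x3):
    return x1 + 2 * x2 + x1 * x2 + 0.01 * x3

# Three-level full factorial in [0, 1]^3 (27 runs) as a small complete design.
levels = np.array([0.0, 0.5, 1.0])
design = np.array(list(product(levels, repeat=3)))
y = model(design[:, 0], design[:, 1], design[:, 2])

# Main-effect sums of squares from the ANOVA decomposition:
# spread of the level means of y for each factor, scaled by replication.
grand_mean = y.mean()
for j in range(3):
    level_means = np.array([y[design[:, j] == l].mean() for l in levels])
    ss_main = len(y) / len(levels) * np.sum((level_means - grand_mean) ** 2)
    print(f"factor x{j + 1}: main-effect SS = {ss_main:.3f}")
```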

4.
Quantifying uncertainty in the biospheric carbon flux for England and Wales
Summary.  A crucial issue in the current global warming debate is the effect of vegetation and soils on carbon dioxide (CO2) concentrations in the atmosphere. Vegetation can extract CO2 through photosynthesis, but respiration, decay of soil organic matter and disturbance effects such as fire return it to the atmosphere. The balance of these processes is the net carbon flux. To estimate the biospheric carbon flux for England and Wales, we address the statistical problem of inference for the sum of multiple outputs from a complex deterministic computer code whose input parameters are uncertain. The code is a process model which simulates the carbon dynamics of vegetation and soils, including the amount of carbon that is stored as a result of photosynthesis and the amount that is returned to the atmosphere through respiration. The aggregation of outputs corresponding to multiple sites and types of vegetation in a region gives an estimate of the total carbon flux for that region over a period of time. Expert prior opinions are elicited for marginal uncertainty about the relevant input parameters and for correlations of inputs between sites. A Gaussian process model is used to build emulators of the multiple code outputs and Bayesian uncertainty analysis is then used to propagate uncertainty in the input parameters through to uncertainty on the aggregated output. Numerical results are presented for England and Wales in the year 2000. It is estimated that vegetation and soils in England and Wales constituted a net sink of 7.55 Mt C (1 Mt C = 10^12 g of carbon) in 2000, with standard deviation 0.56 Mt C resulting from the sources of uncertainty that are considered.

5.
In electrical engineering, circuit designs are now often optimized via circuit simulation computer models. Typically, many response variables characterize the circuit's performance. Each response is a function of many input variables, including factors that can be set in the engineering design and noise factors representing manufacturing conditions. We describe a modelling approach which is appropriate for the simulator's deterministic input–output relationships. Non-linearities and interactions are identified without explicit assumptions about the functional form. These models lead to predictors to guide the reduction of the ranges of the designable factors in a sequence of experiments. Ultimately, the predictors are used to optimize the engineering design. We also show how a visualization of the fitted relationships facilitates an understanding of the engineering trade-offs between responses. The example used to demonstrate these methods, the design of a buffer circuit, has multiple targets for the responses, representing different trade-offs between the key performance measures.

6.
Complex computer codes are widely used in science to model physical systems. Sensitivity analysis aims to measure the contributions of the inputs to the variability of the code output. Efficient tools for performing such an analysis are the variance-based methods, which have recently been investigated in the framework of dependent inputs. One issue is that they require a large number of runs of the complex simulator. To handle this, a Gaussian process (GP) regression model may be used to approximate the complex code. In this work, we propose to decompose a GP into a high-dimensional representation. This leads to the definition of a variance-based sensitivity measure well tailored for non-independent inputs. We give a methodology to estimate these indices and to quantify their uncertainty. Finally, the approach is illustrated on toy functions and on a river flood model.

7.
Conventional computations use real numbers as input and produce real numbers as results without any indication of the accuracy. Interval analysis, instead, uses interval elements throughout the computation and produces intervals as output with the guarantee that the true results are contained in them. One major use for interval analysis in statistics is to compute high-dimensional multivariate probabilities. By decreasing the length of the intervals that contain the theoretically true answers, we can obtain results to arbitrary accuracy, as demonstrated with multivariate normal and multivariate t integrations. This is an advantage over the approximation methods that are currently in use. Since interval analysis is more computationally intensive than traditional computing, a MasPar parallel computer is used in this research to improve performance.
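A toy illustration of the enclosure property, assuming a hand-written Interval class (real interval libraries additionally use directed rounding so that floating-point error is also enclosed): every arithmetic operation returns an interval guaranteed to contain the true result.

```python
# Minimal interval arithmetic sketch: each operation returns an interval that
# contains the true result, ignoring floating-point rounding, which a
# production implementation would handle with outward (directed) rounding.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# The true value of x*y + z lies inside the computed interval,
# however the inputs vary within their own intervals.
x, y, z = Interval(1.0, 1.1), Interval(2.0, 2.2), Interval(-0.5, 0.5)
print(x * y + z)   # [1.5, 2.92]
```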

8.
Deterministic computer simulations are often used as replacement for complex physical experiments. Although less expensive than physical experimentation, computer codes can still be time-consuming to run. An effective strategy for exploring the response surface of the deterministic simulator is the use of an approximation to the computer code, such as a Gaussian process (GP) model, coupled with a sequential sampling strategy for choosing design points that can be used to build the GP model. The ultimate goal of such studies is often the estimation of specific features of interest of the simulator output, such as the maximum, minimum, or a level set (contour). Before approximating such features with the GP model, sufficient runs of the computer simulator must be completed. Sequential designs with an expected improvement (EI) design criterion can yield good estimates of the features with a minimal number of runs. The challenge is that the expected improvement function itself is often multimodal and difficult to maximize. We develop branch and bound algorithms for efficiently maximizing the EI function in specific problems, including the simultaneous estimation of a global maximum and minimum, and in the estimation of a contour. These branch and bound algorithms outperform other optimization strategies such as genetic algorithms, and can lead to significantly more accurate estimation of the features of interest.
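A small sketch of the expected improvement criterion for estimating a global maximum, with scikit-learn and an assumed toy simulator; a dense grid search stands in for the branch-and-bound maximization of EI developed in the paper, and the joint maximum/minimum and contour criteria are not shown.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):                  # toy deterministic code
    return np.sin(5 * x) * x

x_train = np.array([[0.05], [0.3], [0.55], [0.8], [0.95]])
y_train = simulator(x_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
gp.fit(x_train, y_train)

# Expected improvement over the current best observed value (maximization).
def expected_improvement(x_cand):
    mu, sd = gp.predict(x_cand, return_std=True)
    f_best = y_train.max()
    sd = np.maximum(sd, 1e-12)               # guard against zero predictive sd
    z = (mu - f_best) / sd
    return (mu - f_best) * norm.cdf(z) + sd * norm.pdf(z)

# Grid search stands in for the branch-and-bound maximization of EI.
grid = np.linspace(0, 1, 1001).reshape(-1, 1)
ei = expected_improvement(grid)
print("next design point:", grid[np.argmax(ei), 0], "EI =", ei.max())
```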

9.
Conditional Gaussian graphical models are a reparametrization of the multivariate linear regression model which explicitly exhibits (i) the partial covariances between the predictors and the responses, and (ii) the partial covariances between the responses themselves. Such models are particularly suitable for interpretability since partial covariances describe direct relationships between variables. In this framework, we propose a regularization scheme to enhance the learning strategy of the model by driving the selection of the relevant input features by prior structural information. It comes with an efficient alternating optimization procedure which is guaranteed to converge to the global minimum. On top of showing competitive performance on artificial and real datasets, our method demonstrates capabilities for fine interpretation, as illustrated on three high-dimensional datasets from spectroscopy, genetics, and genomics.

10.
To build a predictor, the output of a deterministic computer model or “code” is often treated as a realization of a stochastic process indexed by the code's input variables. The authors consider an asymptotic form of the Gaussian correlation function for the stochastic process where the correlation tends to unity. They show that the limiting best linear unbiased predictor involves Lagrange interpolating polynomials; linear model terms are implicitly included. The authors then develop optimal designs based on minimizing the limiting integrated mean squared error of prediction. They show through several examples that these designs lead to good prediction accuracy.
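A one-dimensional illustration of the limiting behaviour described here, with an assumed toy code: as the Gaussian correlation tends to one the BLUP behaves like polynomial interpolation of the runs, which Lagrange interpolation sketches directly. The paper's multivariate treatment and optimal-design construction are not reproduced.

```python
import numpy as np
from scipy.interpolate import lagrange

# Toy deterministic code output observed at a small one-dimensional design.
def code(x):
    return np.exp(-x) * np.cos(2 * x)

x_design = np.linspace(0, 2, 6)
y_design = code(x_design)

# Lagrange interpolating polynomial through the design points, standing in
# for the limiting best linear unbiased predictor.
poly = lagrange(x_design, y_design)

x_new = np.array([0.35, 1.25, 1.9])
print("interpolant:", poly(x_new))
print("true code  :", code(x_new))
```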

11.
The global sensitivity analysis method used to quantify the influence of uncertain input variables on the variability in numerical model responses has already been applied to deterministic computer codes; deterministic means here that the same set of input variables always gives the same output value. This paper proposes a global sensitivity analysis methodology for stochastic computer codes, for which the result of each code run is itself random. The framework of the joint modeling of the mean and dispersion of heteroscedastic data is used. To deal with the complexity of computer experiment outputs, nonparametric joint models are discussed and a new Gaussian process-based joint model is proposed. The relevance of these models is analyzed based upon two case studies. Results show that the joint modeling approach yields accurate sensitivity index estimators even when heteroscedasticity is strong.

12.
Many seemingly different problems in machine learning, artificial intelligence, and symbolic processing can be viewed as requiring the discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a highly fit individual computer program. The recently developed genetic programming paradigm described herein provides a way to search the space of possible computer programs for a highly fit individual computer program to solve (or approximately solve) a surprising variety of different problems from different fields. In genetic programming, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (sexual recombination) operator appropriate for genetically mating computer programs. Genetic programming is illustrated via an example of machine learning of the Boolean 11-multiplexer function and symbolic regression of the econometric exchange equation from noisy empirical data. Hierarchical automatic function definition enables genetic programming to define potentially useful functions automatically and dynamically during a run, much as a human programmer writing a complex computer program creates subroutines (procedures, functions) to perform groups of steps which must be performed with different instantiations of the dummy variables (formal parameters) in more than one place in the main program. Hierarchical automatic function definition is illustrated via the machine learning of the Boolean 11-parity function.

13.
Recently, a large amount of interest has been devoted to the use of Bayesian methods for deriving parameter estimates in stochastic frontier analysis. Bayesian stochastic frontier analysis (BSFA) seems to be a useful method for assessing efficiency in the energy sector. However, BSFA results do not expose the multiple relationships between the input and output variables and energy efficiency. This study proposes a framework for making inferences about BSFA efficiencies, recognizing the underlying relationships between variables and efficiency, using a Bayesian network (BN) approach. BN classifiers are proposed as a method to analyze the results obtained from BSFA.

14.
We propose a method that uses a sequential design instead of a space filling design for estimating tuning parameters of a complex computer model. The goal is to bring the computer model output closer to the real system output. The method fits separate Gaussian process (GP) models to the available data from the physical experiment and the computer experiment and minimizes the discrepancy between the predictions from the GP models to obtain estimates of the tuning parameters. A criterion based on the discrepancy between the predictions from the two GP models and the standard error of prediction for the computer experiment output is then used to obtain a design point for the next run of the computer experiment. The tuning parameters are re-estimated using the augmented data set. The steps are repeated until the budget for the computer experiment data is exhausted. Simulation studies show that the proposed method performs better in bringing a computer model closer to the real system than methods that use a space filling design.
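A simplified sketch of the loop, assuming toy real-system and computer-model functions and a grid search in place of a proper optimizer: separate GPs are fitted to the physical and computer data, the tuning parameter is estimated by minimizing the discrepancy between their predictions, and a next computer run is proposed. The selection step below uses only the simulator's predictive standard error at the current estimate, a simplification of the paper's combined criterion.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def real_system(x):                       # unknown truth, observed with noise
    return np.sin(3 * x) + 0.3 * x

def computer_model(x, t):                 # simulator with tuning parameter t
    return np.sin(3 * x) + t * x

# Physical data and an initial computer experiment over (x, t).
x_phys = rng.uniform(0, 1, 15)
y_phys = real_system(x_phys) + rng.normal(0, 0.05, 15)
xt_sim = np.column_stack([rng.uniform(0, 1, 20), rng.uniform(0, 1, 20)])
y_sim = computer_model(xt_sim[:, 0], xt_sim[:, 1])

gp_phys = GaussianProcessRegressor(RBF(0.3), alpha=0.05 ** 2, normalize_y=True)
gp_phys.fit(x_phys.reshape(-1, 1), y_phys)
gp_sim = GaussianProcessRegressor(RBF([0.3, 0.3]), normalize_y=True)
gp_sim.fit(xt_sim, y_sim)

# Estimate the tuning parameter by minimizing the discrepancy between the
# two GP predictions over a grid of x values (grid search for simplicity).
x_grid = np.linspace(0, 1, 50)
t_grid = np.linspace(0, 1, 101)

def discrepancy(t):
    pred_sim = gp_sim.predict(np.column_stack([x_grid, np.full_like(x_grid, t)]))
    return np.mean((gp_phys.predict(x_grid.reshape(-1, 1)) - pred_sim) ** 2)

t_hat = t_grid[np.argmin([discrepancy(t) for t in t_grid])]

# Propose the next computer run where the simulator GP is most uncertain at t_hat.
_, sd = gp_sim.predict(np.column_stack([x_grid, np.full_like(x_grid, t_hat)]),
                       return_std=True)
print("estimated tuning parameter:", t_hat)
print("next computer run at x =", x_grid[np.argmax(sd)], ", t =", t_hat)
```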

15.
In fiscal practice, there are often dynamic two-way effects between public service inputs and outputs, yet previous research has focused only on one-way effects, which does little to help maximize the efficiency of public service provision under public budget constraints. Using a vector autoregression model and a vector error correction model, this study examines the long-run equilibrium and short-run dynamic relationships between the inputs and outputs of public services in China, represented by education and public health care. The results show that public service inputs and outputs are cointegrated in the long run; that in the short run there is a dynamic positive relationship between inputs and outputs; and that, when subjected to external random shocks, the public service input-output system exhibits good stability and regularity.
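A hedged sketch of the VAR/VECM workflow described above, using statsmodels on simulated series that stand in for public-service spending and an output indicator (the data, lag order, and deterministic terms are illustrative assumptions): a Johansen test checks for the long-run cointegrating relationship and a vector error-correction model captures the short-run adjustment toward it.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen, VECM

rng = np.random.default_rng(4)

# Simulated annual data standing in for public-service spending (input) and an
# output indicator: the two series share a common stochastic trend, so they
# are cointegrated by construction.
n = 60
trend = np.cumsum(rng.normal(0.5, 1.0, n))
spending = trend + rng.normal(0, 0.5, n)
output = 0.8 * trend + rng.normal(0, 0.5, n)
data = pd.DataFrame({"input": spending, "output": output})

# Johansen test for the long-run (cointegrating) relationship.
jres = coint_johansen(data, det_order=0, k_ar_diff=1)
print("trace statistics:", jres.lr1)
print("5% critical values:", jres.cvt[:, 1])

# Vector error-correction model capturing short-run dynamics around
# the long-run equilibrium.
vecm = VECM(data, k_ar_diff=1, coint_rank=1, deterministic="co").fit()
print("error-correction (adjustment) coefficients:\n", vecm.alpha)
```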

16.
The importance of individual inputs of a computer model is sometimes assessed using indices that reflect the amount of output variation that can be attributed to random variation in each input. We review two such indices, and consider input sampling plans that support estimation of one of them, the variance of conditional expectation or VCE (McKay, 1995, Los Alamos National Laboratory Report NUREG/CR-6311, LA-12915-MS). Sampling plans suggested by Sobol’, Saltelli, and McKay are examined and compared with a new sampling plan based on balanced incomplete block designs. The new design offers better sampling efficiency for the VCE than those of Sobol’ and Saltelli, and supports unbiased estimation of the index associated with each input.
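A small sketch of what the VCE measures, on an assumed toy model with Uniform(0, 1) inputs: a simple replicated plan with an ANOVA-style correction gives an unbiased estimate of Var(E[Y | X_i]) for each input. This is not the balanced-incomplete-block plan proposed in the paper, only an illustration of the index itself.

```python
import numpy as np

rng = np.random.default_rng(5)

def model(x):
    return x[:, 0] ** 2 + 0.5 * x[:, 1] + 0.1 * x[:, 2]

d, m, r = 3, 50, 20     # m values of the input of interest, r replicates each

# Estimate VCE_i = Var( E[Y | X_i] ) with a replicated plan: an ANOVA-style
# estimator that corrects the between-group variance for within-group noise.
def vce(i):
    xi = rng.uniform(0, 1, m)
    y = np.empty((m, r))
    for k in range(m):
        x = rng.uniform(0, 1, size=(r, d))   # redraw the complementary inputs
        x[:, i] = xi[k]
        y[k] = model(x)
    group_means = y.mean(axis=1)
    msb = r * group_means.var(ddof=1)        # between-group mean square
    msw = y.var(axis=1, ddof=1).mean()       # within-group mean square
    return (msb - msw) / r                   # unbiased for Var(E[Y | X_i])

for i in range(d):
    print(f"VCE for input {i + 1}: {vce(i):.4f}")
```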

17.
In the fields of internet financial transactions and reliability engineering, data sets often contain excess zero and one observations simultaneously. Because such data lie beyond the range that conventional models can fit, a zero-and-one-inflated geometric regression model is proposed. By introducing Pólya-Gamma latent variables into the Bayesian inference, posterior sampling with high-dimensional parameters is converted into latent-variable sampling and posterior sampling with lower-dimensional parameters, respectively. Circumventing the need for Metropolis-Hastings sampling, samples are obtained with higher sampling efficiency. A simulation study is conducted to assess the performance of the proposed estimator for various sample sizes. Finally, a doctoral dissertation data set is analyzed to illustrate the practicability of the proposed method; the analysis shows that the zero-and-one-inflated geometric regression model with Pólya-Gamma latent variables achieves a better fit.
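To make the distribution concrete, a brief sketch of a zero-and-one-inflated geometric pmf (with counts supported on 0, 1, 2, ... and illustrative parameter values, both assumptions): extra point masses at 0 and 1 are mixed with a geometric component. The regression links and the Pólya-Gamma augmentation used in the paper are not shown.

```python
import numpy as np
from scipy.stats import geom

# Zero-and-one-inflated geometric pmf: extra point masses at 0 and 1 are mixed
# with a geometric distribution (counts 0, 1, 2, ..., i.e. the "failures" form).
def zoig_pmf(y, p0, p1, theta):
    base = geom.pmf(y + 1, theta)            # shift: scipy's geom starts at 1
    pmf = (1 - p0 - p1) * base
    return pmf + np.where(y == 0, p0, 0.0) + np.where(y == 1, p1, 0.0)

y = np.arange(6)
print(zoig_pmf(y, p0=0.2, p1=0.1, theta=0.4))
print("total mass over a long support:",
      zoig_pmf(np.arange(200), 0.2, 0.1, 0.4).sum())   # approximately 1
```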

18.
As ecological data sets increase in spatial and temporal extent with the advent of new remote sensing platforms and long-term monitoring networks, there is increasing interest in forecasting ecological processes. Such forecasts require realistic initial conditions over complete spatial domains. Typically, data sources are incomplete in space, and the processes include complicated dynamical interactions across physical and biological variables. This suggests that data assimilation, whereby observations are fused with mechanistic models, is the most appropriate means of generating complete initial conditions. Often, the mechanistic models used for these procedures are very expensive computationally. We demonstrate a rank-reduced approach for ecological data assimilation whereby the mechanistic model is based on a statistical emulator. Critically, the rank-reduction and emulator construction are linked and, by utilizing a hierarchical framework, uncertainty associated with the dynamical emulator can be accounted for. This provides a so-called “weak-constraint” data assimilation procedure. This approach is demonstrated on a high-dimensional multivariate coupled biogeochemical ocean process.

19.
An estimated 1 billion people suffer from hunger worldwide, and climate change, urbanization, and globalization have the potential to exacerbate this situation. Improved models for predicting food security are needed to understand these impacts and design interventions. However, food insecurity is the result of complex interactions between physical and socio-economic factors that can overwhelm linear regression models. More sophisticated data-mining approaches could provide an effective way to model these relationships and accurately predict food insecure situations. In this paper, we compare multiple regression and data-mining methods in their ability to predict the percent of a country's population that suffers from undernourishment using widely available predictor variables related to socio-economic settings, agricultural production and trade, and climate conditions. Averaging predictions from multiple models results in the lowest predictive error and provides an accurate method to predict undernourishment levels. Partial dependence plots are used to evaluate covariate influence and demonstrate the relationship between food insecurity and climatic and socio-economic variables. By providing insights into these relationships and a mechanism for predicting undernourishment using readily available data, statistical models like those developed here could be a useful tool for those tasked with understanding and addressing food insecurity.
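A generic sketch of the prediction-averaging step with scikit-learn, on synthetic data standing in for the country-level predictors (whether the average or an individual model wins depends on the data; the paper reports the average performing best for its application).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for country-level predictors (socio-economic, agricultural,
# climatic) and an undernourishment-style response.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [LinearRegression(),
          RandomForestRegressor(n_estimators=200, random_state=0),
          GradientBoostingRegressor(random_state=0)]

preds = []
for m in models:
    m.fit(X_tr, y_tr)
    p = m.predict(X_te)
    preds.append(p)
    print(type(m).__name__, "MAE:", round(mean_absolute_error(y_te, p), 2))

# Averaging the predictions from the individual models.
avg = np.mean(preds, axis=0)
print("Averaged prediction MAE:", round(mean_absolute_error(y_te, avg), 2))
```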

20.
Principal component analysis (PCA) is widely used to analyze high-dimensional data, but it is very sensitive to outliers. Robust PCA methods seek fits that are unaffected by the outliers and can therefore be trusted to reveal them. FastHCS (high-dimensional congruent subsets) is a robust PCA algorithm suitable for high-dimensional applications, including cases where the number of variables exceeds the number of observations. After detailing the FastHCS algorithm, we carry out an extensive simulation study and three real data applications, the results of which show that FastHCS is systematically more robust to outliers than state-of-the-art methods.
