Similar Literature

20 similar records retrieved
1.
ABSTRACT

Physical phenomena are commonly modelled by time-consuming numerical simulators that are functions of many uncertain parameters, whose influences can be measured via a global sensitivity analysis. The usual variance-based indices require too many simulations, especially when the inputs are numerous. To address this limitation, we consider recent advances in dependence measures, focusing on the distance correlation and the Hilbert–Schmidt independence criterion, and study their use for screening. Numerical tests reveal differences between variance-based indices and dependence measures. Two approaches are then proposed to use the latter for screening. The first uses independence tests, with existing asymptotic versions and spectral extensions; bootstrap versions are also proposed. The second considers a linear model with dependence measures, coupled with a bootstrap selection method or a Lasso penalization. Numerical experiments show their potential in the presence of many non-influential inputs and give successful results for a nuclear reliability application.
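
For readers who want to see the mechanics, here is a minimal sketch (not the authors' code) of how a dependence measure such as the distance correlation can be computed from a Monte Carlo sample and used for input screening; the test function, sample size and number of inputs are arbitrary choices for the illustration.

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)                                  # pairwise distance matrices
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
n, d = 500, 8                                            # 8 inputs, only 3 are active
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + np.sin(np.pi * X[:, 2]) + 0.05 * rng.normal(size=n)

for j in range(d):
    print(f"input {j}: dCor = {distance_correlation(X[:, j], y):.3f}")
```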

2.
Sensitivity analysis is an essential tool in the development of robust models for engineering, physical sciences, economics and policy-making, but typically requires running the model a large number of times in order to estimate sensitivity measures. While statistical emulators allow sensitivity analysis even on complex models, they only perform well with a moderately low number of model inputs: in higher dimensional problems they tend to require a restrictively high number of model runs unless the model is relatively linear. Therefore, an open question is how to tackle sensitivity problems in higher dimensionalities, at very low sample sizes. This article examines the relative performance of four sampling-based measures which can be used in such high-dimensional nonlinear problems. The measures tested are the Sobol' total sensitivity indices, the absolute mean of elementary effects, a derivative-based global sensitivity measure, and a modified derivative-based measure. Performance is assessed in a ‘screening’ context, by assessing the ability of each measure to identify influential and non-influential inputs on a wide variety of test functions at different dimensionalities. The results show that the best-performing measure in the screening context is dependent on the model or function, but derivative-based measures have a significant potential at low sample sizes that is currently not widely recognised.
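
As a concrete illustration of one of the screening measures discussed, the sketch below estimates the absolute mean of elementary effects with a simple radial one-at-a-time design; the toy model, step size and number of base points are assumptions of the example, not the designs used in the article.

```python
import numpy as np

def mu_star(f, d, r=30, delta=0.1, rng=None):
    """Absolute mean of elementary effects, using a simple radial OAT design.

    f : model taking an array of shape (d,) on [0, 1]^d
    d : number of inputs, r : number of base points, delta : step size
    """
    rng = np.random.default_rng(rng)
    ee = np.zeros((r, d))
    for k in range(r):
        base = rng.uniform(0.0, 1.0 - delta, size=d)     # keep base + delta inside [0, 1]
        f0 = f(base)
        for i in range(d):
            pert = base.copy()
            pert[i] += delta
            ee[k, i] = (f(pert) - f0) / delta            # elementary effect of input i
    return np.abs(ee).mean(axis=0)

# toy model: inputs 0 and 1 active (with an interaction), the rest inert
def model(x):
    return np.sin(2 * np.pi * x[0]) + 5.0 * x[1] ** 2 + x[0] * x[1]

print(mu_star(model, d=6, r=50))
```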

3.
Probabilistic sensitivity analysis of complex models: a Bayesian approach
Summary.  In many areas of science and technology, mathematical models are built to simulate complex real world phenomena. Such models are typically implemented in large computer programs and are also very complex, such that the way that the model responds to changes in its inputs is not transparent. Sensitivity analysis is concerned with understanding how changes in the model inputs influence the outputs. This may be motivated simply by a wish to understand the implications of a complex model but often arises because there is uncertainty about the true values of the inputs that should be used for a particular application. A broad range of measures have been advocated in the literature to quantify and describe the sensitivity of a model's output to variation in its inputs. In practice the most commonly used measures are those that are based on formulating uncertainty in the model inputs by a joint probability distribution and then analysing the induced uncertainty in outputs, an approach which is known as probabilistic sensitivity analysis. We present a Bayesian framework which unifies the various tools of probabilistic sensitivity analysis. The Bayesian approach is computationally highly efficient. It allows effective sensitivity analysis to be achieved by using far smaller numbers of model runs than standard Monte Carlo methods. Furthermore, all measures of interest may be computed from a single set of runs.

4.
Summary.  A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must consider our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest itself. An example is given involving a model that is used in sewer design.
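
The general percentile-estimation idea (fit an emulator to a few code runs, then propagate input uncertainty through emulator draws) can be sketched with an off-the-shelf Gaussian process; this is only an illustration of the approach, not the sewer-design application, and the toy 'simulator', RBF kernel and 95th percentile are assumptions of the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):                       # stand-in for an expensive code
    return np.sin(3 * x) + 0.5 * x ** 2

# a handful of expensive runs
X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
y_train = simulator(X_train).ravel()

gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

# uncertain input, e.g. X ~ N(1, 0.3^2), propagated through posterior draws
rng = np.random.default_rng(1)
x_new = rng.normal(1.0, 0.3, size=(2000, 1))
draws = gp.sample_y(x_new, n_samples=200, random_state=1)   # shape (2000, 200)

# each emulator draw gives one estimate of the output's 95th percentile
p95 = np.percentile(draws, 95, axis=0)
print("95th percentile: mean %.3f, code-uncertainty sd %.3f" % (p95.mean(), p95.std()))
```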

5.
ABSTRACT

To reduce the output variance, variance-based importance analysis provides an efficient route: reduce the variance of the ‘important’ inputs. But as the variance of those ‘important’ inputs is reduced, the input importance changes, and reducing the variance of those inputs alone is no longer the most efficient strategy; the analyst then needs to consider reducing the variance of other inputs as well. This work provides a graphical solution to help the analyst decide how to reduce the input variances so as to achieve the targeted reduction of the output variance efficiently. Furthermore, using an importance sampling-based approach, the graphical solution can be obtained with only a single group of samples, which greatly decreases the computational cost.

6.
Deterministic computer simulations are often used as replacements for complex physical experiments. Although less expensive than physical experimentation, computer codes can still be time-consuming to run. An effective strategy for exploring the response surface of the deterministic simulator is the use of an approximation to the computer code, such as a Gaussian process (GP) model, coupled with a sequential sampling strategy for choosing design points that can be used to build the GP model. The ultimate goal of such studies is often the estimation of specific features of interest of the simulator output, such as the maximum, minimum, or a level set (contour). Before approximating such features with the GP model, sufficient runs of the computer simulator must be completed. Sequential designs with an expected improvement (EI) design criterion can yield good estimates of the features with a minimal number of runs. The challenge is that the expected improvement function itself is often multimodal and difficult to maximize. We develop branch and bound algorithms for efficiently maximizing the EI function in specific problems, including the simultaneous estimation of a global maximum and minimum, and the estimation of a contour. These branch and bound algorithms outperform other optimization strategies such as genetic algorithms, and can lead to significantly more accurate estimation of the features of interest.
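
The expected improvement criterion itself is easy to state; the sketch below evaluates a standard EI formula for maximization from a GP's predictive mean and standard deviation. It does not reproduce the branch and bound maximization developed in the paper, and the candidate values shown are made up for the example.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sd, f_best, xi=0.0):
    """EI for maximization, given GP predictive mean `mu` and sd `sd`
    at candidate points, and the best observed value `f_best`."""
    mu, sd = np.asarray(mu, float), np.asarray(sd, float)
    improve = mu - f_best - xi
    with np.errstate(divide="ignore", invalid="ignore"):
        z = improve / sd
        ei = improve * norm.cdf(z) + sd * norm.pdf(z)
    ei[sd == 0.0] = 0.0          # no posterior uncertainty -> no expected improvement
    return ei

# tiny illustration with made-up predictions at three candidate points
print(expected_improvement(mu=[1.2, 0.8, 1.5], sd=[0.3, 0.01, 0.6], f_best=1.4))
```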

7.
Massive correlated data with many inputs are often generated from computer experiments to study complex systems. The Gaussian process (GP) model is a widely used tool for the analysis of computer experiments. Although GPs provide a simple and effective approximation to computer experiments, two critical issues remain unresolved. One is the computational issue in GP estimation and prediction where intensive manipulations of a large correlation matrix are required. For a large sample size and with a large number of variables, this task is often unstable or infeasible. The other issue is how to improve the naive plug-in predictive distribution which is known to underestimate the uncertainty. In this article, we introduce a unified framework that can tackle both issues simultaneously. It consists of a sequential split-and-conquer procedure, an information combining technique using confidence distributions (CD), and a frequentist predictive distribution based on the combined CD. It is shown that the proposed method maintains the same asymptotic efficiency as the conventional likelihood inference under mild conditions, but dramatically reduces the computation in both estimation and prediction. The predictive distribution contains comprehensive information for inference and provides a better quantification of predictive uncertainty as compared with the plug-in approach. Simulations are conducted to compare the estimation and prediction accuracy with some existing methods, and the computational advantage of the proposed method is also illustrated. The proposed method is demonstrated by a real data example based on tens of thousands of computer experiments generated from a computational fluid dynamic simulator.

8.
Importance measures are used to estimate the relative importance of components to system reliability. Phased mission systems (PMS) have many components working in several phases with different success criteria, and component structural importance differs across phases. Additionally, the reliability parameters of components in a PMS are always uncertain in practice. Therefore, existing component importance measures, based either on the partial derivative of the system structure function or on component structural importance, may have difficulties in PMS importance analysis. This paper presents a simulation method to evaluate component global importance for PMS based on the variance-based method and Monte Carlo simulation. To facilitate practical use, we further discuss the correlation between component global importance and its possible influencing factors, and present a fitting model for evaluating component global importance. Finally, two examples are given to show that the fitting model yields quite reasonable component importance estimates.
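
The variance-based ingredient of such a global importance measure can be illustrated with a basic pick-freeze Monte Carlo estimator of first-order Sobol' indices; the toy 'system response', sample size and input distributions below are assumptions of this sketch, and the phased-mission structure itself is not modelled.

```python
import numpy as np

def first_order_sobol(f, d, n=20000, rng=None):
    """Pick-freeze Monte Carlo estimate of first-order Sobol' indices
    for a model f with d independent U(0,1) inputs."""
    rng = np.random.default_rng(rng)
    A = rng.uniform(size=(n, d))
    B = rng.uniform(size=(n, d))
    yA, yB = f(A), f(B)
    var_y = yA.var()
    S = np.empty(d)
    for i in range(d):
        ABi = B.copy()
        ABi[:, i] = A[:, i]                           # freeze input i, resample the rest
        yABi = f(ABi)
        S[i] = np.mean(yA * (yABi - yB)) / var_y      # Saltelli-type estimator
    return S

# toy "system response": component parameters enter with different weights
def system(p):
    return p[:, 0] * p[:, 1] + 0.3 * p[:, 2] + 0.05 * p[:, 3]

print(first_order_sobol(system, d=4))
```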

9.
This paper focuses on unsupervised curve classification in the context of the nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time-evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model, and this regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme with a projection onto a lower-dimensional space. We study the properties of the empirically optimal cluster centres found by the projection-based clustering method, compared with the ‘true’ ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis from a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem.
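
A generic sketch of the clustering step (project the output curves onto a low-dimensional orthonormal basis, then run k-means on the coefficients) is given below; the simulated curves and the choice of singular vectors as the projection basis are assumptions of the example, not the CATHARE data or the basis-selection algorithm of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)

# simulated output curves from two regimes plus noise (stand-in for code outputs)
curves = np.vstack(
    [np.exp(-3 * t) + 0.05 * rng.normal(size=t.size) for _ in range(40)]
    + [np.sin(2 * np.pi * t) * np.exp(-t) + 0.05 * rng.normal(size=t.size) for _ in range(40)]
)

# orthonormal projection basis: leading right singular vectors of the centred curves
centred = curves - curves.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
k_dim = 5
coeffs = centred @ Vt[:k_dim].T            # coordinates in the reduced space

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coeffs)
print(np.bincount(labels))                 # the two regimes should separate
```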

10.
Quantifying uncertainty in the biospheric carbon flux for England and Wales
Summary.  A crucial issue in the current global warming debate is the effect of vegetation and soils on carbon dioxide (CO2) concentrations in the atmosphere. Vegetation can extract CO2 through photosynthesis, but respiration, decay of soil organic matter and disturbance effects such as fire return it to the atmosphere. The balance of these processes is the net carbon flux. To estimate the biospheric carbon flux for England and Wales, we address the statistical problem of inference for the sum of multiple outputs from a complex deterministic computer code whose input parameters are uncertain. The code is a process model which simulates the carbon dynamics of vegetation and soils, including the amount of carbon that is stored as a result of photosynthesis and the amount that is returned to the atmosphere through respiration. The aggregation of outputs corresponding to multiple sites and types of vegetation in a region gives an estimate of the total carbon flux for that region over a period of time. Expert prior opinions are elicited for marginal uncertainty about the relevant input parameters and for correlations of inputs between sites. A Gaussian process model is used to build emulators of the multiple code outputs and Bayesian uncertainty analysis is then used to propagate uncertainty in the input parameters through to uncertainty on the aggregated output. Numerical results are presented for England and Wales in the year 2000. It is estimated that vegetation and soils in England and Wales constituted a net sink of 7.55 Mt C (1 Mt C = 10^12 g of carbon) in 2000, with standard deviation 0.56 Mt C resulting from the sources of uncertainty that are considered.

11.
The geometric process (GP) is widely used as a non-stationary stochastic model in reliability analysis. In many applications of the GP, its mean value and variance functions are needed, yet in many situations these functions have no analytical form, so their numerical computation is important. In this study, a numerical approximation and a Monte Carlo estimation method based on convolutions of distribution functions are proposed for both the mean value and variance functions.
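
A plain Monte Carlo estimate of the mean value and variance functions, shown below for orientation, simply simulates geometric-process paths and counts events up to time t; the exponential distribution of the underlying renewal process and all parameter values are assumptions of this sketch, and the convolution-based approximation proposed in the paper is not reproduced.

```python
import numpy as np

def gp_mean_variance(t, a, n_paths=10000, mean_y=1.0, kmax=400, rng=None):
    """Monte Carlo estimate of the mean value and variance functions M(t), V(t)
    of a geometric process with ratio `a`, i.e. inter-arrival times
    X_k = Y_k / a**(k-1) with Y_k i.i.d.  The exponential choice for Y_k is an
    assumption of this sketch.  kmax truncates each path; for a <= 1 a few
    hundred terms are far more than ever reach time t."""
    rng = np.random.default_rng(rng)
    k = np.arange(kmax)
    X = rng.exponential(mean_y, size=(n_paths, kmax)) / a ** k   # X_k = Y_k / a**(k-1)
    arrivals = np.cumsum(X, axis=1)                              # event times
    N_t = (arrivals <= t).sum(axis=1)                            # N(t) on each path
    return N_t.mean(), N_t.var()

m, v = gp_mean_variance(t=5.0, a=0.95)
print(f"M(5) ~ {m:.3f},  V(5) ~ {v:.3f}")
```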

12.
Kriging models have been widely used in computer experiments for the analysis of time-consuming computer codes. Based on kernels, they are flexible and can be tuned to many situations. In this paper, we construct kernels that reproduce the computer code complexity by mimicking its interaction structure. While the standard tensor-product kernel implicitly assumes that all interactions are active, the new kernels are suited for a general interaction structure, and will take advantage of the absence of interaction between some inputs. The methodology is twofold. First, the interaction structure is estimated from the data, using a first initial standard Kriging model, and represented by a so-called FANOVA graph. New FANOVA-based sensitivity indices are introduced to detect active interactions. Then this graph is used to derive the form of the kernel, and the corresponding Kriging model is estimated by maximum likelihood. The performance of the overall procedure is illustrated by several 3-dimensional and 6-dimensional simulated and real examples. A substantial improvement is observed when the computer code has a relatively high level of complexity.
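
The kernel construction can be sketched as a sum over cliques of the interaction graph of products of one-dimensional kernels; the Gaussian one-dimensional kernels, length-scales and example clique structure below are assumptions of this sketch, and the FANOVA-graph estimation step of the paper is not shown.

```python
import numpy as np

def clique_kernel(X1, X2, cliques, lengthscales):
    """Kernel built from an interaction (FANOVA-type) structure: a sum over
    cliques of products of 1-D Gaussian kernels. A generic sketch of the idea,
    not the estimation procedure of the paper."""
    n1, n2 = len(X1), len(X2)
    K = np.zeros((n1, n2))
    for clique in cliques:
        Kc = np.ones((n1, n2))
        for i in clique:
            diff = X1[:, i][:, None] - X2[:, i][None, :]
            Kc *= np.exp(-0.5 * (diff / lengthscales[i]) ** 2)
        K += Kc
    return K

# example: inputs 0 and 1 interact, input 2 enters additively
X = np.random.default_rng(6).uniform(size=(30, 3))
K = clique_kernel(X, X, cliques=[(0, 1), (2,)], lengthscales=[0.5, 0.5, 0.5])
print(K.shape, np.all(np.linalg.eigvalsh(K + 1e-8 * np.eye(30)) > 0))
```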

13.
We propose a method that uses a sequential design instead of a space filling design for estimating tuning parameters of a complex computer model. The goal is to bring the computer model output closer to the real system output. The method fits separate Gaussian process (GP) models to the available data from the physical experiment and the computer experiment and minimizes the discrepancy between the predictions from the GP models to obtain estimates of the tuning parameters. A criterion based on the discrepancy between the predictions from the two GP models and the standard error of prediction for the computer experiment output is then used to obtain a design point for the next run of the computer experiment. The tuning parameters are re-estimated using the augmented data set. The steps are repeated until the budget for the computer experiment data is exhausted. Simulation studies show that the proposed method performs better in bringing a computer model closer to the real system than methods that use a space filling design.
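
A schematic of the discrepancy-minimisation step alone (not the sequential design criterion) is sketched below: one GP is fitted to the physical data over x, another to the computer data over (x, t), and t is chosen to minimise the squared discrepancy between their predictions. The toy data, kernels and true tuning value are assumptions of the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
true_t = 0.6

# physical experiment: y = sin(2*pi*x) + true_t * x + noise
x_phys = rng.uniform(0, 1, size=(15, 1))
y_phys = np.sin(2 * np.pi * x_phys).ravel() + true_t * x_phys.ravel() + 0.02 * rng.normal(size=15)

# computer experiment over (x, t)
xt = rng.uniform(0, 1, size=(60, 2))
y_code = np.sin(2 * np.pi * xt[:, 0]) + xt[:, 1] * xt[:, 0]

gp_phys = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True).fit(x_phys, y_phys)
gp_code = GaussianProcessRegressor(RBF(length_scale=[1.0, 1.0]), normalize_y=True).fit(xt, y_code)

x_grid = np.linspace(0, 1, 50).reshape(-1, 1)
pred_phys = gp_phys.predict(x_grid)

def discrepancy(t):
    # squared discrepancy between the two GP predictions at tuning value t
    pred_code = gp_code.predict(np.column_stack([x_grid, np.full(len(x_grid), t)]))
    return np.mean((pred_phys - pred_code) ** 2)

res = minimize_scalar(discrepancy, bounds=(0.0, 1.0), method="bounded")
print("estimated tuning parameter:", round(res.x, 3), " (true value 0.6)")
```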

14.
Many experiments in the physical and engineering sciences study complex processes in which bias due to model inadequacy dominates random error. A noteworthy example of this situation is the use of computer experiments, in which scientists simulate the phenomenon being studied by a computer code. Computer experiments are deterministic: replicate observations from running the code with the same inputs will be identical. Such high-bias settings demand different techniques for design and prediction. This paper focuses on the experimental design problem, introducing a new class of designs called rotation designs. Rotation designs are found by taking an orthogonal starting design D and rotating it to obtain a new design matrix D_R = DR, where R is any orthonormal matrix. The new design is still orthogonal for a first-order model. In this paper, we study some of the properties of rotation designs and we present a method to generate rotation designs that have some appealing symmetry properties.
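
The construction itself is short; the sketch below takes a 2^4 factorial as the orthogonal starting design, rotates it with an orthonormal matrix obtained from a QR factorisation (one convenient way to get an orthonormal R, not the symmetry-preserving generator studied in the paper), and checks that first-order orthogonality is preserved.

```python
import numpy as np
from itertools import product

# orthogonal starting design: full 2^4 factorial coded as +/- 1  (16 runs, 4 factors)
D = np.array(list(product([-1.0, 1.0], repeat=4)))

# any orthonormal R works; here we take the Q factor of a random Gaussian matrix
rng = np.random.default_rng(4)
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))

D_R = D @ R                                   # the rotation design

# first-order orthogonality is preserved: D_R' D_R = R' (D'D) R = n * I
print(np.allclose(D.T @ D, len(D) * np.eye(4)))
print(np.allclose(D_R.T @ D_R, len(D) * np.eye(4)))
```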

15.
Cluster analysis is often used for market segmentation. When the inputs to the clustering algorithm are ranking data, the intersubject (dis)similarities must be measured by matching-type measures able to take account of the ordinal nature of the data. Among them, we use a weighted Spearman's rho, suitably transformed into a (dis)similarity measure, in order to emphasize concordance on the top ranks. This allows clusters to be created that group customers who place the same items (products, services, etc.) higher in their rankings. The statistical tools used to interpret the clusters must also be designed for ordinal data. The median and other location measures are appropriate but not always able to clearly differentiate groups; the so-called bipolar mean, with its related variability measure, may reveal some additional features. A case study on real data from a survey carried out in Italian McDonald's restaurants is presented.
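
One way to see the pipeline is sketched below: a top-weighted rank correlation is turned into a dissimilarity and fed to a standard clustering routine. The particular weighting scheme (item weights 1/r + 1/s), the use of hierarchical clustering and the toy rankings are assumptions of the example and may differ from the measures used in the case study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def weighted_rho(r, s):
    """Weighted Pearson correlation of two rank vectors, with item weights
    1/r_k + 1/s_k so that agreement on the top ranks counts more
    (one simple top-emphasising variant; the paper's exact measure may differ)."""
    w = 1.0 / r + 1.0 / s
    rm, sm = np.sum(w * r) / np.sum(w), np.sum(w * s) / np.sum(w)
    cov = np.sum(w * (r - rm) * (s - sm))
    return cov / np.sqrt(np.sum(w * (r - rm) ** 2) * np.sum(w * (s - sm) ** 2))

# toy data: 6 respondents ranking 5 items (1 = most preferred)
ranks = np.array([[1, 2, 3, 4, 5],
                  [2, 1, 3, 5, 4],
                  [1, 3, 2, 4, 5],
                  [5, 4, 3, 2, 1],
                  [4, 5, 3, 1, 2],
                  [5, 4, 2, 3, 1]], dtype=float)

n = len(ranks)
dis = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dis[i, j] = dis[j, i] = (1.0 - weighted_rho(ranks[i], ranks[j])) / 2.0

labels = fcluster(linkage(squareform(dis), method="average"), t=2, criterion="maxclust")
print(labels)          # the first three respondents should group together
```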

16.
Uncertainty and sensitivity analysis is an essential ingredient of model development and applications. For many uncertainty and sensitivity analysis techniques, sensitivity indices are calculated based on a relatively large sample to measure the importance of parameters in their contributions to uncertainties in model outputs. To statistically compare their importance, it is necessary that uncertainty and sensitivity analysis techniques provide standard errors of estimated sensitivity indices. In this paper, a delta method is used to analytically approximate standard errors of estimated sensitivity indices for a popular sensitivity analysis method, the Fourier amplitude sensitivity test (FAST). Standard errors estimated based on the delta method were compared with those estimated based on 20 sample replicates. We found that the delta method can provide a good approximation for the standard errors of both first-order and higher-order sensitivity indices. Finally, based on the standard error approximation, we also proposed a method to determine a minimum sample size to achieve the desired estimation precision for a specified sensitivity index. The standard error estimation method presented in this paper can make the FAST analysis computationally much more efficient for complex models.

17.
The negative binomial (NB) model and the generalized Poisson (GP) model are common alternatives to Poisson models when overdispersion is present in the data. Having accounted for initial overdispersion, we may require further investigation as to whether there is evidence for zero-inflation in the data. Two score statistics are derived from the GP model for testing zero-inflation. These statistics, unlike Wald-type test statistics, do not require fitting the more complex zero-inflated overdispersed models to evaluate zero-inflation. A simulation study illustrates that the developed score statistics reasonably follow a χ2 distribution and maintain the nominal level. Extensive simulation results also indicate that the power behaves differently depending on whether a continuous or a binary variable is included in the zero-inflation (ZI) part of the model. These differences form the basis of the suggestions provided for real data analysis. Two practical examples are presented in this article. Results from these examples, along with practical experience, lead us to suggest performing the developed score test before fitting a zero-inflated NB model to the data.

18.
Summary.  We consider the analysis of extreme shapes rather than the more usual mean- and variance-based shape analysis. In particular, we consider extreme shape analysis in two applications: human muscle fibre images, where we compare healthy and diseased muscles, and temporal sequences of DNA shapes from molecular dynamics simulations. One feature of the shape space is that it is bounded, so we consider estimators which use prior knowledge of the upper bound when present. Peaks-over-threshold methods and maximum-likelihood-based inference are used. We introduce fixed end point and constrained maximum likelihood estimators, and we discuss their asymptotic properties for large samples. It is shown that in some cases the constrained estimators have half the mean-square error of the unconstrained maximum likelihood estimators. The new estimators are applied to the muscle and DNA data, and practical conclusions are given.

19.
The generalized Poisson (GP) regression model has been used to model count data that exhibit over-dispersion or under-dispersion. The zero-inflated GP (ZIGP) regression model can additionally handle count data characterized by many zeros. However, the parameters of the ZIGP model cannot easily be used for inference on overall exposure effects. To address this problem, a marginalized ZIGP model is proposed to directly model the population marginal mean count. The parameters of the marginalized zero-inflated GP model are estimated by maximum likelihood. The regression model is illustrated with three real-life data sets.

20.
Quantification of uncertainties in code responses necessitates knowledge of input model parameter uncertainties. However, nuclear thermal-hydraulics codes such as RELAP5 and TRACE do not provide any information on input model parameter uncertainties. Moreover, the input model parameters for physical models in these legacy codes were derived under steady-state flow conditions and hence might not be accurate to use in the analysis of transients without accounting for uncertainties. We present a Bayesian framework to estimate the posterior mode of the input model parameters' mean and variance by implementing the iterative expectation–maximization algorithm. For this, we introduce the idea of a model parameter multiplier. A log-normal transformation is used to transform the model parameter multiplier into a pseudo-parameter. Our analysis is based on two main assumptions about the pseudo-parameters. First, a first-order linear relationship is assumed between the code responses and the pseudo-parameters. Second, the pseudo-parameters are assumed to be normally distributed. The problem is formulated to express the scalar random variable, namely the difference between the experimental result and the base (nominal) code-calculated value, as a linear combination of pseudo-parameters.
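
A generic EM sketch for a linear-Gaussian model of this form (scalar response differences expressed as a linear combination of normally distributed pseudo-parameters, with their mean and covariance updated iteratively) is given below; the synthetic sensitivity coefficients, noise level and dimensions are assumptions, and this is not the authors' implementation for RELAP5 or TRACE.

```python
import numpy as np

rng = np.random.default_rng(5)
p, m = 2, 200                       # 2 pseudo-parameters, 200 "experiments"
mu_true, Sigma_true = np.array([0.3, -0.1]), np.diag([0.04, 0.01])
sigma_eps = 0.05                    # assumed known measurement/code noise sd

# synthetic data: d_j = s_j . theta_j + eps_j, with theta_j ~ N(mu, Sigma)
S = rng.uniform(0.5, 2.0, size=(m, p))                      # sensitivity coefficients
theta = rng.multivariate_normal(mu_true, Sigma_true, size=m)
d = np.einsum("ij,ij->i", S, theta) + sigma_eps * rng.normal(size=m)

# EM iterations for the mean and covariance of the pseudo-parameters
mu, Sigma = np.zeros(p), np.eye(p)
for _ in range(200):
    post_means, post_covs = [], []
    for j in range(m):                                       # E-step: posterior of theta_j
        s = S[j][:, None]
        V = np.linalg.inv(np.linalg.inv(Sigma) + s @ s.T / sigma_eps**2)
        mj = V @ (np.linalg.solve(Sigma, mu) + s.ravel() * d[j] / sigma_eps**2)
        post_means.append(mj)
        post_covs.append(V)
    post_means = np.array(post_means)
    mu = post_means.mean(axis=0)                             # M-step updates
    Sigma = np.mean([V + np.outer(mj - mu, mj - mu)
                     for mj, V in zip(post_means, post_covs)], axis=0)

print("estimated mean:", mu.round(3), " true:", mu_true)
print("estimated variances:", np.diag(Sigma).round(4), " true:", np.diag(Sigma_true))
```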
