Similar Literature
A total of 20 similar documents were retrieved.
1.
Factor analysis as a tool for data analysis
The use of factor analysis in analyzing real data is influenced not only by mathematical models but also by the objectives of the research at hand, the amount of data to be analyzed and the departures of the data from the model. Factor analysis is a process performed in several steps, including data screening and assessment of the assumptions required by the model as well as the actual computations. The new analyst may need assistance in determining the initial method of extraction, how many factors to request, the method of rotation and how to interpret the factors; these steps are discussed with reference to figures containing computer output for a real problem. The importance of auxiliary information and graphical displays to aid the statistician during the factor analysis process is stressed.
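The stepwise workflow described in this abstract can be illustrated with a short, hedged sketch in Python using scikit-learn. The data matrix, the choice of three factors and the varimax rotation below are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of the factor-analysis workflow described above, using
# scikit-learn (assumed available).  The data, the three-factor choice and
# the varimax rotation are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # placeholder for the real data set

# Step 1: data screening -- standardise and inspect the correlation matrix.
Z = StandardScaler().fit_transform(X)
corr = np.corrcoef(Z, rowvar=False)

# Step 2: extraction and rotation -- request a small number of factors and
# apply a varimax rotation to ease interpretation.
fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
scores = fa.fit_transform(Z)

# Step 3: interpretation -- examine the rotated loadings for each variable.
print(np.round(fa.components_.T, 2))   # variables x factors loading matrix
```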

2.
A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must consider our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest itself. An example is given involving a model that is used in sewer design.
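A simplified sketch of the core idea, percentile estimation with a Gaussian-process emulator under input uncertainty, is given below. The toy simulator, the input distribution and the 95th percentile are assumptions for illustration, and the paper's simulation-based choice of further design points is omitted.

```python
# A simplified sketch of percentile estimation with a Gaussian-process
# emulator.  The toy simulator, input distribution and 95th percentile are
# illustrative; the paper's sequential design step is omitted.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_code(x):                      # stand-in for the computer model
    return np.sin(3 * x) + 0.5 * x**2

rng = np.random.default_rng(1)
X_design = np.linspace(0, 2, 8).reshape(-1, 1)   # a few affordable runs
y_design = expensive_code(X_design).ravel()

gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_design, y_design)

# Input uncertainty: X ~ N(1, 0.3^2), kept inside the design region.
X_unc = np.clip(rng.normal(1.0, 0.3, size=5000), 0, 2).reshape(-1, 1)

# Draw emulator realisations to reflect code uncertainty at untested inputs,
# then read off the percentile from each realisation.
draws = gp.sample_y(X_unc, n_samples=50, random_state=2)   # shape (5000, 50)
p95 = np.percentile(draws, 95, axis=0)
print(p95.mean(), p95.std())            # point estimate and its uncertainty
```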

3.
The evaluation of hazards from complex, large-scale, technologically advanced systems often requires the construction of computer-implemented mathematical models. These models are used to evaluate the safety of the systems and to evaluate the consequences of modifications to the systems. These evaluations, however, are normally subject to significant uncertainties, both those inherent in natural phenomena such as the weather and those arising from the parameters and models used in the evaluation.

Another use of these models is to evaluate strategies for improving the information used in the modeling process itself. While sensitivity analysis is useful in identifying which variables in the model are important, uncertainty analysis provides a tool for assessing the importance of uncertainty about these variables. A third, complementary technique is decision analysis. It provides a methodology for explicitly evaluating and ranking potential improvements to the model. Its use in the development of information-gathering strategies for a nuclear waste repository is discussed in this paper.

4.
When a published statistical model is also distributed as computer software, it will usually be desirable to present the outputs as interval, as well as point, estimates. The present paper compares three methods for approximate interval estimation about a model output, for use when the model form does not permit an exact interval estimate. The methods considered are first-order asymptotics, using second derivatives of the log-likelihood to estimate variance information; higher-order asymptotics based on the signed-root transformation; and the non-parametric bootstrap. The signed-root method is Bayesian, and uses an approximation for posterior moments that has not previously been tested in a real-world application. Use of the three methods is illustrated with reference to a software project arising in medical decision-making, the UKPDS Risk Engine. Intervals from the first-order and signed-root methods are near-identical, and typically 1% wider to 7% narrower than those from the non-parametric bootstrap. The asymptotic methods are markedly faster than the bootstrap method.
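Of the three methods compared, only the non-parametric bootstrap can be sketched without access to the model's log-likelihood; a minimal percentile-bootstrap interval is shown below. The resampled statistic is an arbitrary stand-in, not the UKPDS Risk Engine.

```python
# A minimal sketch of the non-parametric bootstrap interval used as the
# benchmark above.  The "model output" here is an arbitrary illustrative
# statistic, not the UKPDS Risk Engine.
import numpy as np

def model_output(sample):
    # stand-in for the published model evaluated at estimated parameters
    return np.exp(np.mean(np.log1p(sample)))

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=150)

B = 2000
boot = np.empty(B)
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)
    boot[b] = model_output(resample)

point = model_output(data)
lo, hi = np.percentile(boot, [2.5, 97.5])     # percentile bootstrap interval
print(f"{point:.3f} ({lo:.3f}, {hi:.3f})")
```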

5.
Running complex computer models can be expensive in computer time, while learning about the relationships between input and output variables can be difficult. An emulator is a fast approximation to a computationally expensive model that can be used as a surrogate for the model, to quantify uncertainty or to improve process understanding. Here, we examine emulators based on singular value decompositions (SVDs) and use them to emulate global climate and vegetation fields, examining how these fields are affected by changes in the Earth's orbit. The vegetation field may be emulated directly from the orbital variables, but an appealing alternative is to relate it to emulations of the climate fields, which involves high-dimensional input and output. The SVDs radically reduce the dimensionality of the input and output spaces and are shown to clarify the relationships between them. The method could potentially be useful for any complex process with correlated, high-dimensional inputs and/or outputs.
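The following sketch shows the general shape of an SVD-based emulator: project the high-dimensional output fields onto a few leading singular vectors and emulate the retained scores as functions of the inputs. The synthetic "climate field", the number of retained modes and the Gaussian-process score models are illustrative assumptions.

```python
# A schematic SVD emulator: reduce a high-dimensional output field to a few
# singular-vector scores and emulate those scores from the inputs.  The
# synthetic output field and the GP score models are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
n_runs, n_grid = 40, 500
X = rng.uniform(-1, 1, size=(n_runs, 3))          # e.g. orbital parameters
basis = rng.normal(size=(3, n_grid))
Y = X @ basis + 0.05 * rng.normal(size=(n_runs, n_grid))   # output fields

# SVD of the centred output matrix; keep the leading k modes.
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
k = 3
scores = U[:, :k] * s[:k]                         # low-dimensional summaries

# Emulate each retained score as a function of the inputs.
emulators = [GaussianProcessRegressor(normalize_y=True).fit(X, scores[:, j])
             for j in range(k)]

# Predict the full field at a new input by reconstructing from the modes.
x_new = np.array([[0.2, -0.4, 0.7]])
pred_scores = np.array([em.predict(x_new)[0] for em in emulators])
field_pred = Y_mean + pred_scores @ Vt[:k]
print(field_pred.shape)                           # (500,)
```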

6.
Using tests of time reversibility, this paper provides further statistical evidence on the long-standing conjecture in economics concerning the potentially asymmetric behaviour of output over the expansionary and contractionary phases of the business cycle. A particular advantage of this approach is that it provides a discriminating test that is instructive as to whether any asymmetries detected are due to asymmetric shocks to a linear model, or to an underlying nonlinear model with symmetric shocks, and in the latter case is informative as to the potential form of that nonlinear model. Using a long span of international per capita output growth data, we find asymmetry that is overwhelmingly consistent with the long-standing perception that the output business cycle is characterized by steeper recessions and longer, more gentle expansions, but the evidence for this form of business cycle asymmetry is weaker in the data adjusted for the influence of outliers associated with wars and other extreme events. Statistically significant time irreversibility is reported for the output growth rates of almost all of the countries considered in the full sample data, and there is evidence that this time irreversibility is of a form implying an underlying nonlinear model with symmetrically distributed innovations for 15 of the 22 countries considered. However, the time irreversibility test results for the outlier-trimmed full sample data reveal significant time irreversibility in output growth for around half of the countries considered, predominantly in Northern Europe and North America, and of a form implying a nonlinear underlying model in only a further half of those cases.
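A simple way to see what such a test measures is via sample "bicovariances", which should vanish for a time-reversible series. The sketch below is a generic illustration of that idea, not the specific test battery used in the paper; the simulated series and block-bootstrap settings are assumptions.

```python
# A simple illustration of checking time reversibility via "bicovariances":
# under reversibility, E[x_t^2 x_{t-k}] - E[x_t x_{t-k}^2] = 0 for all k.
# This is a generic sketch, not the paper's specific test battery; the
# simulated series and block-bootstrap settings are illustrative.
import numpy as np

def bicovariance(x, k):
    x = x - x.mean()
    a, b = x[k:], x[:-k]
    return np.mean(a**2 * b) - np.mean(a * b**2)

def block_bootstrap_se(x, k, block=20, reps=500, seed=0):
    # rough standard error preserving short-range dependence within blocks
    rng = np.random.default_rng(seed)
    n = x.size
    stats = []
    for _ in range(reps):
        starts = rng.integers(0, n - block, size=n // block + 1)
        xb = np.concatenate([x[s:s + block] for s in starts])[:n]
        stats.append(bicovariance(xb, k))
    return np.std(stats)

rng = np.random.default_rng(1)
growth = rng.standard_t(df=5, size=600)            # placeholder growth series
for k in (1, 2, 3):
    stat = bicovariance(growth, k)
    se = block_bootstrap_se(growth, k)
    print(f"lag {k}: statistic {stat: .4f}, approx. z = {stat / se: .2f}")
```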

7.
Probabilistic sensitivity analysis of complex models: a Bayesian approach
In many areas of science and technology, mathematical models are built to simulate complex real world phenomena. Such models are typically implemented in large computer programs and are also very complex, such that the way that the model responds to changes in its inputs is not transparent. Sensitivity analysis is concerned with understanding how changes in the model inputs influence the outputs. This may be motivated simply by a wish to understand the implications of a complex model but often arises because there is uncertainty about the true values of the inputs that should be used for a particular application. A broad range of measures have been advocated in the literature to quantify and describe the sensitivity of a model's output to variation in its inputs. In practice the most commonly used measures are those that are based on formulating uncertainty in the model inputs by a joint probability distribution and then analysing the induced uncertainty in outputs, an approach which is known as probabilistic sensitivity analysis. We present a Bayesian framework which unifies the various tools of probabilistic sensitivity analysis. The Bayesian approach is computationally highly efficient. It allows effective sensitivity analysis to be achieved by using far smaller numbers of model runs than standard Monte Carlo methods. Furthermore, all measures of interest may be computed from a single set of runs.
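For contrast with the emulator-based approach proposed in the paper, the sketch below shows the standard Monte Carlo route to one common measure, the first-order (main-effect) sensitivity index, via a pick-and-freeze estimator. The toy model and input distributions are illustrative.

```python
# Plain Monte Carlo estimation of first-order (main-effect) sensitivity
# indices by the "pick-and-freeze" device.  This is the standard approach
# the paper improves upon; the test function and inputs are illustrative.
import numpy as np

def model(x):                       # toy model with unequal input influence
    return x[:, 0] + 0.5 * x[:, 1]**2 + 0.1 * x[:, 0] * x[:, 2]

rng = np.random.default_rng(0)
n, d = 20000, 3
A = rng.normal(size=(n, d))         # two independent input samples
B = rng.normal(size=(n, d))

yA = model(A)
var_y = yA.var()

for i in range(d):
    Ci = B.copy()
    Ci[:, i] = A[:, i]              # freeze input i at the values used in A
    yCi = model(Ci)
    S_i = np.cov(yA, yCi)[0, 1] / var_y
    print(f"S_{i + 1} ~ {S_i:.3f}")
```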

8.
The aim of this paper is to develop a general, unified approach, based on partial estimation functions that we call the “Z-process”, to change-point problems in mathematical statistics. The method proposed can be applied not only to ergodic models but also to some models where the Fisher information matrix is random. Applications to some concrete models, including a parametric model for the volatilities of diffusion processes, are presented. Simulations of the randomly time-transformed Brownian bridge process that appears as the limit of the proposed test statistics are performed using intensive computation.

9.
Statistical models are sometimes incorporated into computer software for making predictions about future observations. When the computer model consists of a single statistical model this corresponds to estimation of a function of the model parameters. This paper is concerned with the case that the computer model implements multiple, individually-estimated statistical sub-models. This case frequently arises, for example, in models for medical decision making that derive parameter information from multiple clinical studies. We develop a method for calculating the posterior mean of a function of the parameter vectors of multiple statistical models that is easy to implement in computer software, has high asymptotic accuracy, and has a computational cost linear in the total number of model parameters. The formula is then used to derive a general result about posterior estimation across multiple models. The utility of the results is illustrated by application to clinical software that estimates the risk of fatal coronary disease in people with diabetes.

10.
In this article statistical inference is viewed as information processing involving input information and output information. After introducing information measures for the input and output information, an information criterion functional is formulated and optimized to obtain an optimal information processing rule (IPR). For the particular information measures and criterion functional adopted, it is shown that Bayes's theorem is the optimal IPR. This optimal IPR is shown to be 100% efficient in the sense that its use leads to the output information being exactly equal to the given input information. Also, the analysis links Bayes's theorem to maximum-entropy considerations.
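For reference, Bayes's theorem written as an input-to-output rule is displayed below; the article's specific information measures and criterion functional are not reproduced here.

```latex
% Bayes's theorem as an information-processing rule: prior and likelihood
% are the inputs, the posterior (and marginal density) the output.  The
% article's particular information measures are not reproduced here.
\[
  p(\theta \mid y) \;=\; \frac{\pi(\theta)\, f(y \mid \theta)}{m(y)},
  \qquad
  m(y) \;=\; \int \pi(\theta)\, f(y \mid \theta)\, d\theta .
\]
```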

11.
Space-filling designs are important for deterministic computer experiments. Even a single experiment can be very time consuming and may involve many input parameters. Furthermore, the underlying function generating the output is often nonlinear. Thus, the computer experiment has to be designed carefully. Many design criteria exist that can be optimized numerically. Here, a method is developed that does not require algorithmic optimization: a mesh of nearly regular simplices is constructed and the vertices of the simplices are used as potential design points. Extracting a design from such a mesh is very fast and easy to implement once the underlying mesh has been constructed. The extracted designs are highly competitive with respect to the maximin design criterion, and it is easy to extract designs for nonstandard design spaces.

12.
We propose a method that uses a sequential design instead of a space filling design for estimating tuning parameters of a complex computer model. The goal is to bring the computer model output closer to the real system output. The method fits separate Gaussian process (GP) models to the available data from the physical experiment and the computer experiment and minimizes the discrepancy between the predictions from the GP models to obtain estimates of the tuning parameters. A criterion based on the discrepancy between the predictions from the two GP models and the standard error of prediction for the computer experiment output is then used to obtain a design point for the next run of the computer experiment. The tuning parameters are re-estimated using the augmented data set. The steps are repeated until the budget for the computer experiment data is exhausted. Simulation studies show that the proposed method performs better in bringing a computer model closer to the real system than methods that use a space filling design.
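A condensed sketch of the loop described above is given below: fit separate GPs to the physical and computer-experiment data, estimate the tuning parameter by minimising the discrepancy between their predictions, and pick the next simulator run using both discrepancy and predictive standard error. The toy system, the grid search and the particular way the two criterion terms are combined are simplifying assumptions.

```python
# A condensed sketch of the sequential tuning loop.  The toy physical
# system, the cheap simulator, the grid search and the specific combination
# of discrepancy and predictive standard error are simplifying assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
true_theta = 0.6

def physical(x):                       # real system, observed with noise
    return np.sin(2 * x) + true_theta * x + rng.normal(0, 0.05, x.shape)

def simulator(x, theta):               # stand-in for the computer model
    return np.sin(2 * x) + theta * x

x_phys = np.linspace(0, 3, 15)
y_phys = physical(x_phys)
gp_phys = GaussianProcessRegressor(normalize_y=True).fit(
    x_phys.reshape(-1, 1), y_phys)

# Initial computer-experiment design in (x, theta).
D = rng.uniform([0, 0], [3, 1], size=(10, 2))
y_sim = simulator(D[:, 0], D[:, 1])

x_grid = np.linspace(0, 3, 50)
theta_grid = np.linspace(0, 1, 101)

for run in range(10):                  # until the simulator budget is spent
    gp_sim = GaussianProcessRegressor(normalize_y=True).fit(D, y_sim)

    # Estimate the tuning parameter by minimising the squared discrepancy
    # between the two GP predictions over the x grid.
    def discrepancy(theta):
        inp = np.column_stack([x_grid, np.full_like(x_grid, theta)])
        return np.mean((gp_phys.predict(x_grid.reshape(-1, 1))
                        - gp_sim.predict(inp))**2)

    theta_hat = theta_grid[np.argmin([discrepancy(t) for t in theta_grid])]

    # Choose the next simulator run where discrepancy and predictive
    # uncertainty (at the current theta estimate) are jointly largest.
    cand = np.column_stack([x_grid, np.full_like(x_grid, theta_hat)])
    pred, se = gp_sim.predict(cand, return_std=True)
    score = np.abs(gp_phys.predict(x_grid.reshape(-1, 1)) - pred) + se
    x_next = x_grid[np.argmax(score)]

    D = np.vstack([D, [x_next, theta_hat]])
    y_sim = np.append(y_sim, simulator(x_next, theta_hat))

print("estimated tuning parameter:", theta_hat)
```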

13.
College students majoring in science and engineering need to learn how to model key features of the driving mechanisms of natural, scientific, and engineering phenomena. A rigorous treatment of these topics requires a thorough understanding of advanced mathematical concepts and probability theory. However, we believe that carefully designed computer simulation software offers a means of conveying fundamental ideas of probabilistic modeling, while minimizing the need for underlying mathematical analyses. Based on this premise we have initiated the development of a software system that will be incorporated into a novel, introductory course in probabilistic modeling for undergraduate students in the biological and environmental sciences. In this paper we describe the preliminary version of our system that implements simulation, real-time animation, and calculations of dynamic statistical summaries for several prototypical stochastic models for a variety of biological systems.

14.
Nowadays it is common to reproduce physical systems with mathematical simulation models, and, even though computing resources continue to increase, computer simulations keep growing in complexity. This leads to the adoption of surrogate models, and one of the most popular methodologies is the well-known Ordinary Kriging, a statistical interpolator extensively used to approximate the output of deterministic simulations. This paper deals with the problem of finding suitable experimental plans for Ordinary Kriging with an exponential correlation structure. In particular, we derive exact optimal designs for the prediction, estimation and information-gain approaches in the one-dimensional case, giving further theoretical justification for the adoption of the equidistant design. Moreover, we show that in some circumstances several results related to the uncorrelated setup still hold for correlated observations.
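The practical appeal of the equidistant design can be checked numerically. The sketch below compares an equidistant and a random one-dimensional design under Ordinary Kriging with an exponential correlation, using the maximum kriging prediction variance over the design region as the criterion; the range parameter and design size are illustrative.

```python
# A small numerical comparison of one-dimensional designs for Ordinary
# Kriging with an exponential correlation function, using the maximum
# kriging prediction variance over [0, 1] as the criterion.  The range
# parameter and the design size are illustrative choices.
import numpy as np

def ok_variance(design, x0, range_par=0.3, sill=1.0):
    """Ordinary-kriging prediction variance at x0 for a given 1-D design."""
    corr = lambda h: np.exp(-np.abs(h) / range_par)
    C = sill * corr(design[:, None] - design[None, :])
    c0 = sill * corr(design - x0)
    ones = np.ones_like(design)
    Ci_c0 = np.linalg.solve(C, c0)
    Ci_1 = np.linalg.solve(C, ones)
    lagrange = (1.0 - ones @ Ci_c0) / (ones @ Ci_1)
    return sill - c0 @ Ci_c0 + lagrange * (1.0 - ones @ Ci_c0)

grid = np.linspace(0, 1, 401)
n = 6
equidistant = np.linspace(0, 1, n)
rng = np.random.default_rng(0)
random_design = np.sort(rng.uniform(0, 1, n))

for name, d in [("equidistant", equidistant), ("random", random_design)]:
    g_max = max(ok_variance(d, x) for x in grid)
    print(f"{name:12s} max prediction variance: {g_max:.4f}")
```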

15.
In this paper a semi-parametric approach is developed to model non-linear relationships in time series data using polynomial splines. Polynomial splines require few assumptions about the functional form of the underlying relationship, so they are very flexible and can be used to model highly non-linear relationships. Polynomial splines are also computationally very efficient. The serial correlation in the data is accounted for by modelling the noise as an autoregressive integrated moving average (ARIMA) process; by doing so, the efficiency of the nonparametric estimation is improved and correct inferences can be obtained. The explicit structure of the ARIMA model allows the correlation information to be used to improve forecasting performance. An algorithm is developed to automatically select and estimate the polynomial spline model and the ARIMA model through backfitting. The method is applied to a real-life data set to forecast hourly electricity usage. The non-linear effect of temperature on hourly electricity usage is allowed to differ across hours of the day and days of the week. The forecasting performance of the developed method is evaluated in post-sample forecasting and compared with several well-accepted models. The results show that the performance of the proposed model is comparable with that of a long short-term memory deep learning model.
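A stripped-down version of the backfitting idea is sketched below: alternate a polynomial-spline regression for the nonlinear trend with an ARIMA fit to the residual noise. The synthetic data, spline degrees of freedom, ARIMA order and library choices (patsy, statsmodels) are assumptions; the automatic selection step and the hour-of-day/day-of-week interactions are omitted.

```python
# A stripped-down backfitting sketch: a polynomial-spline regression for the
# nonlinear temperature effect plus an ARIMA model for serially correlated
# errors.  The synthetic data, spline df and ARIMA order are illustrative;
# the paper's automatic selection step is omitted.
import numpy as np
import patsy
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 500
temp = 10 + 15 * np.sin(np.linspace(0, 20, n)) + rng.normal(0, 2, n)
noise = np.zeros(n)
for t in range(1, n):                       # serially correlated AR(1) errors
    noise[t] = 0.7 * noise[t - 1] + rng.normal(0, 1)
load = 50 + 0.05 * (temp - 18)**2 + noise   # nonlinear temperature effect

spline = np.asarray(patsy.dmatrix("bs(temp, df=6)", {"temp": temp}))

noise_fit = np.zeros(n)
for _ in range(5):                          # backfitting: spline <-> ARIMA
    ols = sm.OLS(load - noise_fit, spline).fit()
    trend = ols.fittedvalues
    arima = ARIMA(load - trend, order=(1, 0, 0)).fit()
    noise_fit = arima.fittedvalues

print(arima.params)                         # AR coefficient and error variance
```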

16.
What explains the sharp movements of the yield curve upon the release of major U.S. macroeconomic announcements? To answer this question, we estimate an arbitrage-free dynamic term structure model with macroeconomic fundamentals as risk factors. We assume that the yield curve reacts to announcements primarily because of the information they contain about the fundamentals of output, inflation, and the Fed’s inflation target. We model the updating process by linking the factor shocks to announcement surprises. Fitting this process to data on yield curve movements in 20-min event windows, we find that most major announcements, especially those about the labor market, are informative largely about the output gap rather than about inflation. The resulting changes in short-rate expectations account for the bulk of observed yield movements. But adjustments in risk premia are also sizable. In partly offsetting the effects of short-rate expectations, these adjustments help to account for the well-known hump-shaped pattern of yield reactions across maturities.

17.
Recently, a body of literature proposed new models relaxing a widely used but controversial assumption of independence between claim frequency and severity in non-life insurance rate making. This paper critically reviews a generalized linear model approach, where a dependence between claim frequency and severity is introduced by treating frequency as a covariate in a regression model for severity. As an extension of this approach, we propose a dispersion model for severity. For this model, the information loss caused by using average severity rather than individual severity is examined in detail and the parameter estimators suffering from low efficiency are identified. We also provide analytical solutions for the aggregate sum to help rate making. We show that the simple functional form used in current research may not properly reflect the real underlying dependence structure. A real data analysis is given to explain our analytical findings.
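The dependence device being reviewed, claim frequency entering as a covariate in a regression model for severity, can be sketched as a Gamma GLM with a log link. The synthetic data, the link choice and the use of claim counts as variance weights are illustrative assumptions, and the dispersion-model extension proposed in the paper is not shown.

```python
# A minimal sketch of the reviewed dependence device: the observed claim
# count enters as a covariate in a Gamma GLM for average severity.  The
# synthetic data, log link and variance weights are illustrative; the
# dispersion-model extension proposed in the paper is not shown.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 70, n)
freq = rng.poisson(lam=np.exp(-2 + 0.02 * age))      # claim counts
pos = freq > 0                                       # severity needs a claim

# Average severity per policy, made to depend (weakly) on frequency.
mean_sev = np.exp(7 + 0.01 * age[pos] + 0.15 * freq[pos])
avg_sev = rng.gamma(shape=2.0, scale=mean_sev / 2.0)

X = sm.add_constant(np.column_stack([age[pos], freq[pos]]))
glm = sm.GLM(avg_sev, X, family=sm.families.Gamma(sm.families.links.Log()),
             var_weights=freq[pos])                  # weight by claim count
print(glm.fit().params)
```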

18.
In this paper we provide a broad introduction to the topic of computer experiments. We begin by briefly presenting a number of applications with different types of output or different goals. We then review modelling strategies, including the popular Gaussian process approach, as well as variations and modifications. Other strategies that are reviewed are based on polynomial regression, non-parametric regression and smoothing spline ANOVA. The issue of multi-level models, which combine simulators of different resolution in the same experiment, is also addressed. Special attention is given to modelling techniques that are suitable for functional data. To conclude the modelling section, we discuss calibration, validation and verification. We then review design strategies including Latin hypercube designs and space-filling designs and their adaptation to computer experiments. We comment on a number of special issues, such as designs for multi-level simulators, nested factors and determination of experiment size.
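As a small illustration of one of the design strategies reviewed, the sketch below generates a Latin hypercube design with SciPy and reports a simple maximin-style space-filling summary; the dimension, sample size and bounds are arbitrary.

```python
# A short illustration of one design strategy mentioned above: a Latin
# hypercube design for a computer experiment, generated with SciPy.  The
# dimension, sample size and bounds are arbitrary.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=4, seed=0)
unit_design = sampler.random(n=20)                  # points in [0, 1)^4

# Rescale to the simulator's input ranges (illustrative bounds).
lower, upper = [0, 10, 0.1, 300], [1, 50, 0.9, 600]
design = qmc.scale(unit_design, lower, upper)

# A simple space-filling summary: smallest pairwise distance (maximin view).
d_min = np.min([np.linalg.norm(a - b)
                for i, a in enumerate(unit_design)
                for b in unit_design[i + 1:]])
print(design.shape, round(d_min, 3))
```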

19.
Bayesian calibration of computer models
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
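A heavily simplified illustration of the calibration idea is sketched below: field observations are modelled as simulator output at the unknown parameter plus a discrepancy term plus noise, and a grid posterior for the calibration parameter is computed. The cheap simulator, the constant discrepancy and all variance choices are assumptions; the paper's approach places Gaussian-process priors on the code and the discrepancy and is far richer.

```python
# A heavily simplified illustration of Bayesian calibration: field data are
# modelled as simulator(theta) + discrepancy + noise, and a grid posterior
# for theta is computed.  The cheap simulator, constant discrepancy and all
# variances are assumptions; the full approach uses GP priors for the code
# and the discrepancy.
import numpy as np

rng = np.random.default_rng(0)

def simulator(x, theta):            # stand-in for the expensive code
    return theta * np.sin(x)

x_obs = np.linspace(0, 3, 12)
# "true" process = code at theta = 1.4 plus a systematic bias of 0.3
y_obs = 1.4 * np.sin(x_obs) + 0.3 + rng.normal(0, 0.1, x_obs.size)

theta_grid = np.linspace(0.5, 2.5, 201)
delta_grid = np.linspace(-1.0, 1.0, 201)            # constant discrepancy
sigma = 0.1                                         # observation noise s.d.

log_post = np.full(theta_grid.size, -np.inf)
for i, th in enumerate(theta_grid):
    resid = (y_obs[None, :] - simulator(x_obs, th)[None, :]
             - delta_grid[:, None])
    loglik = -0.5 * np.sum(resid**2, axis=1) / sigma**2
    prior_delta = -0.5 * delta_grid**2 / 0.5**2     # N(0, 0.5^2) on delta
    log_post[i] = np.logaddexp.reduce(loglik + prior_delta)  # integrate delta

post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior mean of theta:", np.sum(theta_grid * post))
```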

20.
On-line auctions pose many challenges for the empirical researcher, one of which is the effective and reliable modelling of price paths. We propose a novel way of modelling price paths in eBay's on-line auctions by using functional data analysis. One of the practical challenges is that the functional objects are sampled only very sparsely and unevenly. Most approaches rely on smoothing to recover the underlying functional object from the data, which can be difficult if the data are irregularly distributed. We present a new approach that can overcome this challenge. The approach is based on the ideas of mixed models. Specifically, we propose a semiparametric mixed model with boosting to recover the functional object. As well as being able to handle sparse and unevenly distributed data, the model results in conceptually more meaningful functional objects. In particular, we motivate our method within the framework of eBay's on-line auctions. On-line auctions produce monotonic increasing price curves that are often correlated across auctions. The semiparametric mixed model accounts for this correlation in a parsimonious way. It also manages to capture the underlying monotonic trend in the data without imposing model constraints. Our application shows that the resulting functional objects are conceptually more appealing. Moreover, when used to forecast the outcome of an on-line auction, our approach also results in more accurate price predictions compared with standard approaches. We illustrate our model on a set of 183 closed auctions for Palm M515 personal digital assistants.

