Similar Articles
20 similar articles found.
1.
Multiple comparison methods are widely implemented in statistical packages and heavily used. To obtain the critical value of a multiple comparison method for a given confidence level, a double integral equation must be solved. Current computer implementations evaluate one double integral for each candidate critical value using Gaussian quadrature. Consequently, iterative refinement of the critical value can slow the response time enough to hamper interactive data analysis. However, for balanced designs, to obtain the critical value for multiple comparisons with the best, subset selection, and one-sided multiple comparisons with a control, if one regards the inner integral as a function of the outer integration variable, then this function can be obtained by discrete convolution using the Fast Fourier Transform (FFT). Exploiting the fact that this function need not be re-evaluated during iterative refinement of the critical value, it is shown that the FFT method obtains critical values at least four times as accurate as those from the Gaussian quadrature method, and does so two to five times faster.
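As a concrete illustration of the integral-equation problem (not of the paper's FFT method), the sketch below computes the critical value for one-sided multiple comparisons with a control in a balanced design under the simplifying assumption of known variance, so that only the outer integral over the control's standardized mean remains; with unknown variance an additional integral over the studentizing chi distribution produces the double integral described above. The number of treatments k and the level alpha are arbitrary.

```python
# Sketch: one-sided multiple comparisons with a control, balanced design,
# KNOWN variance (so only the outer integral remains).  This is the
# quadrature-plus-iterative-refinement approach the paper improves on,
# not its FFT method; k and alpha below are arbitrary.
import numpy as np
from scipy import integrate, optimize
from scipy.stats import norm

def joint_cdf(c, k):
    """P(max_i T_i <= c) for k equicorrelated (rho = 1/2) standard normals."""
    integrand = lambda z: norm.pdf(z) * norm.cdf(z + c * np.sqrt(2.0)) ** k
    val, _ = integrate.quad(integrand, -8.0, 8.0)   # truncated integration range
    return val

def critical_value(k, alpha=0.05):
    """Solve joint_cdf(c, k) = 1 - alpha by iterative root refinement."""
    return optimize.brentq(lambda c: joint_cdf(c, k) - (1.0 - alpha), 0.0, 6.0)

print(critical_value(k=4))   # one-sided critical value for 4 treatments vs. control
```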

2.
In this paper we provide a broad introduction to the topic of computer experiments. We begin by briefly presenting a number of applications with different types of output or different goals. We then review modelling strategies, including the popular Gaussian process approach, as well as variations and modifications. Other strategies that are reviewed are based on polynomial regression, non-parametric regression and smoothing spline ANOVA. The issue of multi-level models, which combine simulators of different resolution in the same experiment, is also addressed. Special attention is given to modelling techniques that are suitable for functional data. To conclude the modelling section, we discuss calibration, validation and verification. We then review design strategies including Latin hypercube designs and space-filling designs and their adaptation to computer experiments. We comment on a number of special issues, such as designs for multi-level simulators, nested factors and determination of experiment size.
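As a minimal sketch of the Gaussian process (kriging) surrogate that is central to this literature, the following numpy code fits a GP with a squared-exponential kernel and hand-picked hyperparameters to a toy one-dimensional "simulator"; in practice the hyperparameters would be estimated, e.g. by maximum likelihood, and the simulator would be an expensive code.

```python
# Minimal Gaussian-process surrogate for a deterministic simulator (a sketch;
# the toy simulator, kernel and fixed hyperparameters are illustrative only).
import numpy as np

def sq_exp_kernel(A, B, length=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

simulator = lambda x: np.sin(6.0 * x) + x          # stand-in "computer code"
X = np.linspace(0.0, 1.0, 8)                       # design points (training runs)
y = simulator(X)

K = sq_exp_kernel(X, X) + 1e-10 * np.eye(len(X))   # tiny jitter for stability
Xnew = np.linspace(0.0, 1.0, 200)
Ks = sq_exp_kernel(Xnew, X)

alpha = np.linalg.solve(K, y)
mean = Ks @ alpha                                  # posterior (kriging) mean
cov = sq_exp_kernel(Xnew, Xnew) - Ks @ np.linalg.solve(K, Ks.T)
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))     # pointwise predictive sd

print(float(np.max(np.abs(mean - simulator(Xnew)))), float(sd.mean()))
```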

3.
In this paper, we focus on the problem of factor screening in nonregular two-level designs through gradually reducing the number of possible sets of active factors. We are particularly concerned with situations where three or four factors are active. Our proposed method works by examining fits of projection models, where variable selection techniques are used to reduce the number of terms. To examine the reliability of the method in combination with such techniques, a panel of models consisting of three or four active factors, with data generated from the 12-run and the 20-run Plackett–Burman (PB) designs, is used. The dependence of the procedure on the amount of noise, the number of active factors and the number of experimental factors is also investigated. For designs with few runs, such as the 12-run PB design, variable selection should be done with care, and default procedures in computer software may not be reliable; we suggest improvements to them. A real example is included to show how we propose factor screening be done in practice.

4.
Multinomial goodness-of-fit tests arise in a diversity of settings. The long history of the problem has spawned a multitude of asymptotic tests. When the sample size is small relative to the number of categories, the accuracy of these tests is compromised. In that case, an exact test is a prudent option, but such tests are computationally intensive and need efficient algorithms. This paper gives a conceptual overview and empirical comparisons of two avenues, namely the network and fast Fourier transform (FFT) algorithms, for an exact goodness-of-fit test on a multinomial. We show that a recursive execution of a polynomial product forms the basis of both approaches. Specific details for implementing the network method, and techniques to enhance the efficiency of the FFT algorithm, are given. Our empirical comparisons show that for exact analysis with the chi-square and likelihood ratio statistics, the network-cum-polynomial-multiplication algorithm is the more efficient and accurate of the two.
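For very small problems, the exact null distribution of the chi-square statistic can be obtained by brute-force enumeration of all outcome vectors; the sketch below does exactly that and is useful mainly as a correctness check, since it scales far worse than the network and FFT algorithms compared in the paper. The observed counts and cell probabilities are invented.

```python
# Brute-force exact multinomial goodness-of-fit test with the chi-square
# statistic (sketch; feasible only for tiny n and k, unlike the network/FFT
# algorithms discussed above).
from itertools import combinations
from math import factorial, prod

def compositions(n, k):
    """All k-tuples of non-negative integers summing to n (stars and bars)."""
    for cuts in combinations(range(n + k - 1), k - 1):
        yield tuple(b - a - 1 for a, b in zip((-1,) + cuts, cuts + (n + k - 1,)))

def exact_pvalue(observed, p):
    n, k = sum(observed), len(observed)
    expected = [n * pi for pi in p]
    chisq = lambda x: sum((xi - ei) ** 2 / ei for xi, ei in zip(x, expected))
    t_obs = chisq(observed)
    pval = 0.0
    for x in compositions(n, k):
        prob = factorial(n) / prod(factorial(xi) for xi in x) \
               * prod(pi ** xi for pi, xi in zip(p, x))
        if chisq(x) >= t_obs - 1e-12:      # tail: outcomes at least as extreme
            pval += prob
    return pval

print(exact_pvalue(observed=(7, 2, 1), p=(1/3, 1/3, 1/3)))
```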

5.
Time-sharing computer configurations have introduced a new dimension in applying statistical and mathematical models to sequential decision problems. When the outcome of one step in the process influences subsequent decisions, an interactive time-sharing system is of great help. Since the forecasting function involves such a sequential process, it can be handled particularly well with an appropriate time-shared computer system. This paper describes such a system, which allows the user to do preliminary analysis of his data, to identify the forecasting technique or class of techniques most appropriate for his situation, and to apply those techniques in developing a forecast. This interactive forecasting system has met with excellent success, both in teaching the fundamentals of forecasting for business decision making and in actually applying those techniques in management situations.

6.
One of the main problems in geostatistics is fitting a valid variogram or covariogram model in order to describe the underlying dependence structure in the data. The dependence between observations can also be modeled in the spectral domain, but the traditional methods based on the periodogram as an estimator of the spectral density may present some problems in the spatial case. In this work, we propose an estimation method for the covariogram parameters based on the fast Fourier transform (FFT) of biased covariances. The performance of this estimator for finite samples is compared, through a simulation study, with other classical methods stated in the spatial domain, such as weighted least squares and maximum likelihood, as well as with other spectral estimators. Additionally, an example of application to real data is given.
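The basic computational ingredient, biased sample covariances obtained through the FFT (the Wiener-Khinchin route), can be sketched on a one-dimensional regular grid as follows; the spatial method works analogously on a two-dimensional grid, and the autoregressive data below are simulated purely for illustration.

```python
# Biased sample autocovariances via the FFT, the building block of the
# spectral covariogram-fitting approach (1-D sketch with simulated AR(1)-type
# data; the real method works on spatial grids).
import numpy as np

rng = np.random.default_rng(0)
n = 512
z = np.empty(n)
z[0] = rng.normal()
for t in range(1, n):                      # simple correlated process to play with
    z[t] = 0.7 * z[t - 1] + rng.normal()

def biased_acov_fft(x):
    """C(h) = (1/n) * sum_t (x_t - xbar)(x_{t+h} - xbar), via zero-padded FFT."""
    x = x - x.mean()
    m = len(x)
    f = np.fft.rfft(x, n=2 * m)            # zero-pad to avoid circular wrap-around
    acov = np.fft.irfft(f * np.conj(f))[:m] / m
    return acov

acov = biased_acov_fft(z)
print(acov[0], acov[1] / acov[0])          # variance and lag-1 correlation (about 0.7)
```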

7.
Many seemingly different problems in machine learning, artificial intelligence, and symbolic processing can be viewed as requiring the discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a highly fit individual computer program. The recently developed genetic programming paradigm described herein provides a way to search the space of possible computer programs for a highly fit individual computer program to solve (or approximately solve) a surprising variety of different problems from different fields. In genetic programming, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (sexual recombination) operator appropriate for genetically mating computer programs. Genetic programming is illustrated via an example of machine learning of the Boolean 11-multiplexer function and symbolic regression of the econometric exchange equation from noisy empirical data.

Hierarchical automatic function definition enables genetic programming to define potentially useful functions automatically and dynamically during a run, much as a human programmer writing a complex computer program creates subroutines (procedures, functions) to perform groups of steps which must be performed with different instantiations of the dummy variables (formal parameters) in more than one place in the main program. Hierarchical automatic function definition is illustrated via the machine learning of the Boolean 11-parity function.

8.
Deterministic computer simulations are often used as replacements for complex physical experiments. Although less expensive than physical experimentation, computer codes can still be time-consuming to run. An effective strategy for exploring the response surface of the deterministic simulator is the use of an approximation to the computer code, such as a Gaussian process (GP) model, coupled with a sequential sampling strategy for choosing the design points used to build the GP model. The ultimate goal of such studies is often the estimation of specific features of interest of the simulator output, such as the maximum, minimum, or a level set (contour). Before approximating such features with the GP model, sufficient runs of the computer simulator must be completed.

Sequential designs with an expected improvement (EI) design criterion can yield good estimates of the features with a minimal number of runs. The challenge is that the expected improvement function itself is often multimodal and difficult to maximize. We develop branch and bound algorithms for efficiently maximizing the EI function in specific problems, including the simultaneous estimation of a global maximum and minimum, and the estimation of a contour. These branch and bound algorithms outperform other optimization strategies such as genetic algorithms, and can lead to significantly more accurate estimation of the features of interest.
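For minimisation, the expected improvement at a candidate point with GP posterior mean mu(x) and standard deviation sd(x) has the closed form EI(x) = (f_min - mu) * Phi(u) + sd * phi(u), with u = (f_min - mu) / sd. The sketch below evaluates this formula on invented posterior values; the branch and bound search over the EI surface developed in the paper is not reproduced.

```python
# Closed-form expected improvement for minimisation, given a GP posterior
# mean/sd at candidate points (sketch; the branch-and-bound maximisation of
# this surface is not shown, and the numbers are invented).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sd, f_min):
    """EI(x) = (f_min - mu) * Phi(u) + sd * phi(u), with u = (f_min - mu) / sd."""
    mu, sd = np.asarray(mu, float), np.asarray(sd, float)
    u = np.where(sd > 0, (f_min - mu) / np.where(sd > 0, sd, 1.0), 0.0)
    ei = (f_min - mu) * norm.cdf(u) + sd * norm.pdf(u)
    return np.where(sd > 0, ei, np.maximum(f_min - mu, 0.0))

# Toy posterior over three candidates; the best observed value so far is 1.0.
print(expected_improvement(mu=[0.8, 1.2, 1.0], sd=[0.3, 0.5, 0.0], f_min=1.0))
```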

9.
Evaluation (or assessment)–time bias can arise in oncology trials that study progression‐free survival (PFS) when randomized groups have different patterns of timing of assessments. Modelling or computer simulation is sometimes used to explore the extent of such bias; valid results require building such simulations under realistic assumptions concerning the timing of assessments. This paper considers a trial that used a logrank test where computer simulations were based on unrealistic assumptions that severely overestimated the extent of potential bias. The paper shows that seemingly small differences in assumptions can lead to dramatic differences in the apparent operating characteristics of logrank tests. Copyright © 2010 John Wiley & Sons, Ltd.

10.
As modeling efforts expand to a broader spectrum of areas, the amount of computer time required to exercise the corresponding computer codes has become quite costly (several hours for a single run is not uncommon). This cost can be directly tied to the complexity of the modeling and to the large number of input variables (often numbering in the hundreds). Further, the complexity of the modeling (usually involving systems of differential equations) makes the relationships among the input variables not mathematically tractable. In this setting it is desired to perform sensitivity studies of the input-output relationships. Hence, a judicious procedure for selecting the values of the input variables is required; Latin hypercube sampling has been shown to work well on this type of problem.

However, a variety of situations require that decisions and judgments be made in the face of uncertainty. The source of this uncertainty may be lack of knowledge about probability distributions associated with input variables, or about different hypothesized future conditions, or it may be present as a result of different strategies associated with a decision-making process. In this paper a generalization of Latin hypercube sampling is given that allows these areas to be investigated without making additional computer runs. In particular, it is shown how weights associated with Latin hypercube input vectors may be changed to reflect different probability distribution assumptions on key input variables and yet provide an unbiased estimate of the cumulative distribution function of the output variable. This allows different distribution assumptions on input variables to be studied without additional computer runs and without fitting a response surface. In addition, these same weights can be used in a modified nonparametric Friedman test to compare treatments. Sample size requirements needed to apply the results of this work are also considered. The procedures presented in this paper are illustrated using a model associated with the risk assessment of geologic disposal of radioactive waste.
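The reuse of runs under changed input assumptions can be sketched as importance-style reweighting: keep the original Latin hypercube runs and their outputs, and weight each run by the ratio of the new input density to the original one when re-estimating the output distribution. This is a simplified sketch rather than the paper's exact weighting scheme, and the toy model and distributions are invented.

```python
# Re-using Latin hypercube runs under a changed input distribution by
# density-ratio reweighting (a simplified sketch, not the paper's exact scheme;
# the model and both distributions are made up).
import numpy as np
from scipy.stats import qmc, uniform, beta

model = lambda x: np.exp(2.0 * x[:, 0]) + x[:, 1]      # expensive code stand-in

sampler = qmc.LatinHypercube(d=2, seed=1)
x = sampler.random(n=200)                              # original runs: U(0,1)^2
y = model(x)                                           # the only model evaluations

# New assumption on input 1: Beta(2, 5) instead of U(0, 1); input 2 unchanged.
w = beta.pdf(x[:, 0], 2, 5) / uniform.pdf(x[:, 0])     # density-ratio weights
w /= w.sum()                                           # self-normalised here

def weighted_cdf(y, w, t):
    """Estimate P(Y <= t) under the new input assumptions, no new runs needed."""
    return float(np.sum(w * (y <= t)))

print(weighted_cdf(y, w, t=np.median(y)))
```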

11.
Over the past five years the Artificial Intelligence Center at SRI has been developing a new technology to address the problem of automated information management within real-world contexts. The result of this work is a body of techniques for automated reasoning from evidence that we call evidential reasoning. The techniques are based upon the mathematics of belief functions developed by Dempster and Shafer and have been successfully applied to a variety of problems including computer vision, multisensor integration, and intelligence analysis.

We have developed both a formal basis and a framework for implementing automated reasoning systems based upon these techniques. Both the formal and the practical approach can be divided into four parts: (1) specifying a set of distinct propositional spaces, (2) specifying the interrelationships among these spaces, (3) representing bodies of evidence as belief distributions, and (4) establishing paths for the bodies of evidence to move through these spaces by means of evidential operations, eventually converging on spaces where the target questions can be answered. These steps specify a means for arguing from multiple bodies of evidence toward a particular (probabilistic) conclusion. Argument construction is the process by which such evidential analyses are constructed and is the analogue of constructing proof trees in a logical context.

This technology features the ability to reason from uncertain, incomplete, and occasionally inaccurate information based upon seven evidential operations: fusion, discounting, translation, projection, summarization, interpretation, and gisting. These operations are theoretically sound but have intuitive appeal as well.

In implementing this formal approach, we have found that evidential arguments can be represented as graphs. To support the construction, modification, and interrogation of evidential arguments, we have developed Gister. Gister provides an interactive, menu-driven, graphical interface that allows these graphical structures to be easily manipulated.

Our goal is to provide effective automated aids to domain experts for argument construction. Gister represents our first attempt at such an aid.
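The fusion operation rests on Dempster's rule of combination. The sketch below combines two basic belief assignments over a small frame of discernment; the frame and the mass values are invented for illustration.

```python
# Dempster's rule of combination for two basic belief assignments over a small
# frame of discernment (sketch; the masses below are invented for illustration).
from itertools import product

FRAME = frozenset({"friend", "foe"})

def combine(m1, m2):
    """Dempster's rule: intersect focal elements, renormalise away conflict."""
    out, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            out[inter] = out.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: w / (1.0 - conflict) for s, w in out.items()}

# Two bodies of evidence (e.g. from two sensors), each partly uncommitted.
m_sensor1 = {frozenset({"friend"}): 0.6, FRAME: 0.4}
m_sensor2 = {frozenset({"friend"}): 0.3, frozenset({"foe"}): 0.5, FRAME: 0.2}
print(combine(m_sensor1, m_sensor2))
```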


12.
Most surrogate models for computer experiments are interpolators, and the most common interpolator is a Gaussian process (GP) that deliberately omits a small-scale (measurement) error term called the nugget. The explanation is that computer experiments are, by definition, "deterministic", and so there is no measurement error. We think this is too narrow a view of computer experiments and a statistically inefficient way to model them. We show that estimating a (non-zero) nugget can lead to surrogate models with better statistical properties, such as predictive accuracy and coverage, in a variety of common situations.
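In GP terms, the nugget is simply an extra variance term on the diagonal of the covariance matrix; even a small value improves numerical conditioning and turns the surrogate from an exact interpolator into a smoother. A brief numpy sketch with an invented kernel and design:

```python
# Effect of a nugget term on the GP covariance matrix (sketch; the kernel and
# hyperparameter values are invented).
import numpy as np

def sq_exp(A, B, length=0.3):
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

X = np.linspace(0.0, 1.0, 30)
K = sq_exp(X, X)

for nugget in (0.0, 1e-8, 1e-4):
    Kn = K + nugget * np.eye(len(X))
    print(f"nugget={nugget:g}  condition number={np.linalg.cond(Kn):.3e}")
# With nugget > 0 the kriging predictor no longer interpolates the runs exactly,
# which the paper argues can improve predictive accuracy and coverage.
```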

13.
Probabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. However, the dependence between variables may render inference tasks intractable. In this paper, we review techniques exploiting the graph structure for exact inference, borrowed from optimisation and computer science. They are built on the principle of variable elimination, whose complexity is dictated in an intricate way by the order in which variables are eliminated. The so-called treewidth of the graph characterises this algorithmic complexity: low-treewidth graphs can be processed efficiently. The first point that we illustrate is therefore the idea that for inference in graphical models, the number of variables is not the limiting factor, and it is worth checking the width of several tree decompositions of the graph before resorting to an approximate method. We show how algorithms providing an upper bound on the treewidth can be exploited to derive a 'good' elimination order enabling exact inference. The second point is that when the treewidth is too large, algorithms for approximate inference linked to the principle of variable elimination, such as loopy belief propagation and variational approaches, can lead to accurate results while being much less time-consuming than Monte Carlo approaches. We illustrate the techniques reviewed in this article on benchmarks of inference problems in genetic linkage analysis and computer vision, as well as on hidden variable restoration in coupled hidden Markov models.
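Variable elimination amounts to multiplying the factors that mention a variable and then summing that variable out; the size of the intermediate factors, which the treewidth governs, determines the cost. A small sketch on a binary chain A -> B -> C using numpy.einsum, with made-up probability tables:

```python
# Variable elimination on a tiny chain A -> B -> C of binary variables,
# computing P(C) by summing out A then B (sketch; the tables are made up).
import numpy as np

p_a = np.array([0.6, 0.4])                      # P(A)
p_b_given_a = np.array([[0.9, 0.1],             # P(B | A): rows index A
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.7, 0.3],             # P(C | B): rows index B
                        [0.1, 0.9]])

# Eliminate A: phi(B) = sum_a P(A=a) P(B | A=a)
phi_b = np.einsum("a,ab->b", p_a, p_b_given_a)
# Eliminate B: P(C) = sum_b phi(B=b) P(C | B=b)
p_c = np.einsum("b,bc->c", phi_b, p_c_given_b)

print(p_c, p_c.sum())                           # a proper distribution over C
# The cost of each elimination is set by the size of the intermediate factor,
# which is what the treewidth of the graph controls in general.
```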

14.
In many real-life networks such as computer networks, branches and nodes have multi-state capacity, lead time, and accuracy rate. Evaluating the reliability of a network with unreliable nodes is more complex because a node failure disables the adjacent branches. Such a network is called a stochastic unreliable-node computer network (SUNCN). Under the strict assumption that each component (branch and node) has a deterministic capacity, the quickest path (QP) problem is to find a path sending a specified amount of data with minimum transmission time. The accuracy rate is a critical index for measuring the performance of a computer network because some packets are damaged or lost due to voltage instability, magnetic field effects, lightning, etc. Subject to both an assured accuracy rate and a time constraint, this paper extends the QP problem to the system reliability of an SUNCN. An efficient algorithm based on a graphical technique is proposed to find the minimal capacity vectors meeting these constraints. The system reliability, i.e. the probability of sending a specified amount of data through multiple minimal paths subject to both the assured accuracy rate and time constraints, can subsequently be computed.
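With deterministic capacities, the transmission time of a path for sigma units of data is commonly taken as the sum of its lead times plus sigma divided by its bottleneck capacity, and the quickest path minimises this quantity. The brute-force sketch below enumerates the simple paths of an invented toy graph; the multi-state and accuracy-rate aspects treated in the paper are not modelled.

```python
# Quickest path by brute force: enumerate simple paths and score each by
# lead time plus data divided by bottleneck capacity (sketch; the toy graph is
# invented and the multi-state / unreliable-node aspects are ignored).
# Edge attributes: (capacity in units/sec, lead time in sec).
EDGES = {
    ("s", "a"): (5.0, 1.0), ("s", "b"): (3.0, 0.5),
    ("a", "t"): (4.0, 1.0), ("b", "t"): (6.0, 2.0), ("a", "b"): (2.0, 0.2),
}
GRAPH = {}
for (u, v), _ in EDGES.items():
    GRAPH.setdefault(u, []).append(v)

def simple_paths(graph, src, dst, path=None):
    """Yield all simple (loop-free) paths from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, []):
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path)

def transmission_time(path, sigma):
    edges = list(zip(path, path[1:]))
    lead = sum(EDGES[e][1] for e in edges)
    bottleneck = min(EDGES[e][0] for e in edges)
    return lead + sigma / bottleneck

sigma = 20.0   # units of data to send
best = min(simple_paths(GRAPH, "s", "t"), key=lambda p: transmission_time(p, sigma))
print(best, transmission_time(best, sigma))
```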

15.
With the advent of modern computer technology, there have been growing efforts in recent years to computerize standardized tests, including the popular Graduate Record Examination (GRE), the Graduate Management Admission Test (GMAT) and the Test of English as a Foreign Language (TOEFL). Many such computer-based tests are known as computerized adaptive tests, a major feature of which is that, depending on their performance in the course of testing, different examinees may be given different sets of items (questions). In doing so, items can be efficiently utilized to yield maximum accuracy in the estimation of examinees' ability traits. We consider, in this article, one type of such tests in which the test length varies across examinees so as to yield approximately the same predetermined accuracy for all ability traits. A comprehensive large sample theory is developed for the expected test length and the sequential point and interval estimates of the latent trait. Extensive simulations are conducted, with results showing that the large sample approximations are adequate for realistic sample sizes.

16.
We provide an overview of some of the research of the last ten years involving computer network data traffic. We describe the original Ethernet data study which suggested that computer traffic is inherently different from telephone traffic and that, in the context of computer networks, self-similar models such as fractional Brownian motion should be used. We show that the on–off model can physically explain the presence of self-similarity. While the on–off model involves bounded signals, it is also possible to consider arbitrary unbounded finite-variance signals or even infinite-variance signals whose distributions have heavy tails. We show that, in the latter case, one can still obtain self-similar processes with dependent increments, but these are not the infinite-variance fractional stable Lévy motions which have been commonly considered in the literature. The adequate model, in fact, can have either dependent or independent increments, and this depends on the respective size of two parameters, namely, the number of workstations in the network and the time scale under consideration. We indicate what happens when these two parameters become jointly asymptotically large. We conclude with some comments about high-frequency behaviour and multifractals.

17.
Unlike the usual randomized response techniques, and as a pioneering attempt, this article focuses on using non-identical independent Bernoulli trials in sensitive surveys. For this purpose, a general class of randomized response techniques is considered. The usual randomized response techniques are based on a fixed probability of a yes answer; contrary to the usual techniques, in the proposed technique every respondent has a different probability of reporting a yes answer. With this setting, in most situations the proposed technique is observed to perform better in terms of variability. To illustrate and support the superiority of the proposed technique, it is compared with models using identical Bernoulli trials, such as Warner (1965), Greenberg et al. (1969), Mangat and Singh (1990), and Mangat (1994). Relative efficiency and privacy protection are studied in detail using the Warner (1965) and Mangat (1994) models.
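In the classical Warner (1965) design, each respondent answers the sensitive question with probability p and its complement with probability 1 - p, so a 'yes' occurs with probability lambda = p*pi + (1 - p)*(1 - pi) and the prevalence is estimated as pi_hat = (lambda_hat - (1 - p)) / (2p - 1). The simulation sketch below checks this estimator; the true prevalence, p and sample size are invented.

```python
# Warner (1965) randomized-response estimator, checked by simulation (sketch;
# the true prevalence pi and design probability p below are made up).
import numpy as np

rng = np.random.default_rng(7)
pi_true, p, n = 0.30, 0.75, 5000           # prevalence, design prob., sample size

member = rng.random(n) < pi_true           # true (unobserved) sensitive status
ask_direct = rng.random(n) < p             # spinner: direct vs. complementary question
answer_yes = np.where(ask_direct, member, ~member)

lam_hat = answer_yes.mean()                # estimates p*pi + (1-p)*(1-pi)
pi_hat = (lam_hat - (1.0 - p)) / (2.0 * p - 1.0)
var_hat = lam_hat * (1.0 - lam_hat) / (n * (2.0 * p - 1.0) ** 2)

print(pi_hat, np.sqrt(var_hat))            # estimate of pi and its standard error
```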

18.
The authors present theoretical results that show how one can simulate a mixture distribution whose components live in subspaces of different dimension by reformulating the problem in such a way that observations may be drawn from an auxiliary continuous distribution on the largest subspace and then transformed in an appropriate fashion. Motivated by the importance of enlarging the set of available Markov chain Monte Carlo (MCMC) techniques, the authors show how their results can be fruitfully employed in problems such as model selection (or averaging) of nested models, or regeneration of Markov chains for evaluating standard deviations of estimated expectations derived from MCMC simulations.

19.

Parameter reduction can enable otherwise infeasible design and uncertainty studies with modern computational science models that contain several input parameters. In statistical regression, techniques for sufficient dimension reduction (SDR) use data to reduce the predictor dimension of a regression problem. A computational scientist hoping to use SDR for parameter reduction encounters a problem: a computer prediction is best represented by a deterministic function of the inputs, so data consisting of computer simulation queries fail to satisfy the SDR assumptions. To address this problem, we interpret the SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function. Within this interpretation, SIR and SAVE estimate matrices of integrals whose column spaces are contained in the span of the ridge directions; we analyze and numerically verify convergence of these column spaces as the number of computer model queries increases. Moreover, we show example functions that are not ridge functions but whose inverse conditional moment matrices are low-rank. Consequently, the computational scientist should beware when using SIR and SAVE for parameter reduction, since SIR and SAVE may mistakenly suggest that truly important directions are unimportant.
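Sliced inverse regression estimates the span of the ridge directions from the leading eigenvectors of the covariance of slice means of the standardized inputs. The sketch below applies it to an invented monotone ridge function queried at random inputs, a favourable case; as the abstract warns, SIR can also miss truly important directions.

```python
# Sliced inverse regression (SIR) used for parameter reduction of a
# deterministic ridge function (sketch; the test function and dimensions are
# made up, and this is a case where SIR is expected to succeed).
import numpy as np

rng = np.random.default_rng(3)
m, n_slices = 5, 20
a = np.array([1.0, 0.5, 0.0, 0.0, -0.5])         # true ridge direction
a /= np.linalg.norm(a)

X = rng.normal(size=(2000, m))                   # "computer model queries"
y = np.exp(X @ a)                                # deterministic, monotone ridge

# Standardize inputs, slice on y, and average the standardized inputs per slice.
mu, L = X.mean(axis=0), np.linalg.cholesky(np.cov(X, rowvar=False))
Z = np.linalg.solve(L, (X - mu).T).T
order = np.argsort(y)
slice_means = np.array([Z[idx].mean(axis=0) for idx in np.array_split(order, n_slices)])

# Leading eigenvector of the slice-mean second moment, mapped back to x-space.
M = slice_means.T @ slice_means / n_slices
w = np.linalg.eigh(M)[1][:, -1]
direction = np.linalg.solve(L.T, w)
direction /= np.linalg.norm(direction)

print(abs(direction @ a))                        # close to 1 if the span is recovered
```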


20.
It is essential to reduce data latency and guarantee quality of service in modern computer networks. The emerging networking protocol Multipath Transmission Control Protocol can reduce data latency by transmitting data through multiple minimal paths (MPs) and ensure data integrity through its packet retransmission mechanism. The bandwidth of each edge can be considered multi-state in computer networks because different situations, such as failures, partial failures and maintenance, exist. We evaluate network reliability for a multi-state retransmission flow network through which the data can be successfully transmitted by means of multiple MPs under a time constraint. By generating all minimal bandwidth patterns, the proposed algorithm can satisfy these requirements and calculate the network reliability. An example and a practical case of the Pan-European Research and Education Network are used to demonstrate the proposed algorithm.
