Similar Literature
20 similar documents retrieved.
1.
The evolution of computers is currently in a period of rapid change, stimulated by radically cheaper and smaller devices for processing and memory. These changes are certain to provide major opportunities and challenges for the use of computers in statistics. This article looks at history and current trends, in both general computing and statistical computing, with the goal of identifying key features and requirements for the near future. A discussion of the S language developed at Bell Laboratories illustrates some program design principles that can make future work on statistical programs more effective and more valuable.

2.
Benn, A., Kulperger, R. Statistics and Computing, 1998, 8(4): 309-318.
Massively parallel computing is a computing environment with thousands of subprocessors. It requires some special programming methods, but is well suited to certain imaging problems. One such statistical example is discussed in this paper. In addition, there are other natural statistical problems for which this technology is well suited. This paper describes our experience, as statisticians, with a massively parallel computer in a problem of image correlation spectroscopy. Even with this computing environment, some direct computations would still take on the order of a year to finish. It is shown that some of the algorithms of interest can be made parallel.

3.
Parallel bootstrap is an extremely useful statistical method with good performance. In the present study, we introduce a working correlation matrix into the method, which we call the parallel bootstrap matrix. We consider some of its properties and the optimal size of the subsample in smooth function models. We also present performance results for parallel bootstrap estimators and for subsample length selection on financial time series data.
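As a purely illustrative aside, the following minimal Python sketch shows the basic idea of splitting bootstrap replications across worker processes for a simple statistic (the sample mean); the function names, the statistic, and the parameter choices are assumptions made for illustration and are not taken from the study above.

    import numpy as np
    from multiprocessing import Pool

    def bootstrap_replicates(args):
        """Draw n_rep bootstrap resamples of `data` and return the statistic of each."""
        data, n_rep, seed = args
        rng = np.random.default_rng(seed)
        n = len(data)
        return [float(np.mean(rng.choice(data, size=n, replace=True))) for _ in range(n_rep)]

    def parallel_bootstrap(data, n_boot=2000, n_workers=4):
        """Split the bootstrap replications evenly across worker processes."""
        tasks = [(data, n_boot // n_workers, seed) for seed in range(n_workers)]
        with Pool(n_workers) as pool:
            chunks = pool.map(bootstrap_replicates, tasks)
        reps = np.concatenate(chunks)
        return reps.mean(), reps.std(ddof=1)   # bootstrap estimate and standard error

    if __name__ == "__main__":
        data = np.random.default_rng(0).normal(size=500)
        print(parallel_bootstrap(data))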

4.
Construction of an Economic Data Processing System Based on Computer Cluster Technology
A computer cluster is a collection of multiple independent, interconnected computers. Building on the use of cluster technology to construct a parallel-computing-based statistical analysis and economic data processing system, this paper focuses on the application of cluster computing technology in economic data processing systems, its modes of operation, and methods for evaluating its performance. Taking the Beijing Economic Data Processing and Computer Simulation Laboratory as an example, an economic data information system based on cluster parallel computing technology is constructed.

5.
A significant challenge in fitting metamodels of large-scale simulations with sufficient accuracy is the computational time required for rigorous statistical validation. This paper addresses the statistical computation issues associated with the bootstrap and modified PRESS statistic, which yield key metrics for error measurement in metamodel validation. Experimentation is performed in different programming languages, namely MATLAB, R, and Python, and implemented on different computing architectures, including traditional multicore personal computers and high-power clusters with parallel computing capabilities. This study yields insight into the effect that programming languages and computing architecture have on the computational time for simulation metamodel validation. The experimentation is performed across two scenarios with varying complexity.
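For orientation only, the PRESS statistic that the study times can be computed for an ordinary least squares metamodel with the standard leave-one-out shortcut PRESS = Σ_i (e_i / (1 − h_ii))². The short NumPy sketch below illustrates this textbook form; it is not the authors' implementation and does not reproduce their modified PRESS or bootstrap procedures.

    import numpy as np

    def press_statistic(X, y):
        """PRESS for the linear model y ~ X beta, using the hat-matrix shortcut."""
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        residuals = y - X @ beta
        leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # h_ii = x_i' (X'X)^{-1} x_i
        return float(np.sum((residuals / (1.0 - leverage)) ** 2))

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=50)
    print(press_statistic(X, y))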

6.
A class of computing devices known as desktop computers has emerged over the last several years. The International Data Corporation (McGovern 1980) estimates that the number of desktop computers is increasing by approximately 53,000 each month. Because of the projected widespread use of desktop computers and anticipated improvements in hardware, the potential for impressive statistical computing on these devices is exciting. Two features of desktop computers will be particularly important for those doing statistical analyses: (a) the ease-of-use of the computers, and (b) their extensive graphics capabilities. The author suggests that sophisticated statistical software will be available in the near future on many different models of desktop computers. Indeed, several of the manufacturers provide high-quality software at the present time. The implications for statisticians of a rapid growth rate for desktop computers are discussed for data analysis, software development, graphics, and instructional usage.

7.
A substantial fraction of statistical analyses, and in particular of statistical computing, is done under the heading of multiple linear regression, that is, the fitting of equations to multivariate data using the least squares technique for estimating parameters. The optimality properties of these estimates are described in an ideal setting which is not often realized in practice.

Frequently, we do not have "good" data in the sense that the errors are non-normal or the variance is non-homogeneous. The data may contain outliers or extremes which are not easily detectable; we may not have the variables in the proper functional form; and we may not have the linearity assumed by the model.

Prior to the mid-sixties, regression programs provided just the basic least squares computations, plus possibly a step-wise algorithm for variable selection. The increased interest in regression prompted by dramatic improvements in computers has led to a vast amount of literature describing alternatives to least squares, improved variable selection methods, and extensive diagnostic procedures.

The purpose of this paper is to summarize and illustrate some of these recent developments. In particular, we shall review some of the potential problems with regression data, discuss the statistics and techniques used to detect these problems, and consider some of the proposed solutions. An example is presented to illustrate the effectiveness of these diagnostic methods in revealing such problems and the potential consequences of employing the proposed methods.
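As background for the problems discussed above, the ideal setting being referred to is the classical least squares model, written here in standard textbook notation (not notation taken from the paper):

    y = X\beta + \varepsilon, \qquad
    \operatorname{E}(\varepsilon) = 0, \qquad
    \operatorname{Var}(\varepsilon) = \sigma^2 I_n, \qquad
    \hat{\beta} = (X^{\top}X)^{-1}X^{\top}y .

The optimality (Gauss-Markov) properties of the least squares estimate hold only under these assumptions of a correct functional form, homogeneous variance, and uncorrelated errors, which is precisely what the diagnostics reviewed in the paper are designed to check.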

8.
Research and operational applications in weather forecasting are reviewed, with emphasis on statistical issues. It is argued that the deterministic approach has dominated in weather forecasting, although weather forecasting is a probabilistic problem by nature. The reason has been the successful application of numerical weather prediction techniques over the 50 years since the introduction of computers. A gradual change towards utilization of more probabilistic methods has occurred over the last decade; in particular meteorological data assimilation, ensemble forecasting and post-processing of model output have been influenced by ideas from statistics and control theory.

9.
Conventional computations use real numbers as input and produce real numbers as results without any indication of their accuracy. Interval analysis, instead, uses interval elements throughout the computation and produces intervals as output with the guarantee that the true results are contained in them. One major use for interval analysis in statistics is to compute high-dimensional multivariate probabilities. By decreasing the length of the intervals that contain the theoretically true answers, we can obtain results to arbitrary accuracy, as demonstrated by multivariate normal and multivariate t integrations. This is an advantage over the approximation methods that are currently in use. Since interval analysis is more computationally intensive than traditional computing, a MasPar parallel computer is used in this research to improve performance.
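To fix ideas, the following self-contained Python sketch shows the elementary mechanics of interval arithmetic: every quantity is carried as an enclosing [lo, hi] pair, so the final interval is guaranteed to contain the exact result. It is only a toy illustration (outward rounding, which a rigorous implementation requires, is omitted) and is not the authors' MasPar implementation for multivariate normal and t probabilities.

    from dataclasses import dataclass

    @dataclass
    class Interval:
        lo: float
        hi: float

        def __add__(self, other):
            # NOTE: a rigorous version would round lo down and hi up (outward rounding).
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            products = (self.lo * other.lo, self.lo * other.hi,
                        self.hi * other.lo, self.hi * other.hi)
            return Interval(min(products), max(products))

    # Enclose x*y + z when the inputs are only known up to small intervals.
    x = Interval(0.099999, 0.100001)
    y = Interval(0.299999, 0.300001)
    z = Interval(0.499999, 0.500001)
    print(x * y + z)   # an interval guaranteed to contain the true value 0.53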

10.
Numerical methods are needed to obtain maximum-likelihood estimates (MLEs) in many problems. Computation time can be an issue for some likelihoods even with modern computing power. We consider one such problem where the assumed model is a random-clumped multinomial distribution. We compute MLEs for this model in parallel using the Toolkit for Advanced Optimization software library. The computations are performed on a distributed-memory cluster with a low-latency interconnect. We demonstrate that for larger problems, scaling the number of processes improves wall-clock time significantly. An illustrative example shows how parallel MLE computation can be useful in a large data analysis. Our experience with a direct numerical approach indicates that more substantial gains may be obtained by making use of the specific structure of the random-clumped model.
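The generic pattern behind such parallel likelihood evaluations is to split the observations across processes and sum their log-likelihood contributions. The sketch below illustrates this with a plain multinomial log-likelihood in Python; it is only an illustrative assumption and does not use the random-clumped model or the Toolkit for Advanced Optimization library from the paper.

    import numpy as np
    from multiprocessing import Pool

    def chunk_loglik(args):
        """Log-likelihood contribution of one chunk of multinomial observations
        (the multinomial coefficient is constant in p and is dropped)."""
        counts_chunk, log_p = args
        return float((counts_chunk @ log_p).sum())

    def parallel_loglik(counts, p, n_workers=4):
        """Split the observations into chunks and sum their contributions in parallel."""
        log_p = np.log(p)
        chunks = np.array_split(counts, n_workers)
        with Pool(n_workers) as pool:
            parts = pool.map(chunk_loglik, [(c, log_p) for c in chunks])
        return sum(parts)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        p = np.array([0.2, 0.3, 0.5])
        counts = rng.multinomial(20, p, size=10_000)   # 10,000 iid multinomial draws
        print(parallel_loglik(counts, p))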

11.
It is well known that the availability of cost-effective and powerful parallel computers has enhanced the ability of the operations research community to solve laborious computational problems. But many researchers argue that the lack of portability of parallel algorithms is a major drawback to utilizing parallel computers. This paper studies the performance of a portable parallel unconstrained non-gradient optimization algorithm, when executed on various shared-memory multiprocessor systems, compared with its non-portable counterpart. Analysis of covariance is used to analyse how the algorithm's performance is affected by several factors of interest. The results yield further insight into parallel computing.

12.
Statisticians fall far short of their potential as guides to enlightened decision making in business. Two important explanations are: (1) Decision makers are often more easily convinced by concrete examples, however fragmentary and misleading, than by competent statistical analysis. (2) The effective use of statistics in the process of decision making requires hard thinking by decision makers, thinking that cannot be delegated entirely to the statistical specialist. Modern developments in interactive statistical computing may help to reduce the force of these limitations on exploitation of statistics; used properly, computing can encourage, almost force, the student or business user of statistics to think statistically.

13.
In this paper we present a study of several types of parallel genetic algorithms (PGAs). Our motivation is to bring some uniformity to the proposal, comparison, and knowledge exchange among the traditionally opposite kinds of serial and parallel GAs. We comparatively analyze the properties of steady-state, generational, and cellular genetic algorithms. Afterwards, this study is extended to consider a distributed model consisting of a ring of GA islands. The analyzed features are time complexity, selection pressure, schema processing rates, efficacy in finding an optimum, efficiency, speedup, and resistance to scalability. In addition, we briefly discuss how the migration policy affects the search, and some of the search properties of cellular GAs are investigated. The selected benchmark is a representative subset of problems containing real-world difficulties. We often conclude that parallel GAs are numerically better and faster than equivalent sequential GAs. Our aim is to shed some light on the advantages and drawbacks of various sequential and parallel GAs to help researchers use them in the very diverse application fields of evolutionary computation.
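For readers less familiar with the terminology, the following short Python sketch of a plain generational GA on the OneMax problem (maximize the number of 1-bits) shows the serial baseline that the parallel, cellular, and island variants discussed above extend; the operators and parameter values are generic choices, not those benchmarked in the paper.

    import random

    def evolve(n_bits=40, pop_size=60, generations=200, p_mut=0.02, seed=1):
        rng = random.Random(seed)
        fitness = lambda ind: sum(ind)                       # OneMax: count the 1-bits
        pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            def pick():                                      # binary tournament selection
                a, b = rng.sample(pop, 2)
                return a if fitness(a) >= fitness(b) else b
            new_pop = []
            while len(new_pop) < pop_size:
                p1, p2 = pick(), pick()
                cut = rng.randrange(1, n_bits)               # one-point crossover
                child = [bit ^ (rng.random() < p_mut)        # bit-flip mutation
                         for bit in p1[:cut] + p2[cut:]]
                new_pop.append(child)
            pop = new_pop
        return max(pop, key=fitness)

    print(sum(evolve()), "ones out of 40")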

14.
This article provides a unified methodology of meta-analysis that synthesizes medical evidence by using both available individual patient data (IPD) and published summary statistics within the framework of the likelihood principle. The most up-to-date scientific evidence on medicine is crucial information not only to consumers but also to decision makers, and can only be obtained when existing evidence from the literature and the most recent individual patient data are optimally synthesized. We propose a general linear mixed effects model to conduct meta-analyses when individual patient data are only available for some of the studies and summary statistics have to be used for the rest. Our approach includes as special cases both the traditional meta-analysis in which only summary statistics are available for all studies and the other extreme in which individual patient data are available for all studies. We implement the proposed model with statistical procedures from standard computing packages. We provide measures of heterogeneity based on the proposed model. Finally, we demonstrate the proposed methodology through a real-life example studying cerebrospinal fluid biomarkers to identify individuals at high risk of developing Alzheimer's disease while they are still cognitively normal.
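In standard notation, a general linear mixed effects model of the kind referred to above can be written as follows (this is the generic textbook form; the paper's specific formulation combining IPD studies with summary-level studies is not reproduced here):

    y_i = X_i\beta + Z_i b_i + \varepsilon_i, \qquad
    b_i \sim N(0, D), \qquad
    \varepsilon_i \sim N(0, \sigma^2 I_{n_i}),

where y_i collects the responses from study i, \beta holds the fixed effects shared across studies, and the random effects b_i capture between-study heterogeneity.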

15.
Multiple regression diagnostic methods have recently been developed to help data analysts identify failures of data to adhere to the assumptions that customarily accompany regression models. However, the mathematical development of regression diagnostics has not generally led to efficient computing formulas. Conflicting terminology and the use of closely related but subtly different statistics has caused confusion. This article attempts to make regression diagnostics more readily available to those who compute regressions with packaged statistics programs. We review regression diagnostic methodology, highlighting ambiguities of terminology and relationships among similar methods. We present new formulas for efficient computing of regression diagnostics. Finally, we offer specific advice on obtaining regression diagnostics from existing statistics programs, with examples drawn from Minitab and SAS.
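As a reminder of the kind of quantities involved, a few of the standard diagnostic formulas (textbook forms; the article's new computing formulas are not reproduced here) are, for the fit y = X\beta + \varepsilon:

    h_{ii} = x_i^{\top}(X^{\top}X)^{-1}x_i \quad\text{(leverage)}, \qquad
    r_i = \frac{e_i}{s\sqrt{1-h_{ii}}} \quad\text{(studentized residual)}, \qquad
    D_i = \frac{r_i^2}{p}\cdot\frac{h_{ii}}{1-h_{ii}} \quad\text{(Cook's distance)},

where e_i is the ordinary residual, s^2 the residual mean square, and p the number of estimated coefficients.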

16.
Traditionally, the introductory statistics course, Principles of Statistics (STAT 101), at Iowa State University has been taught without reference to a statistical analysis computing package. Although important for the implementation of statistical techniques, a computer component has been perceived by instructors to take time away from the coverage of statistical topics. To gauge students' reactions to the usefulness of a statistical computing package, an experiment was conducted during the fall term of 1986. Volunteers from a STAT 101 class were randomly assigned to either a control group or a computer use group. Both groups filled out questionnaires at the beginning and end of the semester. During the semester, the computer use group had access to and instruction in the use of Minitab. This instruction was tied to homework and laboratory assignments for the course. This article presents results of this experiment. On the basis of the responses to the questionnaires, the value of a statistical computing package as a pedagogical tool is examined. Recommendations for the use of a statistical computing package in a large introductory statistics course are made.

17.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data coming from a multivariate binary distribution. We test the random projection method and apply an Edgeworth expansion for improvement. Additionally, we propose new statistics which are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. Finally, we apply these tests to the 20 Newsgroups data, showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles with different topics.
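A minimal sketch of the generic random-projection idea for two-sample testing is given below: project both high-dimensional binary samples onto a random direction and apply an ordinary two-sample t-test to the projected scores. This is only the generic idea written in Python with SciPy; the article's actual statistics, the Edgeworth correction, and the sparse-data proposals are not reproduced.

    import numpy as np
    from scipy import stats

    def random_projection_test(X, Y, seed=0):
        """X: (n, d) and Y: (m, d) binary data matrices; returns a two-sided p-value."""
        rng = np.random.default_rng(seed)
        direction = rng.normal(size=X.shape[1])      # one random projection direction
        t_stat, p_value = stats.ttest_ind(X @ direction, Y @ direction)
        return p_value

    rng = np.random.default_rng(1)
    X = (rng.random((100, 500)) < 0.10).astype(float)   # group 1: sparse binary data
    Y = (rng.random((100, 500)) < 0.12).astype(float)   # group 2: slightly denser
    print(random_projection_test(X, Y))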

18.
Statistics as data is ancient, but as a discipline of study and research it has a short history. Courses leading to degrees in statistics were introduced in universities some sixty to seventy years ago. They were not considered to constitute a basic discipline with a subject matter of its own. However, during the last seventy-five years, statistics has developed as a powerful blend of science, technology and art for solving problems in all areas of human endeavor. Nowadays statistics is used in scientific research, economic development through optimum use of resources, increasing industrial productivity, medical diagnosis, legal practice, disputed authorship, and optimum decision making at individual and institutional levels. What is the future of statistics in the coming millennium, dominated by information technology encompassing the whole of communications, interaction with intelligent systems, massive databases, and complex information processing networks? The current statistical methodology, based on probabilistic models applied to small data sets, appears to be inadequate to meet the needs of society in terms of quick processing of data and making the information available for practical purposes. Ad hoc methods are being put forward under the title of Data Mining by computer scientists and engineers to meet the needs of customers. The paper reviews the current state of the art in statistics and discusses possible future developments considering the availability of large data sets, enormous computing power, and efficient optimization techniques using genetic algorithms and neural networks.

19.
The first step in statistical analysis is parameter estimation. In multivariate analysis, one of the parameters of interest to be estimated is the mean vector. In multivariate statistical analysis, it is usually assumed that the data come from a multivariate normal distribution. In this situation, the maximum likelihood estimator (MLE), that is, the sample mean vector, is the best estimator. However, when outliers exist in the data, the use of the sample mean vector will result in poor estimation, so other estimators which are robust to the existence of outliers should be used. The most popular robust multivariate estimator of the mean vector is the S-estimator, which has desirable properties. However, computing this estimator requires a robust estimate of the mean vector as a starting point. Usually the minimum volume ellipsoid (MVE) estimator is used as a starting point in computing the S-estimator. For high-dimensional data, computing the MVE takes too much time; in some cases, this time is so large that existing computers cannot perform the computation. In addition to the computation time, for high-dimensional data sets the MVE method is not precise. In this paper, a robust starting point for the S-estimator, based on robust clustering, is proposed which can be used for estimating the mean vector of high-dimensional data. The performance of the proposed estimator in the presence of outliers is studied, and the results indicate that it performs precisely and much better than some of the existing robust estimators for high-dimensional data.

20.
The growing popular realization that American product quality and productivity are no longer without challenge for world leadership presents an opportunity for the American statistical community to make stronger contributions to sound industrial practice than it has in the past. Management consultants, such as Deming and Juran, are promoting philosophies that contain strong statistical components and are being heard by top U.S. executives. There are thus growing opportunities for industrial statisticians. Upon reviewing the content of typical graduate-level statistical quality control courses and books in the light of the present situation, we find them to be inadequate and in some cases to suffer from inappropriate emphases. In this article we discuss our perceptions of what is needed in the way of a new graduate-level course in statistics for quality and productivity (SQP). We further offer for discussion a syllabus for such a course (which is a modification of one used at Iowa State in the 1983 spring semester), some comments on how specific topics might be approached, and also a partially annotated list of references for material that we believe belongs in a modern SQP course.
