Similar literature
 Found 20 similar articles; search took 31 ms
1.
2.
Python is a powerful high-level open source programming language that is available for multiple platforms. It supports object-oriented programming and has recently become a serious alternative to low-level compiled languages such as C++. It is easy to learn and use, and is recognized for very fast development times, which makes it suitable for rapid software prototyping as well as teaching purposes. We motivate the use of Python and its free extension modules for high performance stand-alone applications in econometrics and statistics, and as a tool for gluing different applications together. (It is in this sense that Python forms a “unified” environment for statistical research.) We give details on the core language features, which will enable a user to immediately begin work, and then provide practical examples of advanced uses of Python. Finally, we compare the run-time performance of extended Python against a number of commonly-used statistical packages and programming environments.

Supplemental materials are available for this article. Go to the publisher's online edition of Econometric Reviews to view the free supplemental file.

3.
Conventional analyses of a composite of multiple time-to-event outcomes use the time to the first event. However, the first event may not be the most important outcome. To address this limitation, generalized pairwise comparisons and win statistics (win ratio, win odds, and net benefit) have become popular and have been applied to clinical trial practice. However, win ratio, win odds, and net benefit have typically been used separately. In this article, we examine the use of these three win statistics jointly for time-to-event outcomes. First, we explain the relation of point estimates and variances among the three win statistics, and the relation between the net benefit and the Mann–Whitney U statistic. Then we explain that the three win statistics are based on the same win proportions, and they test the same null hypothesis of equal win probabilities in two groups. We show theoretically that the Z-values of the corresponding statistical tests are approximately equal; therefore, the three win statistics provide very similar p-values and statistical powers. Finally, using simulation studies and data from a clinical trial, we demonstrate that, when there is no (or little) censoring, the three win statistics can complement one another to show the strength of the treatment effect. However, when the amount of censoring is not small, and without adjustment for censoring, the win odds and the net benefit may have an advantage for interpreting the treatment effect; with adjustment (e.g., IPCW adjustment) for censoring, the three win statistics can complement one another to show the strength of the treatment effect. For calculations we use the R package WINS, available on CRAN (the Comprehensive R Archive Network).
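
The claim that the three win statistics rest on the same win proportions can be illustrated with a minimal Python sketch. It assumes a single uncensored outcome where larger values are better, compares every treatment value with every control value, and uses the usual definitions (win ratio = wins/losses, win odds with ties split evenly, net benefit = wins minus losses). The function name `win_statistics` and the toy data are hypothetical; this is not the WINS package interface and it ignores censoring and outcome hierarchies.

```python
from itertools import product


def win_statistics(treatment, control):
    """Pairwise win statistics for one uncensored outcome (larger is better)."""
    wins = losses = ties = 0
    # Compare every treatment observation with every control observation.
    for t, c in product(treatment, control):
        if t > c:
            wins += 1
        elif t < c:
            losses += 1
        else:
            ties += 1
    total = wins + losses + ties
    p_win, p_loss, p_tie = wins / total, losses / total, ties / total
    return {
        # Undefined (ZeroDivisionError) if the treatment group never loses.
        "win_ratio": p_win / p_loss,
        # Ties are split evenly between the two groups.
        "win_odds": (p_win + 0.5 * p_tie) / (p_loss + 0.5 * p_tie),
        "net_benefit": p_win - p_loss,
    }


# Hypothetical toy data: 5 wins, 3 losses, 1 tie out of 9 pairs.
stats = win_statistics([5, 7, 9], [4, 7, 8])
print(stats)
```

With m treatment and n control observations, the net benefit computed this way equals 2U/(mn) − 1, where U is the Mann–Whitney statistic that counts a tie as half a win.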

4.
Like most professional disciplines, the ASA has adopted ethical guidelines for its practitioners. To promote these guidelines, as well as to meet governmental and institutional mandates, U.S. universities are demanding more training on ethics within existing statistics graduate student curricula. Most of this training is based on the teachings of Western philosophers. However, many statistics graduate students are from Eastern cultures (particularly Chinese), and cultural and linguistic evidence indicates that Western ethics may be difficult to translate into the philosophical concepts common to students from different cultural backgrounds. This article describes how to teach cross-cultural ethics, with emphasis on the ASA Ethical Guidelines, within a graduate-level statistical consulting course. In particular, we present content that can help students overcome cultural and language barriers to gain an understanding of ethical decision-making that is compatible with both Western and Eastern philosophical models. Supplementary materials for this article are available online.

5.
6.
R is a multi-paradigm language with a dynamic type system, different object systems and functional characteristics. These characteristics support the development of statistical algorithms at a high level of abstraction. Although R is commonly used in the statistics domain, a major disadvantage is its runtime performance on computation-intensive algorithms. Especially in the domain of machine learning, the execution of pure R programs is often unacceptably slow. Our long-term goal is to resolve these issues, and in this contribution we use the traceR tool to analyse the bottlenecks arising in this domain. We measured the runtime and overall memory consumption of a well-defined set of classical machine learning applications and gained detailed insights into the performance issues of these programs.

7.
The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, where results are combined over subanalyses performed in separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R, as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of the Australian unemployment data. Supplementary materials for this article are available online.
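
The idea of fitting from summary statistics rather than unit records can be sketched with the plain conjugate Gamma–Poisson model, which is an assumption made here for illustration and not the extended model of the article. For counts that are Poisson with a common rate and a Gamma(shape, rate) prior, the pair (sum of counts, number of records) is sufficient, so each custodian only needs to release that pair. The function names and the two toy databases are hypothetical.

```python
def summarize(counts):
    """Sufficient statistics a data custodian could release: (sum, n)."""
    return sum(counts), len(counts)


def gamma_posterior(summaries, shape=1.0, rate=1.0):
    """Posterior Gamma(shape, rate) for a Poisson rate, built from summaries only."""
    total = sum(s for s, _ in summaries)
    n = sum(m for _, m in summaries)
    # Conjugate update: shape grows by the total count, rate by the sample size.
    return shape + total, rate + n


# Hypothetical unit records held by two separate custodians.
db_a = [2, 0, 3]
db_b = [1, 4]

from_summaries = gamma_posterior([summarize(db_a), summarize(db_b)])
from_pooled = gamma_posterior([summarize(db_a + db_b)])
print(from_summaries, from_pooled)
```

Both calls give the same posterior parameters, which is the point of the sufficiency argument: the unit records never have to be combined.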

8.
The six recommendations made by the Guidelines for Assessment and Instruction in Statistics Education (GAISE) committee were first communicated in 2005 and more formally in 2010. In this article, 25 introductory statistics textbooks are examined to assess how well these textbooks have incorporated the three GAISE recommendations most relevant to implementation in textbooks (statistical literacy and thinking; use of real data; stress concepts over procedures). The implementation of another recommendation (using technology) is described but not assessed. In general, most textbooks appear to be adopting the GAISE recommendations reasonably well in both exposition and exercises. The textbooks are particularly adept at using real data, using real data well, and promoting statistical literacy. Textbooks are less adept, though still rated reasonably well in general, at explaining concepts over procedures and promoting statistical thinking. In contrast, few textbooks have easy-to-use glossaries of statistical terms to assist with understanding of statistical language and literacy development. Supplementary materials for this article are available online.

9.
The beanplot is a graphical method for visualizing univariate distributions. Density forecasts have an important role to play in many applications. Although graphical methods are widely used for illustrating distributions, no suitable graphical methods exist for analysing and comparing density forecasts. This article explains how density forecasts and the related observed densities can be visualized in parallel using beanplots for different groups of data. The visualization method is illustrated with industrial and simulated data. The functionality extends the plotting function of the R package beanplot, and the developed functions are made available for the R programming language.

10.
A scan statistic is proposed for the prospective monitoring of spatiotemporal count data with an excess of zeros. The method, which is based on an outbreak model for the zero‐inflated Poisson distribution, is shown to be superior to traditional scan statistics based on the Poisson distribution in the presence of structural zeros. The spatial accuracy and the detection timeliness of the proposed scan statistic are investigated by means of simulation, and an application to the weekly cases of Campylobacteriosis in Germany illustrates how the scan statistic could be used to detect emerging disease outbreaks. An implementation of the method is provided in the open‐source R package scanstatistics, available on the Comprehensive R Archive Network.
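
For readers unfamiliar with the zero-inflated Poisson distribution mentioned above, the following Python sketch gives its standard probability mass function: with probability `p_zero` the count is a structural zero, otherwise it is Poisson with mean `lam`. The function and parameter names are hypothetical, and the sketch shows only the distribution, not the scan statistic or the scanstatistics package.

```python
import math


def zip_pmf(k, lam, p_zero):
    """P(Y = k) for a zero-inflated Poisson count.

    p_zero is the probability of a structural zero; otherwise Y ~ Poisson(lam).
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # A zero can be structural or an ordinary Poisson zero.
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


# Illustrative values: the zero probability exceeds the plain Poisson one.
print(zip_pmf(0, 2.0, 0.3), math.exp(-2.0))
```

Setting `p_zero` to 0 recovers the ordinary Poisson distribution, which is the baseline the abstract compares against.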

11.
This article introduces BestClass, a set of SAS macros, available in the mainframe and workstation environment, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule use the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

12.
This article proposes the use of optimization techniques and tools to maximize the likelihood if maximization cannot be easily accomplished with standard statistical software. In such situations, the use of the programming language AMPL with the freely available optimization solvers under the NEOS Server is an attractive alternative to algorithms developed for specific optimization problems in statistics. This article is meant to be a short tutorial introducing statisticians to these methods and tools. We provide an example to illustrate these methods. The necessary files for maximization are included in the Appendix so that the reader can carry out the optimization procedure described.

13.
“One method of error analysis (not the one we will use) is based upon the principles of mathematical statistics. Unfortunately, statistical methods can only be meaningfully applied when one has large amounts of data for a given system. In many cases … these large quantities of data are not available … then statistical methods are not applicable, and some other methods must be devised.”

14.
Response surface experimentation is an integral part of the development of a new process or product, but the relatively efficient statistical methodologies for such experimentation are underutilized by research and development scientists and engineers because of a lack of knowledge and/or understanding of these methodologies. To help increase their utilization, a simplified approach to one such statistical methodology, known as the determination of optimum conditions, has been developed which can be used by scientists and engineers with a minimum of statistical knowledge.

15.
On runs of length exceeding a threshold: normal approximation   Total citations: 1 (self-citations: 0; citations by others: 1)
Run statistics denoting the number of runs and the sum of run lengths are defined on binary sequences, and their asymptotic normality is established in a simple unified way for Bernoulli sequences. All the considered statistics share a common feature: they refer to runs of length exceeding a specific length (a threshold). Asymptotic results for associated statistics denoting run lengths and waiting times are derived as well. Specific probabilities of the examined statistics are used in applications in the fields of system reliability and molecular biology. The study is illustrated by extensive numerical experimentation.
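
The two run statistics described above can be made concrete with a short Python sketch. It assumes that a run is a maximal block of consecutive 1s and that "exceeding" means strictly longer than the threshold; both are reading assumptions, and the function name and example sequence are hypothetical.

```python
from itertools import groupby


def long_run_statistics(bits, threshold):
    """Return (number of runs, sum of run lengths) for runs of 1s longer than threshold."""
    # groupby splits the sequence into maximal blocks of equal values.
    lengths = [sum(1 for _ in group) for value, group in groupby(bits) if value == 1]
    long_runs = [n for n in lengths if n > threshold]
    return len(long_runs), sum(long_runs)


# Runs of 1s have lengths 3, 1, 4, 2; only 3 and 4 exceed the threshold 2.
bits = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(long_run_statistics(bits, threshold=2))  # (2, 7)
```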

16.
Data from complex surveys are being used increasingly to build the same sort of explanatory and predictive models as those used in the rest of statistics. Unfortunately the assumptions underlying standard statistical methods are not even approximately valid for most survey data. The problem of parameter estimation has been largely solved, at least for routine data analysis, through the use of weighted estimating equations, and software for most standard analytical procedures is now available in the major statistical packages. One notable omission from standard software is an analogue of the likelihood ratio test. An exception is the Rao–Scott test for loglinear models in contingency tables. In this paper we show how the Rao–Scott test can be extended to handle arbitrary regression models. We illustrate the process of fitting a model to survey data with an example from NHANES.

17.
Scientists in every discipline are generating data more rapidly than ever before, resulting in an increasing need for statistical skills at a time when there is decreasing visibility for the field of statistics. Resolving this paradox requires stronger statistical leadership to guide multidisciplinary teams in the design and planning of scientific research and making decisions based on data. It requires more effective communication to nonstatisticians of the value of statistics in using data to answer questions, predict outcomes, and support decision-making in the face of uncertainty. It also requires a greater appreciation of the unique capabilities of alternative quantitative disciplines such as machine learning, data science, pharmacometrics, and bioinformatics which represent an opportunity for statisticians to achieve greater impact through collaborative partnership. Examples taken from pharmaceutical drug development are used to illustrate the concept of statistical leadership in a collaborative multidisciplinary team environment.

18.
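袁卫 (Yuan Wei), 李惠 (Li Hui). 《统计研究》 (Statistical Research), 2021, 38(7): 153-160
During the Republican era, Chinese students of statistics abroad were concentrated mainly in Britain, the United States, and similar countries. They included scholars such as 许宝騄 (Pao-Lu Hsu), who specialized in mathematical statistics, as well as 朱君毅 (Zhu Junyi), 陈达 (Chen Da), 吴定良 (Wu Dingliang), and others, who applied statistical methods to problems in biology, economics, education, sociology, and psychology. These students studied at the world's statistical centers or leading universities of the time, and among them were outstanding talents who drew the attention of the international statistical community. Though relatively few in number, they produced scholarship on a par with the best international work and contributed Chinese insight to the development of statistics worldwide. Their educational backgrounds and academic standards gave modern statistical education in China a high starting point and a solid foundation, and they merit deeper study by the academic community. Their commitment to serving their country through scholarship remains a source of inspiration for later generations of scholars.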

19.
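As the authoritative academic journal of China's statistical community, 《统计研究》 (Statistical Research) has, since its founding, spared no effort in promoting the development of the discipline of statistics. It has provided a good platform for research and exchange in statistical science, recorded the course of research in economic statistics and mathematical statistics, documented the growth, development, and expansion of the discipline, and cultivated a large group of leading statistical scholars. Over 30 years, the journal has published 4,640 articles covering every area of statistical research and has accelerated integration with other disciplines. The affiliations of contributing authors span China and other countries, which has raised the journal's international influence. The journal's bibliometric indicators show a healthy trend: its quality and influence have consistently led among statistics journals, and among authoritative economics journals it has, despite a relatively late start, developed steadily, made considerable progress, and secured a place. The journal should build on this momentum and, taking advantage of statistics becoming a first-level discipline, develop into a national first-class journal.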

20.
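洪永淼 (Hong Yongmiao). 《统计研究》 (Statistical Research), 2016, 33(5): 3-12
From a unified perspective of statistics and economics, this article analyzes and discusses the relationships among, and the prospects of, economic statistics, econometrics, and related disciplines: probability theory, mathematical statistics, econometrics, and economic theory (including mathematical economics). As a general methodology for inferring population characteristics from sample information, mathematical statistics matches the process and needs of scientific inquiry and has therefore been widely applied in many fields of the natural and social sciences. Econometrics is the inferential methodology of empirical economic research. Together, economic statistics and econometrics constitute a complete methodology for empirical economic research. As the methodology of economic measurement, economic statistics not only provides the theory, methods, and tools for quantitatively describing the actual operation of the economy; it is also a prerequisite and foundation for empirical economic research and an important driving force in the development of econometric theory. Economic statistics faces many challenges, but it has deep disciplinary roots and great room for development, and its role cannot be replaced by any related discipline. The cross-fertilization of the various branches of statistics will promote the joint development of economic statistics and econometrics, and thereby further raise the level and scientific rigor of empirical economic research in China.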
