Similar Articles
Found 20 similar articles (search time: 265 ms)
1.
The purpose of this article is to review two text mining packages, namely, WordStat and SAS TextMiner. WordStat is developed by Provalis Research. SAS TextMiner is a product of SAS. We review the features offered by each package on each of the following key steps in analyzing unstructured data: (1) data preparation, including importing and cleaning; (2) performing association analysis; and (3) presenting the findings, including illustrative quotes and graphs. We also evaluate each package on its ability to help researchers extract major themes from a dataset. Both packages offer a variety of features that effectively help researchers run associations and present results. However, in extracting themes from unstructured data, both packages were only marginally helpful. The researcher still needs to read the data and make all the difficult decisions. This finding stems from the fact that the software can search only for specific terms in documents or categorize documents based on common terms. Respondents, however, may use the same term or combination of terms to mean different things. This implies that a text mining approach, which is based on analysis units other than terms, may be more powerful in extracting themes, an idea we touch upon in the conclusion section.
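The term-matching limitation described here can be made concrete with a small sketch; neither WordStat nor SAS TextMiner is involved, and the responses and keyword lists below are invented.

```python
# Minimal sketch of term-based document categorization, the approach the
# review finds only marginally helpful for theme extraction. All responses
# and keyword lists are invented for illustration.

def categorize(document, themes):
    """Assign every theme whose keyword list matches a term in the document."""
    tokens = set(document.lower().split())
    return [name for name, keywords in themes.items()
            if tokens & set(keywords)]

themes = {
    "price":   {"cost", "price", "expensive"},
    "quality": {"cheap", "sturdy", "broke"},
}

# The same term ("cheap") triggers the "quality" theme in both responses,
# even though one respondent means low price and the other poor build;
# exactly the ambiguity the reviewers describe.
r1 = categorize("the product was cheap and arrived fast", themes)
r2 = categorize("it felt cheap and broke in a week", themes)
```

Both responses land in the same category, even though a human reader would assign them to different themes.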

2.
3.
We review five packages for estimating finite mixtures: BINOMIX, C.A. MAN, MIX, and the maximum likelihood routines of BMDP and STATA. The focus of the review is on numerical issues rather than matters such as user interface, because the success or failure of an algorithm to yield a mixture model is likely to be the most important issue facing a researcher. The problem of suitable initial values is discussed throughout.

4.
Statistical database management systems keep raw, elementary and/or aggregated data and include query languages with facilities to calculate various statistics from this data. In this article we examine statistical database query languages with respect to the criteria identified and taxonomy developed in Ozsoyoglu and Ozsoyoglu (1985b). The criteria include statistical metadata and objects, aggregation features and interface to statistical packages. The taxonomy of statistical database query languages classifies them with respect to the data model used, the type of user interface and method of implementation. Temporal databases are rich sources of data for statistical analysis. Aggregation features of temporal query languages, as well as the issues in calculating aggregates from temporal data, are also examined.

5.
Many different models for the analysis of high-dimensional survival data have been developed over the past years. While some of the models and implementations come with an internal parameter tuning automatism, others require the user to accurately adjust defaults, which often feels like a guessing game. Exhaustively trying out all model and parameter combinations will quickly become tedious or infeasible in computationally intensive settings, even if parallelization is employed. Therefore, we propose to use modern algorithm configuration techniques, e.g. iterated F-racing, to efficiently move through the model hypothesis space and to simultaneously configure algorithm classes and their respective hyperparameters. In our application we study four lung cancer microarray data sets. For these we configure a predictor based on five survival analysis algorithms in combination with eight feature selection filters. We parallelize the optimization and all comparison experiments with the BatchJobs and BatchExperiments R packages.
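The racing idea behind configuration methods such as iterated F-racing can be sketched in a few lines: evaluate all surviving candidate configurations on one problem instance at a time and repeatedly discard the worse half. This is a heavily simplified illustration, not the irace algorithm; the toy loss, configurations, and elimination schedule are invented, and real racing uses statistical tests on noisy performance rather than a fixed cut.

```python
def race(configs, evaluate, n_instances=16, keep_frac=0.5, min_survivors=1):
    """Racing sketch: score all surviving configurations on one instance at
    a time, then drop the worse half, until one configuration remains."""
    survivors = list(configs)
    scores = {c: 0.0 for c in survivors}
    for i in range(n_instances):
        for c in survivors:
            scores[c] += evaluate(c, i)
        if len(survivors) > min_survivors:
            survivors.sort(key=lambda c: scores[c])  # lower loss is better
            survivors = survivors[:max(min_survivors,
                                       int(len(survivors) * keep_frac))]
    return survivors[0]

def toy_loss(c, instance):
    # Deterministic toy loss with optimum at c = 0.3; a real benchmark
    # would be noisy, which is why irace races with statistical tests.
    return (c - 0.3) ** 2 * (1.0 + 0.1 * (instance % 3))

best = race([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8], toy_loss)
```

The budget saved by early elimination is what makes exhaustive grids unnecessary in settings like the one the abstract describes.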

6.
Eight statistical software packages for general use by non-statisticians are reviewed. The packages are GraphPad Prism, InStat, ISP, NCSS, SigmaStat, Statistix, Statmost, and Winks. Summary tables of statistical capabilities and “usability” features are followed by discussions of each package. Discussions include system requirements, data import capabilities, statistical capabilities, and user interface. Recommendations, based on user needs and sophistication, are presented following the reviews.

7.
In modelling repeated count outcomes, generalized linear mixed-effects models are commonly used to account for within-cluster correlations. However, inconsistent results are frequently generated by various statistical R packages and SAS procedures, especially in the case of a moderate or strong within-cluster correlation or overdispersion. We investigated the underlying numerical approaches and statistical theories on which these packages and procedures are built. We then compared the performance of these statistical packages and procedures by simulating both Poisson-distributed and overdispersed count data. The SAS NLMIXED procedure outperformed the other procedures in all settings.
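A minimal sketch of the kind of data-generating process such comparisons simulate: counts with a cluster-level random effect on the log mean, which induces within-cluster correlation and marginal overdispersion relative to Poisson. The parameter values are invented for illustration.

```python
import numpy as np

# Repeated counts with a cluster (random) effect: conditionally Poisson,
# marginally overdispersed because the cluster effect varies.
rng = np.random.default_rng(42)

n_clusters, n_per_cluster = 200, 5
beta0 = 1.0                    # fixed intercept on the log scale
sigma_b = 0.8                  # random-effect standard deviation

b = rng.normal(0.0, sigma_b, size=n_clusters)        # cluster effects
mu = np.exp(beta0 + np.repeat(b, n_per_cluster))     # conditional means
y = rng.poisson(mu)

# Marginal variance exceeds the marginal mean: overdispersion.
overdispersion_ratio = y.var() / y.mean()
```

Fitting a GLMM to data like `y` is exactly the step on which the compared packages disagree.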

8.
Five statistical software packages for epidemiology and clinical trials are reviewed. The five packages are EPI INFO, EPICURE, EPILOG PLUS, STATA, and TRUE EPI-STAT. Only DOS versions of these packages are compared and rated (Windows versions are discussed but not rated). Although the packages differ in their target audiences, interfaces, capabilities, and approaches, they are examined according to criteria that are of most interest to epidemiologists, biostatisticians, and others involved in epidemiologic and clinical research. A general discussion with recommendations follows the review of the statistical packages.

9.
Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing. Econometrica. 1987;55:819–847] is a useful tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at the 50% level is classical conditional mean regression. In many real applications, having multiple expectiles at different levels provides a more complete picture of the conditional distribution of the response variable. The multiple linear expectile regression model has been well studied [Newey W, Powell J. Asymmetric least squares estimation and testing. Econometrica. 1987;55:819–847; Efron B. Regression percentiles using asymmetric squared error loss. Stat Sin. 1991;1:93–125], but it can be too restrictive for many real applications. In this paper, we derive a regression tree-based gradient boosting estimator for nonparametric multiple expectile regression. The new estimator, referred to as ER-Boost, is implemented in the R package erboost, publicly available at http://cran.r-project.org/web/packages/erboost/index.html. We use two homoscedastic/heteroscedastic random-function-generator models in simulation to show the high predictive accuracy of ER-Boost. As an application, we apply ER-Boost to analyse North Carolina county crime data. From the nonparametric expectile regression analysis of this dataset, we draw several interesting conclusions that are consistent with the previous study using the economic model of crime. This real data example also demonstrates some attractive features of ER-Boost, such as its ability to handle different types of covariates and its model interpretation tools.
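The asymmetric squared-error loss underlying expectile regression can be illustrated with a small sketch. This computes a plain sample expectile by a weighted-mean fixed point; it is not the tree-based ER-Boost estimator, and the data are invented.

```python
import numpy as np

def expectile(x, tau, n_iter=100):
    """Sample tau-expectile: the minimizer of the asymmetric squared loss
    sum_i w_i(mu) * (x_i - mu)^2, with w_i = tau if x_i > mu else 1 - tau.
    Solved by iterating the weighted-mean fixed point."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    for _ in range(n_iter):
        w = np.where(x > mu, tau, 1.0 - tau)
        mu = np.average(x, weights=w)
    return mu

data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
```

At tau = 0.5 the weights are symmetric and the expectile is the ordinary mean; higher tau values pull the estimate toward the upper tail, which is what a set of expectiles at several levels exploits to describe the conditional distribution.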

10.
Power analysis for multi-center randomized controlled trials is quite difficult to perform for non-continuous responses when site differences are modeled by random effects using the generalized linear mixed-effects model (GLMM). First, it is not possible to construct power functions analytically, because of the extreme complexity of the sampling distribution of parameter estimates. Second, Monte Carlo (MC) simulation, a popular option for estimating power for complex models, does not work within the current context because of a lack of methods and software packages that provide reliable estimates when fitting such GLMMs; at the time of writing, even packages from software giants like SAS did not provide reliable estimates. Another major limitation of MC simulation is the lengthy running time for complex models such as the GLMM, particularly when estimating power for multiple scenarios of interest. We present a new approach to address these limitations. The proposed approach defines a marginal model to approximate the GLMM and estimates power without relying on MC simulation. The approach is illustrated with both real and simulated data, with the simulation study demonstrating good performance of the method.
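For contrast, here is a minimal sketch of the Monte Carlo power estimation the authors seek to avoid: simulate the trial many times and count rejections. Everything below is simplified for illustration (a two-arm comparison of means with known variance and no site random effects, so it sidesteps the GLMM fitting difficulties the abstract describes).

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_power(delta, n_per_arm, sigma=1.0, n_sim=2000, z_crit=1.96):
    """Monte Carlo power: fraction of simulated trials in which a two-sided
    z-test on the difference in means rejects at the 5% level."""
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, sigma, n_per_arm)      # control arm
        y = rng.normal(delta, sigma, n_per_arm)    # treatment arm
        se = sigma * np.sqrt(2.0 / n_per_arm)
        if abs((y.mean() - x.mean()) / se) > z_crit:
            rejections += 1
    return rejections / n_sim

power = mc_power(delta=0.5, n_per_arm=64)
```

Even this trivial model needs thousands of replications per scenario; with a GLMM fitted inside every replication, the running time and fitting failures the abstract mentions become the binding constraint, which motivates the marginal-model approximation.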

11.
Three situations are cited when caution is needed in using statistical computing packages: (a) when analyzing data and having insufficient statistical knowledge to completely understand the output; (b) when teaching the use of packages in a statistics course, to the exclusion of teaching statistics; and (c) when using packages in subject-matter teaching, without teaching the statistical methods underlying the packages.

12.
We propose a parametric nonlinear time-series model, namely the autoregressive stochastic volatility with threshold (AR-SVT) model with a mean equation, for forecasting both level and volatility. A methodology for estimating the parameters of this model is developed by first obtaining the recursive Kalman filter time-update equation and then employing the unrestricted quasi-maximum likelihood method. Furthermore, optimal one-step and two-step-ahead out-of-sample forecast formulae, along with forecast error variances, are derived analytically by recursive use of conditional expectation and variance. As an illustration, volatile all-India monthly spice exports during the period January 2006 to January 2012 are considered. The entire data analysis is carried out using the EViews and MATLAB software packages. The AR-SVT model is fitted and interval forecasts for 10 hold-out data points are obtained. For the data under consideration, this model is shown to describe and forecast volatility better than competing models, namely the AR-GARCH (generalized autoregressive conditional heteroscedastic), AR-EGARCH (exponential GARCH), AR-TGARCH (threshold GARCH), and AR-stochastic volatility models. Finally, for the AR-SVT model, optimal out-of-sample forecasts along with forecasts of one-step-ahead variances are obtained.

13.
For academic libraries, because budgetary pressures are nearly universal, it is imperative to evaluate journal packages regularly. This article presents an overview of the data and methods that the NC State University Libraries traditionally uses to evaluate journal packages and presents additional methods to expand our evaluation of publishing and editorial activity. We describe methods for downloading and analyzing Web of Science citation data to identify the most common publishers for NC State affiliated authors as well as the journals in which NC State authors publish most frequently. This article also demonstrates a custom Python web scraping application to harvest NC State affiliated editor data from publishers’ websites. Finally, this article discusses how these data elements are combined to provide a more comprehensive evaluative strategy for our journal investments.
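The editor-harvesting step can be sketched with the Python standard library alone. The article's actual scraper is not shown here; the page fragment and the `editor-name` class below are invented, and a real publisher site would need its own parsing rules (and attention to robots.txt and terms of use).

```python
from html.parser import HTMLParser

# Invented editorial-board page fragment; a real scraper would fetch the
# page over HTTP and adapt the selectors to each publisher's markup.
SAMPLE_PAGE = """
<div class="board">
  <span class="editor-name">A. Smith (NC State University)</span>
  <span class="editor-name">B. Jones (Duke University)</span>
  <span class="editor-name">C. Lee (NC State University)</span>
</div>
"""

class EditorParser(HTMLParser):
    """Collect the text of every <span class="editor-name"> element."""
    def __init__(self):
        super().__init__()
        self.in_editor = False
        self.editors = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "editor-name") in attrs:
            self.in_editor = True

    def handle_data(self, data):
        if self.in_editor:
            self.editors.append(data.strip())
            self.in_editor = False

parser = EditorParser()
parser.feed(SAMPLE_PAGE)
ncsu_editors = [e for e in parser.editors if "NC State" in e]
```

Filtering the harvested names by affiliation is what links the scraped editor data back to the institution's journal-package evaluation.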

14.
The increase in statistical software applications for PCs has been driven by decreasing hardware costs and dramatically enhanced PC performance. Whereas in the past the domain of statistical computing was reserved for mainframe solutions, a great number of new software packages for PCs have come out in the last five years, and the producers of established mainframe software have therefore been forced to offer PC-based solutions as well. By limiting a market analysis to products with a medium-sized set of well-known statistical methods, the immense number of available products is reduced to about fifty systems. We ordered evaluation copies of these systems to test their numerical quality, speed, and the performance of several procedures. Seventeen packages were made available for extensive examination. This paper (1) discusses the problems of, and solutions for, obtaining a complete and correct data matrix that describes the entire market, and (2) presents the results of a comparative market analysis.

15.
We find that existing multiple imputation procedures that are currently implemented in major statistical packages, and that are available to the wide majority of data analysts, are limited with regard to handling incomplete panel data. We review various missing data methods that we deem useful for the analysis of incomplete panel data and discuss how some of the shortcomings of existing procedures can be overcome. In a simulation study based on real panel data, we illustrate these procedures' quality and outline fruitful avenues of future research.

16.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
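The distinction between parameter-level and value-level evaluation can be made concrete with a small sketch: mask values completely at random, mean-impute them, and score the imputed entries against the held-out truth. The data and masking rate below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

x_true = rng.normal(10.0, 2.0, size=500)
mask = rng.random(500) < 0.2          # ~20% missing, MCAR

x_obs = x_true.copy()
x_obs[mask] = np.nan

x_imp = x_obs.copy()
x_imp[mask] = np.nanmean(x_obs)       # single mean imputation

# Value-level quality: RMSE of imputed entries against the held-out truth.
rmse = np.sqrt(np.mean((x_imp[mask] - x_true[mask]) ** 2))

# Mean imputation recovers the location but shrinks variability.
var_ratio = x_imp.var() / x_true.var()
```

A downstream estimate of the mean would look fine here, while the value-level RMSE and the deflated variance reveal how poor the individual imputations are; this is exactly the gap the abstract highlights.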

17.
Multiple imputation is widely accepted as the method of choice to address item nonresponse in surveys. Nowadays most statistical software packages include features to multiply impute missing values in a dataset. Nevertheless, applying these features to real data poses many implementation problems. Defining useful imputation models for a dataset that consists of categorical and possibly skewed continuous variables, and that contains skip patterns and all sorts of logical constraints, is a challenging task. Moreover, in most applications little attention is paid to evaluating the underlying assumptions behind the imputation models.

18.
This article offers a review of three software packages that estimate directed acyclic graphs (DAGs) from data. The three packages, MIM, Tetrad and WinMine, can help researchers discover underlying causal structure. Although each package uses a different algorithm, the results are to some extent similar. All three packages are free and easy to use. They are likely to be of interest to researchers who do not have strong theory regarding the causal structure in their data. DAG modeling is a powerful analytic tool to consider in conjunction with, or in place of, path analysis, structural equation modeling, and other statistical techniques.

19.
Following on from the work of O'Quigley & Flandre (1994) and, more recently, O'Quigley & Xu (2000), we develop a measure, R2, of the predictive ability of a stratified proportional hazards regression model. The extension of this earlier work to the stratified case is relatively straightforward, both conceptually and in its practical implementation. The extension is nonetheless important, in that the stratified model makes weaker assumptions than the full multivariate model. Formulae are given that can be readily incorporated into standard software routines, since the component parts of the calculations are routinely provided by most packages. We give examples on the predictability of survival in breast cancer data, modelled via proportional hazards and stratified proportional hazards models, the latter being necessary in view of the non-proportional nature of some covariate effects.

20.
Recent evidence indicates that using multiple forward rates sharply predicts future excess returns on U.S. Treasury bonds, with R2 values around 30%. The projection coefficients in these regressions exhibit a distinct pattern related to the maturity of the forward rate. These features of the data, in conjunction with the transition dynamics of bond yields, pose a serious challenge to term structure models. In this article we show that a regime-shifting term structure model can empirically account for these challenging data features, whereas alternative models, such as affine specifications, fail to do so. We find that regimes in the model are intimately related to bond risk premia and real business cycles.
