Similar documents (20 results found)
1.
We use several classical and Bayesian models to forecast employment for eight sectors of the US economy. In addition to standard vector autoregressive and Bayesian vector autoregressive models, we augment some of these models to include the information content of 143 additional monthly series. Several approaches exist for incorporating information from a large number of series. We consider two multivariate approaches: extracting common factors (principal components) and Bayesian shrinkage. After extracting the common factors, we use Bayesian factor-augmented vector autoregressive and vector error-correction models, as well as Bayesian shrinkage in a large-scale Bayesian vector autoregressive model. For an in-sample period of January 1972 to December 1989 and an out-of-sample period of January 1990 to March 2010, we compare the forecast performance of the alternative models. More specifically, we perform ex-post and ex-ante out-of-sample forecasts from January 1990 through March 2009 and from April 2009 through March 2010, respectively. We find that factor-augmented models, especially the error-correction versions, generally prove the best in out-of-sample forecast performance, implying that, in addition to macroeconomic variables, incorporating long-run relationships along with short-run dynamics plays an important role in forecasting employment. Forecast combination models, however, based on the simple average of the forecasts from the various models used, outperform the best-performing individual models for six of the eight sectoral employment series.
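
As a rough illustration of the factor-augmentation and forecast-combination ideas described above (a sketch in Python, not the authors' specification: the toy data, the number of factors, the lag length and all variable names are assumptions), one could extract principal-component factors from a large monthly panel, append them to a small VAR, and average the resulting forecasts with those of the plain VAR:

    import numpy as np
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(0)
    T, n_core, n_panel = 240, 3, 143
    core = rng.standard_normal((T, n_core)).cumsum(axis=0)   # stand-in for sectoral employment series
    panel = rng.standard_normal((T, n_panel))                # stand-in for the 143 auxiliary monthly series

    # Principal-component factors extracted from the standardized panel
    Z = (panel - panel.mean(axis=0)) / panel.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :2] * s[:2]                               # first two common factors

    var_fit = VAR(core).fit(6)                               # plain VAR benchmark
    favar_data = np.column_stack([core, factors])
    favar_fit = VAR(favar_data).fit(6)                       # factor-augmented VAR

    h = 12
    f_var = var_fit.forecast(core[-var_fit.k_ar:], steps=h)
    f_favar = favar_fit.forecast(favar_data[-favar_fit.k_ar:], steps=h)[:, :n_core]

    # Simple-average forecast combination of the two models
    f_comb = 0.5 * (f_var + f_favar)
    print(f_comb.shape)                                      # (12, 3)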

2.
In this paper we address the problem of estimating a vector of regression parameters in the Weibull censored regression model. Our main objective is to provide natural adaptive estimators that significantly improve upon the classical procedures in the situation where some of the predictors may or may not be associated with the response. In the context of two competing Weibull censored regression models (full model and candidate submodel), we consider an adaptive shrinkage estimation strategy that shrinks the full model maximum likelihood estimate in the direction of the submodel maximum likelihood estimate. We develop the properties of these estimators using the notion of asymptotic distributional risk. The shrinkage estimators are shown to have higher efficiency than the classical estimators for a wide class of models. Further, we consider a LASSO-type estimation strategy and compare its relative performance with the shrinkage estimators. Monte Carlo simulations reveal that when the true model is close to the candidate submodel, the shrinkage strategy performs better than the LASSO strategy when, and only when, there are many inactive predictors in the model. The shrinkage and LASSO strategies are applied to a real data set from the Veterans' Administration (VA) lung cancer study to illustrate the usefulness of the procedures in practice.
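
A generic Stein-type form of such a shrinkage estimator, written here only to fix ideas (the notation is an assumption and not taken verbatim from the paper), moves the full-model estimate towards the submodel estimate by an amount governed by a test statistic T_n for the candidate restrictions:

    \[
    \hat\beta^{\mathrm{S}} = \hat\beta_{\mathrm{SM}}
      + \Bigl(1 - \tfrac{k-2}{T_n}\Bigr)\bigl(\hat\beta_{\mathrm{FM}} - \hat\beta_{\mathrm{SM}}\bigr),
    \qquad
    \hat\beta^{\mathrm{S+}} = \hat\beta_{\mathrm{SM}}
      + \Bigl(1 - \tfrac{k-2}{T_n}\Bigr)^{\!+}\bigl(\hat\beta_{\mathrm{FM}} - \hat\beta_{\mathrm{SM}}\bigr),
    \]

where k is the number of restrictions, FM and SM denote the full-model and submodel maximum likelihood estimates, and (a)^+ = max(a, 0) gives the positive-part version that guards against over-shrinkage.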

3.
This paper analyzes the forecasting performance of an open economy dynamic stochastic general equilibrium (DSGE) model, estimated with Bayesian methods, for the Euro area during 1994Q1–2002Q4. We compare the DSGE model and a few variants of this model to various reduced-form forecasting models such as vector autoregressions (VARs) and vector error correction models (VECM), estimated both by maximum likelihood and two different Bayesian approaches, and traditional benchmark models, e.g., the random walk. The accuracy of point forecasts, interval forecasts and the predictive distribution as a whole are assessed in an out-of-sample rolling event evaluation using several univariate and multivariate measures. The results show that the open economy DSGE model compares well with more empirical models and thus that the tension between rigor and fit in older generations of DSGE models is no longer present. We also critically examine the role of Bayesian model probabilities and other frequently used low-dimensional summaries, e.g., the log determinant statistic, as measures of overall forecasting performance.
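
A minimal sketch of a rolling out-of-sample exercise of this kind, written only for a reduced-form VAR benchmark (the toy data, window, lag length and the use of RMSE alone are assumptions; the paper also evaluates interval and density forecasts):

    import numpy as np
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(1)
    y = rng.standard_normal((140, 3)).cumsum(axis=0)    # toy quarterly data, 3 variables

    h, start = 1, 100                                    # one-step-ahead forecasts, first origin
    errors = []
    for origin in range(start, len(y) - h):
        fit = VAR(y[:origin]).fit(4)                     # re-estimate on the expanding sample
        fcast = fit.forecast(y[origin - fit.k_ar:origin], steps=h)
        errors.append(y[origin + h - 1] - fcast[-1])     # h-step-ahead forecast error

    rmse = np.sqrt(np.mean(np.square(errors), axis=0))   # RMSE per variable
    print(rmse)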

4.
Forecasting Performance of an Open Economy DSGE Model   (cited 1 time: 0 self-citations, 1 by others)
Econometric Reviews, 2007, 26(2): 289-328

5.
A parsimonious model for treated tumours is developed as a continuation of our previous work on regrowth curve theory. The statistical model belongs to the family of marginal non-linear models, since the only linear parameters of the model are tumour-specific and random, which facilitates parameter estimation. An important feature of the model is that it enables estimation of the fraction of cancer cells surviving the treatment in vivo from easy-to-obtain longitudinal measurements of tumour volume. We compare several methods of estimation, including Lindstrom–Bates, iteratively reweighted least squares and maximum likelihood. The last two methods are computed via the total estimating equations approach and variance least squares. The theory is illustrated with a photodynamic tumour therapy example.
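
To make the "surviving fraction from longitudinal tumour volumes" idea concrete, here is a deliberately simplified nonlinear least-squares sketch with an assumed two-compartment regrowth curve; the functional form, parameter names and synthetic data are illustrative assumptions, not the authors' model or estimation method:

    import numpy as np
    from scipy.optimize import curve_fit

    def regrowth(t, v0, nu, grow, decay):
        """Assumed toy curve: a surviving fraction nu regrows, the rest is cleared."""
        return v0 * (nu * np.exp(grow * t) + (1.0 - nu) * np.exp(-decay * t))

    rng = np.random.default_rng(2)
    t = np.linspace(0, 30, 16)                               # days after treatment
    true = regrowth(t, 100.0, 0.05, 0.15, 0.25)
    volume = true * np.exp(0.1 * rng.standard_normal(t.size))   # multiplicative noise

    popt, _ = curve_fit(regrowth, t, volume,
                        p0=[volume[0], 0.1, 0.1, 0.1],
                        bounds=([0, 0, 0, 0], [np.inf, 1, 2, 2]))
    print("estimated surviving fraction:", popt[1])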

6.
The main models of machine learning are briefly reviewed and considered for building a classifier to identify the Fragile X Syndrome (FXS). We have analyzed 172 patients potentially affected by FXS in Andalusia (Spain) and, by means of a DNA test, each member of the data set is known to belong to one of two classes: affected, not affected. The whole predictor set, formed by 40 variables, and a reduced set with only nine predictors significantly associated with the response are considered. Four alternative base classification models have been investigated: logistic regression, classification trees, multilayer perceptron and support vector machines. For both predictor sets, the best accuracy, considering both the mean and the standard deviation of the test error rate, is achieved by the support vector machines, confirming the increasing importance of this learning algorithm. Three ensemble methods - bagging, random forests and boosting - were also considered, amongst which the bagged versions of support vector machines stand out, especially when they are constructed with the reduced set of predictor variables. The analysis of the sensitivity, the specificity and the area under the ROC curve agrees with the main conclusions extracted from the accuracy results. All of these models can be fitted by free R programs.
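
The abstract notes that all of these models can be fitted with free R programs; as a language-neutral sketch of the same comparison loop (toy data and default model settings are assumptions), a cross-validated comparison of the four base classifiers could look like this in Python:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for the 172-patient, 9-predictor reduced data set
    X, y = make_classification(n_samples=172, n_features=9, n_informative=5, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "classification tree": DecisionTreeClassifier(random_state=0),
        "multilayer perceptron": MLPClassifier(max_iter=2000, random_state=0),
        "support vector machine": SVC(random_state=0),
    }

    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name:24s} accuracy {acc.mean():.3f} (sd {acc.std():.3f})  AUC {auc.mean():.3f}")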

7.
Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) as well as Bayesian counterparts (Bayesian CART and Bayesian neural networks) to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide some illustrative examples using a wide variety of standard regression datasets from a publicly available machine learning repository to show the superiority of the proposed models in comparison to popularly used Bayesian CART and Bayesian neural network models.
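
The BNT models themselves are Bayesian; purely as a loose frequentist caricature of the "tree selects features, network predicts" idea (the data, model sizes and selection rule below are assumptions and not the authors' algorithm), one could rank features with a fitted tree and train a small network on the retained ones:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    X, y = make_regression(n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
    keep = tree.feature_importances_ > 0          # keep only features the tree actually used

    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    net.fit(X_tr[:, keep], y_tr)
    print("selected features:", int(keep.sum()),
          "test MSE:", mean_squared_error(y_te, net.predict(X_te[:, keep])))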

8.
In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.
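
Stripped of its Bayesian machinery, the underlying geometric idea can be written as follows (notation assumed for illustration): collect the training vectors as the columns of Z = [z_1, ..., z_n] and regress the new observation x on their span,

    \[
    \hat\gamma = \arg\min_{\gamma}\,\lVert x - Z\gamma\rVert^{2} = (Z^{\top}Z)^{-1}Z^{\top}x ,
    \]

so that the coefficients in \hat\gamma measure how much each training sample, and hence each class, contributes to reconstructing x; the paper's fully Bayesian treatment turns this small vector into proper posterior class probabilities.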

9.
It is often the case that high-dimensional data consist of only a few informative components. Standard statistical modeling and estimation in such a situation is prone to inaccuracies due to overfitting, unless regularization methods are practiced. In the context of classification, we propose a class of regularization methods through shrinkage estimators. The shrinkage is based on variable selection coupled with conditional maximum likelihood. Using Stein's unbiased estimator of the risk, we derive an estimator for the optimal shrinkage method within a certain class. A comparison is given between the optimal shrinkage methods in a classification context and the optimal shrinkage method when estimating a mean vector under a squared loss. The latter problem is extensively studied, but it seems that the results of those studies are not completely relevant for classification. We demonstrate and examine our method on simulated data and compare it to the feature annealed independence rule and Fisher's rule.
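
The "Stein's unbiased estimator of the risk" ingredient is the classical SURE identity; for the Gaussian mean problem it reads (a standard result, quoted here only as background for the abstract's terminology):

    \[
    \mathbb{E}\,\lVert \hat\mu - \mu \rVert^{2}
      = \mathbb{E}\Bigl[\, p\sigma^{2} + \lVert g(X)\rVert^{2}
        + 2\sigma^{2}\sum_{i=1}^{p}\frac{\partial g_{i}}{\partial X_{i}}(X) \Bigr],
    \qquad \hat\mu = X + g(X),\quad X \sim N_{p}(\mu, \sigma^{2} I),
    \]

so the quantity inside the expectation can be computed from the data alone and minimized over a class of shrinkage rules g without knowing \mu; the paper adapts this device to pick the shrinkage level in a classification setting.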

10.
In this paper we introduce and study two new families of statistics for the problem of testing linear combinations of the parameters in logistic regression models. These families are based on the phi-divergence measures. One of them includes the classical likelihood ratio statistic and the other the classical Pearson's statistic for this problem. It is interesting to note that the vector of unknown parameters, in the two new families of phi-divergence statistics considered in this paper, is estimated using the minimum phi-divergence estimator instead of the maximum likelihood estimator. Minimum phi-divergence estimators are a natural extension of the maximum likelihood estimator.
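
For context, the phi-divergence between two probability vectors p and q is (standard definition)

    \[
    D_{\phi}(p, q) = \sum_{i} q_{i}\,\phi\!\left(\frac{p_{i}}{q_{i}}\right),
    \qquad \phi \text{ convex},\ \phi(1) = 0,
    \]

and the choices \phi(x) = x\log x - x + 1 and \phi(x) = \tfrac{1}{2}(x-1)^{2} lead, after the usual scaling of the divergence by the sample size, to the likelihood ratio and Pearson chi-square statistics that the two families contain as special cases.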

11.
In this paper, we propose a new Bayesian inference approach for classification based on the traditional hinge loss used for classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function where some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting without user pre-specification. Following the literature on sparse regression of high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors using two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to some approaches that were proposed recently for dealing with challenging classification examples with high dimensionality.
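
The hinge loss referred to here is the usual SVM objective; written as a pseudo-likelihood (a standard device in the Bayesian SVM literature, with the notation and the factor 2 being assumptions rather than details confirmed by the abstract) it is

    \[
    L(\beta) \propto \prod_{i=1}^{n} \exp\bigl\{-2\,\max\bigl(0,\, 1 - y_{i} f(x_{i})\bigr)\bigr\},
    \qquad y_{i} \in \{-1, +1\},
    \]

and it is a representation of this pseudo-likelihood as a scale mixture of normals that makes the data-augmentation, Gibbs-type posterior exploration described in the abstract possible.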

12.
Credit scoring is an effective tool for credit management in institutions of many kinds and has broad prospects for application. As quantitative techniques have developed, credit scoring methods have continued to evolve, offering a range of choices for practical use. We select six models: two statistical methods, logistic regression and classification trees, and, representing the direction in which credit scoring is developing, four artificial-intelligence methods, namely the multilayer perceptron, the radial basis function network, the self-organizing feature map network and the support vector machine. All six are tested within a common framework on a relatively large sample of data on individual industrial and commercial households (sole proprietors). The results show that logistic regression and the support vector machine are the strongest performers in terms of misclassification rate, stability and applicability; the support vector machine, as one of the newest artificial-intelligence scoring methods, has the most outstanding overall performance.
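
A minimal sketch of the kind of head-to-head comparison described above, logistic regression versus a support vector machine judged by hold-out misclassification rate, with synthetic data standing in for the sole-proprietor credit records (all names and settings below are assumptions):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    # Synthetic "good/bad borrower" data in place of the confidential credit records
    X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                               weights=[0.85, 0.15], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                        ("support vector machine", SVC(kernel="rbf", C=1.0))]:
        model.fit(X_tr, y_tr)
        err = np.mean(model.predict(X_te) != y_te)    # misclassification rate
        print(f"{name:24s} hold-out error {err:.3f}")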

13.
Tumour multiplicity is a frequently measured phenotype in animal studies of cancer biology. Poisson variation of this measurement represents a biological and statistical reference point that is usually violated, even in highly controlled experiments, owing to sources of variation in the stochastic process of tumour formation. A recent experiment on murine intestinal tumours presented conditions which seem to generate Poisson-distributed tumour counts. If valid, this would support a claim about mechanisms by which the adenomatous polyposis coli gene is inactivated during tumour initiation. In considering hypothesis testing strategies, model choice and Bayesian approaches, we quantify the positive evidence favouring Poisson variation in this experiment. Statistical techniques used include likelihood ratio testing, the Bayes and Akaike information criteria, negative binomial modelling, reversible jump Markov chain Monte Carlo methods and posterior predictive checking. The posterior approximation that is based on the Bayes information criterion is found to be quite accurate in this small n case-study.
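
A compact illustration of the likelihood-ratio and BIC part of that toolkit, testing Poisson against negative binomial counts on simulated data (the toy data and the intercept-only model are assumptions; the study additionally uses reversible jump MCMC and posterior predictive checks, which are not sketched here):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(3)
    counts = rng.poisson(lam=6.0, size=80)          # toy tumour counts, one per animal
    X = np.ones((counts.size, 1))                   # intercept-only design

    pois = sm.Poisson(counts, X).fit(disp=False)
    nb = sm.NegativeBinomial(counts, X).fit(disp=False)

    # Likelihood ratio test of Poisson against negative binomial; the overdispersion
    # parameter sits on the boundary under the null, so the chi-square(1) tail is halved.
    lrt = max(0.0, 2.0 * (nb.llf - pois.llf))
    pval = 0.5 * stats.chi2.sf(lrt, df=1)
    print(f"LRT = {lrt:.3f}, p ~ {pval:.3f}; BIC: Poisson {pois.bic:.1f} vs NB {nb.bic:.1f}")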

14.
This paper proposes an algorithm for the classification of multi-dimensional datasets based on conjugate Bayesian Multiple Kernel Grouping Learning (BMKGL). Using a conjugate Bayesian framework improves computational efficiency. Using multiple kernels instead of a single kernel avoids the kernel selection problem, which is itself computationally expensive. Through grouping parameter learning, BMKGL can simultaneously integrate information from different dimensions and find the dimensions that contribute most to the variation in the outcome, which aids interpretability. Meanwhile, BMKGL can select the most suitable combination of kernels for the different dimensions, so as to extract the most appropriate measure for each dimension and improve the accuracy of the classification results. Simulation results illustrate that the proposed learning process has better performance in prediction and stability than some popular classifiers, such as the k-nearest neighbours algorithm, the support vector machine and the naive Bayes classifier. BMKGL also outperforms previous methods in terms of accuracy and interpretation on the heart disease and EEG datasets.
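
The abstract does not give the model in symbols; a generic way of writing a dimension-wise (grouped) multiple-kernel combination of the kind being learned, with purely illustrative notation, is

    \[
    k(x, x') = \sum_{d=1}^{D}\sum_{m=1}^{M} w_{dm}\, k_{m}\bigl(x_{d},\, x'_{d}\bigr),
    \qquad w_{dm} \ge 0,
    \]

where d indexes input dimensions, k_1, ..., k_M are candidate base kernels, and the grouped weights w_{dm} play the double role described above: they flag which dimensions drive the outcome and which kernel is the most suitable measure for each of them.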

15.
The Bayesian CART (classification and regression tree) approach proposed by Chipman, George and McCulloch (1998) entails putting a prior distribution on the set of all CART models and then using stochastic search to select a model. The main thrust of this paper is to propose a new class of hierarchical priors which enhance the potential of this Bayesian approach. These priors indicate a preference for smooth local mean structure, resulting in tree models which shrink predictions from adjacent terminal nodes towards each other. Past methods for tree shrinkage have searched for trees without shrinking, and applied shrinkage to the identified tree only after the search. By using hierarchical priors in the stochastic search, the proposed method searches for shrunk trees that fit well and improves the tree through shrinkage of predictions.

16.
Statistical inference about tumorigenesis should focus on the tumour incidence rate. Unfortunately, in most animal carcinogenicity experiments, tumours are not observable in live animals and censoring of the tumour onset times is informative. In this paper, we propose a Bayesian method for analysing data from such studies. Our approach focuses on the incidence of tumours and accommodates occult tumours and censored onset times without restricting tumour lethality, relying on cause-of-death data, or requiring interim sacrifices. We represent the underlying state of nature by a multistate stochastic process and assume general probit models for the time-specific transition rates. These models allow the incorporation of covariates, historical control data and subjective prior information. The inherent flexibility of this approach facilitates the interpretation of results, particularly when the sample size is small or the data are sparse. We use a Gibbs sampler to estimate the relevant posterior distributions. The methods proposed are applied to data from a US National Toxicology Program carcinogenicity study.
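
In symbols, a probit model for a time-specific transition rate, say tumour onset in interval t for an animal with covariates x, has the generic form (the notation is an assumption, not taken from the paper)

    \[
    \Pr(\text{transition in interval } t \mid x) = \Phi\bigl(\alpha_{t} + x^{\top}\beta\bigr),
    \]

with \Phi the standard normal distribution function, interval-specific intercepts \alpha_t, and regression effects \beta through which dose groups, historical control data and prior information can enter.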

17.
Periodic autoregressive (PAR) models with symmetric innovations are widely used in time series analysis, whereas inference for their asymmetric counterparts remains a challenge because of a number of problems with the existing computational methods. In this paper, we use an interesting relationship between periodic autoregressive and vector autoregressive (VAR) models to study maximum likelihood and Bayesian approaches to inference for a PAR model with normal and skew-normal innovations, where different kinds of estimation methods for the unknown parameters are examined. Several technical difficulties which are usually complicated to handle are reported. Results are compared with the existing classical solutions, and the practical implementation of the proposed algorithms is illustrated via comprehensive simulation studies. The methods developed in the study are applied to, and illustrated with, a real time series. The Bayes factor is also used to compare the multivariate normal model with the multivariate skew-normal model.
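
The PAR-VAR relationship being exploited is the standard stacking argument; for a first-order PAR with S seasons (notation assumed),

    \[
    y_{n,s} = \phi_{s}\, y_{n,s-1} + \varepsilon_{n,s}, \qquad s = 1, \dots, S,
    \]

stacking one full period into Y_n = (y_{n,1}, \dots, y_{n,S})^{\top} gives the first-order vector autoregression

    \[
    A_{0}\, Y_{n} = A_{1}\, Y_{n-1} + E_{n},
    \]

where A_0 is lower triangular with ones on the diagonal and -\phi_s immediately below it, and A_1 contains only the entry \phi_1 linking y_{n,1} to y_{n-1,S}; this is what lets VAR likelihoods, priors and samplers be reused for the periodic model.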

18.
This article proposes a Bayesian approach, which can simultaneously obtain the Bayesian estimates of unknown parameters and random effects, to analyze nonlinear reproductive dispersion mixed models (NRDMMs) for longitudinal data with nonignorable missing covariates and responses. A logistic regression model is employed to model the missing-data mechanisms for the missing covariates and responses. A hybrid sampling procedure combining the Gibbs sampler and the Metropolis-Hastings algorithm is presented to draw observations from the conditional distributions. Because the missing-data mechanism is not testable, we develop the logarithm of the pseudo-marginal likelihood, the deviance information criterion, the Bayes factor, and the pseudo-Bayes factor to compare several competing missing-data mechanism models within the considered NRDMMs with nonignorable missing covariates and responses. Three simulation studies and a real example taken from the paediatric AIDS clinical trials group (ACTG) are used to illustrate the proposed methodologies. Empirical results show that our proposed methods are effective in selecting missing-data mechanism models.
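
To make the "logistic regression model for the missing-data mechanisms" explicit (the covariates entering the mechanism and the notation are assumptions, since the abstract does not spell them out), a nonignorable mechanism for a response indicator r_{ij} can be written

    \[
    \Pr\bigl(r_{ij} = 1 \mid y_{ij}, x_{ij}\bigr)
      = \frac{\exp\bigl(\gamma_{0} + \gamma_{1} y_{ij} + \gamma_{2}^{\top} x_{ij}\bigr)}
             {1 + \exp\bigl(\gamma_{0} + \gamma_{1} y_{ij} + \gamma_{2}^{\top} x_{ij}\bigr)},
    \]

where the dependence on the possibly unobserved value y_{ij} itself (\gamma_1 nonzero) is what makes the mechanism nonignorable and, as the abstract notes, untestable from the observed data alone, which is why several candidate mechanisms are compared via the criteria listed.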

19.
In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies, and flexible modeling that allows for interactions between platforms and the estimated effect, as well as concordant and discordant differential expression across studies. We evaluated the performance of our model in a comprehensive fashion, using both artificial data, and a "split-study" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis, but also under a realistic alternative. The simulation results from the artificial data demonstrate the advantages of the Bayesian model. The 1 - AUC values for the Bayesian model are roughly half of the corresponding values for a direct combination of t- and SAM-statistics. Furthermore, the simulations provide guidelines for when the Bayesian model is most likely to be useful. Most noticeably, in small studies the Bayesian model generally outperforms other methods when evaluated by AUC, FDR, and MDR across a range of simulation parameters, and this difference diminishes for larger sample sizes in the individual studies. The split-study validation illustrates appropriate shrinkage of the Bayesian model in the absence of platform-, sample-, and annotation-differences that otherwise complicate experimental data analyses. Finally, we fit our model to four breast cancer studies employing different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor positive tumors versus negative ones. Software and data for reproducing our analysis are publicly available.

20.
Directional testing of vector parameters, based on higher order approximations of likelihood theory, can ensure extremely accurate inference, even in high-dimensional settings where standard first order likelihood results can perform poorly. Here we explore examples of directional inference where the calculations can be simplified, and prove that in several classical situations, the directional test reproduces exact results based on F-tests. These findings give a new interpretation of some classical results and support the use of directional testing in general models, where exact solutions are typically not available. The Canadian Journal of Statistics 47: 619–627; 2019
