期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Inferences from Cross-Sectional,Stochastic Frontier Models

Léopold Simar 《Econometric Reviews》2013,32(1):62-98

Conventional approaches for inference about efficiency in parametric stochastic frontier (PSF) models are based on percentiles of the estimated distribution of the one-sided error term, conditional on the composite error. When used as prediction intervals, coverage is poor when the signal-to-noise ratio is low, but improves slowly as sample size increases. We show that prediction intervals estimated by bagging yield much better coverages than the conventional approach, even with low signal-to-noise ratios. We also present a bootstrap method that gives confidence interval estimates for (conditional) expectations of efficiency, and which have good coverage properties that improve with sample size. In addition, researchers who estimate PSF models typically reject models, samples, or both when residuals have skewness in the “wrong” direction, i.e., in a direction that would seem to indicate absence of inefficiency. We show that correctly specified models can generate samples with “wrongly” skewed residuals, even when the variance of the inefficiency process is nonzero. Both our bagging and bootstrap methods provide useful information about inefficiency and model parameters irrespective of whether residuals have skewness in the desired direction. 相似文献

2.

Out-of-Bag Estimation of the Optimal Hyperparameter in SubBag Ensemble Method

Gai-Ying Zhang Jiang-She Zhang 《统计学通讯:模拟与计算》2013,42(10):1877-1892

SubBag is a technique by combining bagging and random subspace methods to generate ensemble classifiers with good generalization capability. In practice, a hyperparameter K of SubBag—the number of randomly selected features to create each base classifier—should be specified beforehand. In this article, we propose to employ the out-of-bag instances to determine the optimal value of K in SubBag. The experiments conducted with some UCI real-world data sets show that the proposed method can make SubBag achieve the optimal performance in nearly all the considered cases. Meanwhile, it occupied less computational sources than cross validation procedure. 相似文献

3.

Application and Reliability of Change-Point Analyses for Detecting a Defective Stage in Integrated Circuit Manufacturing

P. Besse C. Le Gall 《统计学通讯:模拟与计算》2013,42(2):479-496

ABSTRACT

This article presents a reliable method for highlighting a defective stage within a manufacturing process when the existence of a failure is only known at the end of the process. It was developed in the context of integrated circuit manufacturing, where low costs and high yields are indispensable if the manufacturer is to be competitive. Change detection methods were used to point out the defective stage. Two methods were compared and the best chosen. Thanks to this approach, it was possible to solve some yield problems for which the engineers' investigations were far from the real cause of failure. However, there is a strong requirement to assess the reliability of the suspicions cast on the incriminated stage, otherwise engineers could be made to do useless work and time could be wasted looking into events that are not the true cause of failure. Two complementary tools were implemented for this reliability assessment and their efficiency is illustrated by several examples. 相似文献

4.

Understanding the role of facial asymmetry in human face identification

Sinjini Mitra Nicole A. Lazar Yanxi Liu 《Statistics and Computing》2007,17(1):57-70

Face recognition has important applications in forensics (criminal identification) and security (biometric authentication). The problem of face recognition has been extensively studied in the computer vision community, from a variety of perspectives. A relatively new development is the use of facial asymmetry in face recognition, and we present here the results of a statistical investigation of this biometric. We first show how facial asymmetry information can be used to perform three different face recognition tasks—human identification (in the presence of expression variations), classification of faces by expression, and classification of individuals according to sex. Initially, we use a simple classification method, and conduct a feature analysis which shows the particular facial regions that play the dominant role in achieving these three entirely different classification goals. We then pursue human identification under expression changes in greater depth, since this is the most important task from a practical point of view. Two different ways of improving the performance of the simple classifier are then discussed: (i) feature combinations and (ii) the use of resampling techniques (bagging and random subspaces). With these modifications, we succeed in obtaining near perfect classification results on a database of 55 individuals, a statistically significant improvement over the initial results as seen by hypothesis tests of proportions. 相似文献

5.

IRUSRT: A Novel Imbalanced Learning Technique by Combining Inverse Random Under Sampling and Random Tree

Chun-Xia Zhang Jiang-She Zhang Gao Guo Qing-Yan Ying 《统计学通讯:模拟与计算》2013,42(10):2714-2731

In this article, a novel technique IRUSRT (inverse random under sampling and random tree) by combining inverse random under sampling and random tree is proposed to implement imbalanced learning. The main idea is to severely under sample the majority class thus creating multiple distinct training sets. With each training set, a random tree is trained to separate the minority class from the majority class. By combining these random trees through fusion, a composite classifier is constructed. The experimental analysis on 23 real-world datasets assessed over area under the ROC curve (AUC), F-measure, and G-mean indicates that IRUSRT performs significantly better when compared with many existing class imbalance learning methods. 相似文献

6.

Bagging方法在基于日常交易数据解决中小企业的信用评价问题中的应用

董彦文《中国管理信息化》2009,12(15):115-119

关于信用风险评价问题至今已经做了很多研究,各种信用评价模型与方法也已被开发.但是这些模型与方法几乎都是基于财务数据、股票价格或风险调研机构发表的各种调查结果.因为几乎所有的中小企业的财务数据都是非公开的,至今开发的信用评价模型与方法都不免成为无米之炊.为此,本文提出了一个新的途径,只需要根据销售额、顾客付款额、拖欠款额等日常业务处理数据来评价顾客企业的信用度.本文提出一个应用Bgging方法评价顾客信用的系统,其目的在于解决由于异常顾客数比正常顾客要少很多而带来的问题,提高分辨异常顾客的能力.本文所提出的信用评价系统将应用到一个实际企业的信用评价问题中,借此来验证系统的性能和效果. 相似文献

7.

Bayesian bootstrap prediction

Tadayoshi Fushiki 《Journal of statistical planning and inference》2010,140(1):65-74

In this paper, bootstrap prediction is adapted to resolve some problems in small sample datasets. The bootstrap predictive distribution is obtained by applying Breiman's bagging to the plug-in distribution with the maximum likelihood estimator. The effectiveness of bootstrap prediction has previously been shown, but some problems may arise when bootstrap prediction is constructed in small sample datasets. In this paper, Bayesian bootstrap is used to resolve the problems. The effectiveness of Bayesian bootstrap prediction is confirmed by some examples. These days, analysis of small sample data is quite important in various fields. In this paper, some datasets are analyzed in such a situation. For real datasets, it is shown that plug-in prediction and bootstrap prediction provide very poor prediction when the sample size is close to the dimension of parameter while Bayesian bootstrap prediction provides stable prediction. 相似文献

8.

A model-averaging approach for smoothing spline regression

Liwen Xu Jiabin Zhou 《统计学通讯:模拟与计算》2013,42(8):2438-2451

ABSTRACT

This article considers nonparametric regression problems and develops a model-averaging procedure for smoothing spline regression problems. Unlike most smoothing parameter selection studies determining an optimum smoothing parameter, our focus here is on the prediction accuracy for the true conditional mean of Y given a predictor X. Our method consists of two steps. The first step is to construct a class of smoothing spline regression models based on nonparametric bootstrap samples, each with an appropriate smoothing parameter. The second step is to average bootstrap smoothing spline estimates of different smoothness to form a final improved estimate. To minimize the prediction error, we estimate the model weights using a delete-one-out cross-validation procedure. A simulation study has been performed by using a program written in R. The simulation study provides a comparison of the most well known cross-validation (CV), generalized cross-validation (GCV), and the proposed method. This new method is straightforward to implement, and gives reliable performances in simulations. 相似文献

9.

Investigating the Effect of Randomly Selected Feature Subsets on Bagging and Boosting

Guan-Wei Wang Gao Guo 《统计学通讯:模拟与计算》2015,44(3):636-646

Bagging, boosting, and random subspace methods are three most commonly used approaches for constructing ensemble classifiers. In this article, the effect of randomly selected feature subsets (intersectant or disjoint) on bagging and boosting is investigated. The performance of the related ensemble methods are compared by conducting experiments on some UCI benchmark datasets. The results demonstrate that bagging can be generally improved using the randomly selected feature subsets whereas boosting can only be optimized in some cases. Furthermore, the diversity between classifiers in an ensemble is also discussed and related to the prediction accuracy of the ensemble classifier. 相似文献

10.

An empirical comparison of ensemble methods based on classification trees

《Journal of Statistical Computation and Simulation》2012,82(8):629-643

In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. This comparison is performed by using 14 data sets that are publicly available and that were used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from different perspectives. More precisely, we look at the effects of noise and of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for RF, the effect of the number of variables from which to choose the best split at each given node. Moreover, we compare our results with those obtained by Lim et al. [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.]. In this study, the best overall results are obtained with RF. In particular, RF are the most robust against noise. The effect of allowing linear combinations and the differences between splitting criteria are small on average, but can be substantial for some data sets. 相似文献