Similar Documents
20 similar documents found.
1.
This paper explores the performance of the local splitting criterion devised by Bremner & Taplin for classification and regression trees when multiple trees are averaged to improve performance. The criterion is compared with the deviance used by Clark & Pregibon's method, a global splitting criterion typically used to grow trees. The paper considers multiple trees generated by randomly selecting splits with probability proportional to the likelihood for the split, and by bagging, where bootstrap samples from the data are used to grow trees. For six datasets, the superiority of the localized splitting criterion often persists when multiple trees are grown and averaged. Tree averaging is known to be advantageous when the trees being averaged produce different predictions, and this can be achieved by choosing splits where the splitting criterion is locally optimal. The paper shows that the use of locally optimal splits gives promising results in conjunction with both local and global splitting criteria, with and without random selection of splits. The paper also extends the local splitting criterion to accommodate categorical predictors.
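As a point of reference for the averaging step described above, here is a minimal Python sketch of generic bagged tree averaging with scikit-learn. It illustrates only bootstrap-and-average, not Bremner & Taplin's localized criterion; the data, depth and ensemble size are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

n_trees = 50
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))      # bootstrap sample
    trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

# averaged prediction over the ensemble
y_hat = np.mean([t.predict(X) for t in trees], axis=0)
```

Diversity among the averaged trees comes here from the bootstrap resampling; the paper's random selection of locally optimal splits is an alternative source of that diversity.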

2.
Various aspects of the classification tree methodology of Breiman et al. (1984) are discussed. A method of displaying classification trees, called block diagrams, is developed. Block diagrams give a clear presentation of the classification and are useful both to point out features of the particular data set under consideration and to highlight deficiencies in the classification method being used. Various splitting criteria are discussed; the usual Gini-Simpson criterion presents difficulties when there is a relatively large number of classes, and improved splitting criteria are obtained. One particular improvement is the introduction of adaptive anti-end-cut factors that take advantage of highly asymmetrical splits where appropriate. They use the number and mix of classes in the current node of the tree to identify whether or not it is likely to be advantageous to create a very small offspring node. A number of data sets are used as examples.
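For concreteness, the standard Gini-Simpson split evaluation that the abstract refers to can be written in a few lines; this sketch shows the plain criterion only, without the paper's anti-end-cut factors.

```python
import numpy as np

def gini_simpson(labels):
    """Gini-Simpson impurity: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_decrease(labels, go_left):
    """Impurity decrease of a binary split: i(t) - pL*i(tL) - pR*i(tR)."""
    left, right = labels[go_left], labels[~go_left]
    p_left = len(left) / len(labels)
    return (gini_simpson(labels)
            - p_left * gini_simpson(left)
            - (1 - p_left) * gini_simpson(right))

y = np.array([0, 0, 0, 1, 1, 2, 2, 2])
print(split_decrease(y, np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)))
```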

3.
Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist counterparts (classification and regression trees (CART) and artificial neural networks) and Bayesian counterparts (Bayesian CART and Bayesian neural networks) to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide some illustrative examples using a wide variety of standard regression datasets from a publicly available machine learning repository to show the superiority of the proposed models in comparison to popularly used Bayesian CART and Bayesian neural network models.

4.
Families of splitting criteria for classification trees
Several splitting criteria for binary classification trees are shown to be expressible as weighted sums of two values of divergence measures. This weighted-sum approach is then used to form two families of splitting criteria. One of them contains the chi-squared and entropy criteria; the other contains the mean posterior improvement criterion. Members of both families are shown to have the property of exclusive preference. Furthermore, the optimal splits based on the proposed families are studied. We find that the best splits depend on the parameters in the families. The results reveal interesting differences among various criteria. Examples are given to demonstrate the usefulness of both families.
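To make the weighted-sum form concrete, here is one standard identity, stated in my own notation (an assumption; the paper's notation may differ): the entropy criterion (information gain) of a split equals a weighted sum of Kullback-Leibler divergences of the two child class distributions from the parent's.

```latex
% Information gain as a weighted sum of two divergences (notation assumed):
% node t splits into t_L, t_R receiving proportions w_L, w_R of the cases;
% q, q_L, q_R are the class distributions in t, t_L, t_R.
\Delta(s, t)
  = w_L \, D_{\mathrm{KL}}\!\left(q_L \,\middle\|\, q\right)
  + w_R \, D_{\mathrm{KL}}\!\left(q_R \,\middle\|\, q\right)
```

Substituting other divergence measures for the Kullback-Leibler divergence generates other members of such families.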

5.
Variable selection is one of the main problems faced by data mining and machine learning techniques. These techniques are often, more or less explicitly, based on some measure of variable importance. This paper considers Total Decrease in Node Impurity (TDNI) measures, a popular class of variable importance measures defined in the field of decision trees and tree-based ensemble methods such as Random Forests and Gradient Boosting Machines. In spite of their wide use, some measures of this class are known to be biased, and some correction strategies have been proposed. The aim of this paper is twofold: first, to investigate the source and the characteristics of bias in TDNI measures using the notions of informative and uninformative splits; and second, to extend a bias-correction algorithm, recently proposed for the Gini measure in the context of classification, to the entire class of TDNI measures and to investigate its performance in the regression framework using simulated and real data.
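A TDNI measure can be computed directly from a fitted tree by summing, over the internal nodes that split on each variable, the weighted impurity decrease of the split. A minimal sketch using scikit-learn's tree internals (the dataset and depth are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

t = tree.tree_
tdni = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:              # leaf node: no split, no impurity decrease
        continue
    w = t.weighted_n_node_samples
    # decrease in weighted node impurity, credited to the split variable
    tdni[t.feature[node]] += (w[node] * t.impurity[node]
                              - w[left] * t.impurity[left]
                              - w[right] * t.impurity[right])

tdni /= tdni.sum()  # matches tree.feature_importances_ up to normalization
```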

6.
Properties of the localized regression tree splitting criterion, described in Bremner & Taplin (2002) and referred to as the BT method, are explored in this paper and compared to those of Clark & Pregibon's (1992) criterion (the CP method). These properties indicate why the BT method can result in superior trees. This paper shows that the BT method exhibits a weak bias towards edge splits, and the CP method exhibits a strong bias towards central splits in the presence of main effects. A third criterion, called the SM method, that exhibits no bias towards a particular split position is introduced. The SM method is a modification of the BT method that uses more symmetric local means. The BT and SM methods are more likely to split at a discontinuity than the CP method because of their relatively low bias towards particular split positions. The paper shows that the BT and SM methods can be used to discover discontinuities in the data, and that they offer a way of producing a variety of different trees for examination or for tree averaging methods.

7.
In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. This comparison uses 14 publicly available data sets that were used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from different perspectives. More precisely, we look at the effects of noise and of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for RF, the effect of the number of variables from which to choose the best split at each given node. Moreover, we compare our results with those obtained by Lim et al. (2000). In this study, the best overall results are obtained with RF. In particular, RF is the most robust against noise. The effect of allowing linear combinations and the differences between splitting criteria are small on average, but can be substantial for some data sets.
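The RF tuning knob discussed above (the number of candidate variables per split) is exposed in scikit-learn as max_features. A hedged sketch of how one might probe its effect; the data and settings are illustrative, not the paper's benchmark:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features controls how many variables are candidates at each split
for m in (1, 4, "sqrt", None):
    rf = RandomForestClassifier(n_estimators=200, max_features=m, random_state=0)
    score = cross_val_score(rf, X, y, cv=5).mean()
    print(f"max_features={m}: accuracy={score:.3f}")
```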

8.
This paper proposes modified splitting criteria for classification and regression trees by modifying the definition of the deviance. The modified deviance is based on local averaging instead of global averaging and is more successful at modelling data with interactions. The paper shows that the modified criteria result in much simpler trees for pure interaction data (no main effects) and can produce trees with fewer errors and lower residual mean deviances than those produced by Clark & Pregibon's (1992) method when applied to real datasets with strong interaction effects.
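The paper's modified deviance is not reproduced here. As a loose illustration of local versus global averaging, the sketch below contrasts the usual deviance about each side's global mean with a score based on the means of the k observations adjacent to a candidate split point; the function local_mean_gap and the choice of k are my own illustrative assumptions, not the paper's definition.

```python
import numpy as np

def global_split_deviance(x, y, c):
    """Deviance about each side's global mean, as in CP-style splitting.
    Assumes c is an interior split point so both sides are non-empty."""
    left, right = y[x <= c], y[x > c]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

def local_mean_gap(x, y, c, k=10):
    """Loose illustration of local averaging: compare the means of the k
    responses just below and just above the candidate split point c."""
    order = np.argsort(x)
    ys = y[order]
    i = np.searchsorted(x[order], c, side="right")
    i = min(max(i, k), len(ys) - k)   # keep k points available on each side
    return abs(ys[i - k:i].mean() - ys[i:i + k].mean())
```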

9.
This paper considers the computation of the conditional stationary distribution in Markov chains of level-dependent M/G/1-type, given that the level is not greater than a predefined threshold. This problem has been studied recently, and a computational algorithm has been proposed under the assumption that the matrices representing downward jumps are nonsingular. We first show that this assumption can be eliminated in a general setting of Markov chains of level-dependent G/G/1-type. Next we develop a computational algorithm for the conditional stationary distribution in Markov chains of level-dependent M/G/1-type by modifying the above-mentioned algorithm slightly. In principle, our algorithm is applicable to any Markov chain of level-dependent M/G/1-type, provided the Markov chain is irreducible and positive recurrent. Furthermore, as an input to the algorithm, we can set an error bound for the computed conditional distribution, which is a notable feature of our algorithm. Some numerical examples are also provided.

10.
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles; usually, a constant is then fitted in every partition. While this is a simple and intuitive approach, it may lack interpretability as to how a specific relationship between dependent and independent variables looks. Alternatively, a certain model may be assumed or of interest, with a number of candidate variables that may non-linearly give rise to different model parameter values. We present an approach that combines generalized linear models (GLM) with recursive partitioning, offering enhanced interpretability over classical trees as well as an explorative way to assess a candidate variable's influence on a parametric model. The method conducts recursive partitioning of a GLM by (1) fitting the model to the data set, (2) testing for parameter instability over a set of partitioning variables, and (3) splitting the data set with respect to the variable associated with the highest instability. The outcome is a tree where each terminal node is associated with a GLM. We show the method's versatility and its suitability for gaining additional insight into the relationship of dependent and independent variables through two examples, modelling voting behaviour and a failure model for debt amortization, and compare it to alternative approaches.
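A simplified sketch of one partitioning step follows. Two simplifying assumptions are made for brevity: an ordinary linear model stands in for the GLM, and a residual-sum-of-squares improvement at the median split stands in for the formal parameter-instability test the abstract describes.

```python
import numpy as np

def fit_rss(X, y):
    """Ordinary least squares fit; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partition_step(X, y, Z):
    """One recursive-partitioning step, simplified: each partitioning
    variable Z[:, j] is scored by how much splitting at its median improves
    the model fit. Returns (gain, variable index, split point) or None."""
    rss_full, best = fit_rss(X, y), None
    for j in range(Z.shape[1]):
        c = np.median(Z[:, j])
        left = Z[:, j] <= c
        if left.sum() <= X.shape[1] or (~left).sum() <= X.shape[1]:
            continue  # too few observations to refit the model on a side
        gain = rss_full - (fit_rss(X[left], y[left]) + fit_rss(X[~left], y[~left]))
        if best is None or gain > best[0]:
            best = (gain, j, c)
    return best
```

Applying partition_step recursively to each resulting subset, until no admissible split remains, yields a tree whose terminal nodes each carry their own fitted model.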

11.
A new method for analyzing high-dimensional categorical data, Linear Latent Structure (LLS) analysis, is presented. LLS models belong to the family of latent structure models, which are mixture distribution models constrained to satisfy the local independence assumption. LLS analysis explicitly considers a family of mixed distributions as a linear space, and LLS models are obtained by imposing linear constraints on the mixing distribution. LLS models are identifiable under modest conditions and are consistently estimable. A remarkable feature of LLS analysis is the existence of a high-performance numerical algorithm, which reduces parameter estimation to a sequence of linear algebra problems. Simulation experiments with a prototype of the algorithm demonstrated good recovery of model parameters.

12.
Learning classification trees
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.

13.
The author considers (asymptotically) minimax extrapolation designs for an approximately multiple linear model with the model contaminant f being restricted only by its L2 norm. He splits the integrated mean squared prediction error (IMSPE) of the fitted value over the extrapolation space into two parts, namely the integrated prediction variance (IPV) and the integrated prediction bias (IPB). For a spherical design space and an annular extrapolation space, he constructs the design that minimizes the maximum value, over f, of IPB subject to bounding IPV. He also constructs the design that minimizes IPV subject to bounding the maximum IPB.

14.
Selective ensemble algorithms are currently one of the hot topics in machine learning. Building on a case study of algae reproduction, this paper proposes a fast selective Bagging Trees ensemble algorithm based on k-means clustering. Compared with traditional statistical methods and several commonly used machine learning methods, the algorithm achieves a smaller generalization error and higher prediction accuracy, and its running efficiency is also considerably improved.
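The abstract gives no implementation detail; the following sketch shows one common reading of k-means-based selective bagging, namely clustering the members by their validation predictions and keeping the best tree from each cluster. All names, sizes and settings here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

def selective_bagging(X, y, X_val, y_val, n_trees=50, k=10, seed=0):
    """Grow a bagged ensemble, then select a diverse, accurate subset by
    clustering the members' validation predictions with k-means."""
    rng = np.random.default_rng(seed)
    trees, preds = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))          # bootstrap sample
        t = DecisionTreeRegressor(max_depth=6).fit(X[idx], y[idx])
        trees.append(t)
        preds.append(t.predict(X_val))
    preds = np.array(preds)                            # shape (n_trees, n_val)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(preds)
    keep = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        errors = [np.mean((preds[m] - y_val) ** 2) for m in members]
        keep.append(members[int(np.argmin(errors))])   # best tree per cluster
    return [trees[i] for i in keep]
```

Because only one representative per cluster is retained, prediction time drops roughly by the factor n_trees/k while redundant, near-duplicate members are discarded.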

15.
Motivated by examples in spectroscopy, we study variable selection for discrimination in problems with very many predictor variables. Assuming multivariate normal distributions with common variance for the predictor variables within groups, we develop a Bayesian decision theory approach that balances costs for variables against a loss due to classification errors. The approach is computationally intensive, requiring a simulation to approximate the intractable expected loss and a search, using simulated annealing, over a large space of possible subsets of variables. It is illustrated by application to a spectroscopic example with 3 groups, 100 variables, and 71 training cases, where the approach finds subsets of between 5 and 14 variables whose discriminatory power is comparable with that of linear discriminant analysis using principal components derived from the full 100 variables. We study both the evaluation of expected loss and the tuning of the simulated annealing for the example, and conclude that computational effort should be concentrated on the search.
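The search component can be sketched compactly. This is a generic simulated-annealing subset search under the stated cost-plus-loss objective, not the authors' tuned implementation; the proposal move, cooling schedule and starting subset size are illustrative assumptions.

```python
import math, random

def anneal_subsets(variables, loss, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over variable subsets. `loss(subset)` is assumed
    to return variable costs plus an estimate of expected classification
    loss; the proposal move flips one variable in or out."""
    rng = random.Random(seed)
    current = frozenset(rng.sample(list(variables), max(1, len(variables) // 10)))
    cur_loss, temp = loss(current), t0
    best, best_loss = current, cur_loss
    for _ in range(n_iter):
        v = rng.choice(list(variables))             # propose: flip one variable
        cand = current - {v} if v in current else current | {v}
        if not cand:
            continue                                # keep at least one variable
        cand_loss = loss(cand)
        if cand_loss < cur_loss or rng.random() < math.exp((cur_loss - cand_loss) / temp):
            current, cur_loss = frozenset(cand), cand_loss
            if cur_loss < best_loss:
                best, best_loss = current, cur_loss
        temp *= cooling                             # geometric cooling schedule
    return best, best_loss
```

When `loss` is itself a Monte Carlo estimate, as in the paper, each call is expensive, which is why the authors conclude that computational effort should be concentrated on the search rather than on refining each evaluation.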

16.
In this paper, we improve upon the Carlin and Chib Markov chain Monte Carlo algorithm that searches in model and parameter spaces. Our proposed algorithm attempts non-uniformly chosen ‘local’ moves in the model space and avoids some pitfalls of other existing algorithms. In a series of examples with linear and logistic regression, we report evidence that our proposed algorithm performs better than the existing algorithms.

17.
Many algorithms originating from decision trees have been developed for classification problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy, namely high misclassification rates, when there are many irrelevant variables. We propose multi-step classification trees with adaptive variable selection (the multi-step GUIDE classification tree (MG) and the multi-step CRUISE classification tree (MC)) to handle this problem. The multi-step method comprises a variable selection step and a fitting step.

We compare the performance of classification trees in the presence of irrelevant variables. MG and MC perform better than Random Forest and C4.5 on an extremely noisy dataset. Furthermore, the prediction accuracy of our proposed algorithms is relatively stable even when the number of irrelevant variables increases, while that of other algorithms worsens.

18.
The gradient boosting view explains the mechanism of Boosting algorithms by treating the space spanned by the base learners as a continuous function space; in practice, however, the base learner space formed under finite samples is not necessarily continuous. To address this problem, this paper starts from the perspective of additive models and, based on squared loss, proposes a new resampling boosted regression tree method. The method is a stepwise update algorithm for a weighted additive model. Experimental results show that the method can significantly improve on a single regression tree, reduce prediction error, and achieve lower prediction error than the L2Boost algorithm.
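For reference, here is a minimal L2Boost-style baseline, the comparison point named in the abstract: each step fits a small regression tree to the current residuals and adds it with shrinkage. The paper's resampling and step-weighting scheme is not reproduced; the step count, shrinkage and tree depth are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def l2_boost(X, y, n_steps=100, nu=0.1, depth=2):
    """L2 boosting with squared loss: stage-wise fitting of small trees
    to residuals, combined with shrinkage factor nu."""
    offset = y.mean()
    f = np.full(len(y), offset)
    trees = []
    for _ in range(n_steps):
        t = DecisionTreeRegressor(max_depth=depth).fit(X, y - f)
        f += nu * t.predict(X)                 # update the additive fit
        trees.append(t)
    def predict(X_new):
        return offset + nu * sum(t.predict(X_new) for t in trees)
    return predict
```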

19.
In this article, a novel technique, IRUSRT (inverse random undersampling and random tree), combining inverse random undersampling with random trees, is proposed for imbalanced learning. The main idea is to severely undersample the majority class, thus creating multiple distinct training sets. With each training set, a random tree is trained to separate the minority class from the majority class. By combining these random trees through fusion, a composite classifier is constructed. The experimental analysis on 23 real-world datasets, assessed over the area under the ROC curve (AUC), F-measure and G-mean, indicates that IRUSRT performs significantly better than many existing class-imbalance learning methods.
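A hedged sketch of the general scheme follows: each ensemble member keeps the full minority class and only a small majority sample, so the class imbalance is inverted, and members are fused by averaging predicted probabilities. A plain decision tree stands in for the paper's random tree, and the ratio parameter is an illustrative assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def irus_tree_ensemble(X, y, n_models=20, ratio=0.5, seed=0):
    """Inverse random undersampling ensemble for binary labels {0, 1}:
    each member sees all minority cases plus ratio * |minority| majority
    cases, inverting the imbalance; fusion averages the probabilities."""
    rng = np.random.default_rng(seed)
    minority = 1 if (y == 1).sum() < (y == 0).sum() else 0
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    size = max(1, int(ratio * len(idx_min)))
    models = []
    for _ in range(n_models):
        sub = rng.choice(idx_maj, size=size, replace=False)
        idx = np.concatenate([idx_min, sub])
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    def predict_proba(X_new):
        return np.mean([m.predict_proba(X_new) for m in models], axis=0)
    return predict_proba
```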

20.
The support vector machine (SVM) is sparse in that its classifier is expressed as a linear combination of only a few support vectors (SVs). Whenever an outlier is included as an SV in the classifier, the outlier may have a serious impact on the estimated decision function. In this article, we propose a robust loss function that is convex. Our learning algorithm is more robust to outliers than the SVM, and the convexity of our loss function permits an efficient solution-path algorithm. Through simulated and real data analysis, we illustrate that our method can be useful in the presence of labeling errors.
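The abstract does not state the loss itself. One well-known convex margin loss often discussed in this context is the huberized hinge of Rosset & Zhu, shown here purely as an illustration of a smooth convex alternative to the hinge, not necessarily the authors' proposal. With margin m = yf(x) and bend parameter δ > 0:

```latex
\ell_\delta(m) =
\begin{cases}
  0, & m > 1,\\[2pt]
  \dfrac{(1-m)^2}{2\delta}, & 1-\delta < m \le 1,\\[6pt]
  (1-m) - \dfrac{\delta}{2}, & m \le 1-\delta.
\end{cases}
```

The three pieces join continuously at m = 1 and m = 1 - δ, and the loss is differentiable everywhere, which is what makes efficient solution-path algorithms tractable for losses of this type.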
