Similar literature (20 results)
1.
Classification and regression trees have been useful in medical research for constructing algorithms for disease diagnosis or prognostic prediction. Jin et al. [Classification algorithms for hip fracture prediction based on recursive partitioning methods, Med. Decis. Mak. 24 (2004), pp. 386–398, doi:10.1177/0272989X04267009] developed a robust and cost-saving tree (RACT) algorithm and applied it to classifying hip fracture risk after 5-year follow-up, using data from the Study of Osteoporotic Fractures (SOF). Although conventional recursive partitioning algorithms are well developed, they still have limitations: binary splits may generate a big tree with many layers, whereas trinary splits may produce too many nodes. In this paper, we propose a classification approach that combines trinary and binary splits to generate a trinary–binary tree. A new non-inferiority test of entropy is used to choose between binary and trinary splits. We apply the modified method to SOF data to construct a trinary–binary classification rule for predicting the risk of osteoporotic hip fracture. Our new classification tree has good statistical utility: on the testing sample it is statistically non-inferior to the optimum binary tree and to the RACT, and it is also cost-saving. It may be useful in clinical applications: femoral neck bone mineral density, age, height loss and weight gain since age 25 can identify subjects with elevated 5-year hip fracture risk without loss of statistical efficiency.

2.
Various aspects of the classification tree methodology of Breiman et al. (1984) are discussed. A method of displaying classification trees, called block diagrams, is developed. Block diagrams give a clear presentation of the classification, and are useful both to point out features of the particular data set under consideration and to highlight deficiencies in the classification method being used. Various splitting criteria are discussed; the usual Gini-Simpson criterion presents difficulties when there is a relatively large number of classes, and improved splitting criteria are obtained. One particular improvement is the introduction of adaptive anti-end-cut factors that take advantage of highly asymmetrical splits where appropriate. They use the number and mix of classes in the current node of the tree to identify whether or not it is likely to be advantageous to create a very small offspring node. A number of data sets are used as examples.
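
As a rough illustration of the Gini-Simpson criterion mentioned in this abstract, the sketch below computes the impurity 1 - sum_k p_k^2 of a node and the impurity decrease of a binary split. The function names (`gini`, `gini_decrease`) and the toy labels are assumptions made for this example; the adaptive anti-end-cut factors of the paper are not reproduced here.

```python
from collections import Counter


def gini(labels):
    """Gini-Simpson impurity of a node: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical two-class node with four cases per class.
parent = ["a"] * 4 + ["b"] * 4

# A balanced split that separates the classes perfectly.
balanced = gini_decrease(parent, ["a"] * 4, ["b"] * 4)

# An end-cut split that peels off a single, pure observation.
end_cut = gini_decrease(parent, ["a"], ["a"] * 3 + ["b"] * 4)
```

In this toy case the end-cut split earns a much smaller decrease than the balanced one, although its small offspring node is pure. That is the kind of highly asymmetrical split the abstract discusses.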

3.
Properties of the localized regression tree splitting criterion, described in Bremner & Taplin (2002) and referred to as the BT method, are explored in this paper and compared to those of Clark & Pregibon's (1992) criterion (the CP method). These properties indicate why the BT method can result in superior trees. This paper shows that the BT method exhibits a weak bias towards edge splits, and the CP method exhibits a strong bias towards central splits in the presence of main effects. A third criterion, called the SM method, that exhibits no bias towards a particular split position is introduced. The SM method is a modification of the BT method that uses more symmetric local means. The BT and SM methods are more likely to split at a discontinuity than the CP method because of their relatively low bias towards particular split positions. The paper shows that the BT and SM methods can be used to discover discontinuities in the data, and that they offer a way of producing a variety of different trees for examination or for tree averaging methods.

4.
We present an algorithm for learning oblique decision trees, called HHCART(G). Our decision tree combines learning concepts from two classification trees, HHCART and Geometric Decision Tree (GDT). HHCART(G) is a simplified HHCART algorithm that uses linear structure in the training examples, captured by a modified GDT angle bisector, to define splitting directions. At each node, we reflect the training examples with respect to the modified angle bisector to align this linear structure with the coordinate axes. Searching for axis-parallel splits in this reflected feature space provides an efficient and effective way of finding oblique splits in the original feature space. Our method is much simpler than HHCART because it considers only one reflected feature space for node splitting, whereas HHCART considers multiple reflected feature spaces, making it more computationally intensive to build. Experimental results show that HHCART(G) is an effective classifier, producing compact trees with similar or better results than several other decision trees, including GDT and HHCART trees.

5.
Families of splitting criteria for classification trees   Cited by: 6 (self-citations: 0, citations by others: 6)
Several splitting criteria for binary classification trees are shown to be expressible as weighted sums of two values of divergence measures. This weighted sum approach is then used to form two families of splitting criteria. One of them contains the chi-squared and entropy criteria; the other contains the mean posterior improvement criterion. Members of both families are shown to have the property of exclusive preference. Furthermore, the optimal splits based on the proposed families are studied. We find that the best splits depend on the parameters in the families. The results reveal interesting differences among various criteria. Examples are given to demonstrate the usefulness of both families.

6.
When a process is monitored with a T² control chart in a Phase II setting, the MYT decomposition is a valuable diagnostic tool for interpreting signals in terms of the process variables. The decomposition splits a signaling T² statistic into independent components that can be associated with either individual variables or groups of variables. Since these components are T² statistics with known distributions, they can be used to determine which of the process variable(s) contribute to the signal. However, this procedure cannot be applied directly to Phase I since the distributions of the individual components are unknown. In this article, we develop the MYT decomposition procedure for a Phase I operation, when monitoring a random sample of individual observations and identifying outliers. We use a relationship between the T² statistic in Phase I and the corresponding T² statistic resulting when an observation is omitted from this sample to derive the distributions of these components and demonstrate the Phase I application of the MYT decomposition.

7.
This article investigates the existence of multiple regimes in the U.S. economy during the 1923–1991 period. A technique known as regression tree analysis is applied to search for splits in the data, if any exist, rather than choosing a splitting point a priori as has been done in previous work. Using this technique, strong evidence for the existence of nonlinear behavior of U.S. output is found over this period. Monte Carlo results are presented to assess the significance of the regime changes that are found.

8.
A strategy is presented for producing code to evaluate first partial derivatives of a function f by computer. Advantage is taken of the capabilities that a compiler must have simply to bring about the evaluation of f. A bound kt on the time it should take to evaluate the partials of f is developed, where f takes time t to evaluate and k is independent of f. The benefits of the differentiation strategy for weighted least squares analysis of categorical data are discussed.
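
The abstract does not spell out its strategy, so the following is not a reconstruction of it. It is only a minimal sketch of one familiar way to obtain first partials by reusing the code that evaluates f, namely forward-mode differentiation with dual numbers. The class `Dual`, the helper `partials` and the test function `f` are all assumptions made for illustration, and only addition and multiplication are supported. One pass through f is made per input variable, so no claim is made that this sketch attains the bound kt of the abstract.

```python
class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def partials(f, point):
    """Evaluate all first partial derivatives of f at `point`."""
    grads = []
    for i in range(len(point)):
        # Seed derivative 1 on input i and 0 on every other input.
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(f(*args).deriv)
    return grads


def f(x, y):
    # Hypothetical test function: f(x, y) = 3x^2 + xy + 2y.
    return 3 * x * x + x * y + 2 * y


grad = partials(f, [2.0, 5.0])  # analytic partials: 6x + y and x + 2
```

The same source code for `f` serves both ordinary evaluation (with floats) and differentiation (with `Dual` arguments).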

9.
Likelihood ratio tests are considered for two testing situations: testing for the homogeneity of k normal means against the alternative restricted by a simple tree ordering trend, and testing the null hypothesis that the means satisfy the trend against all alternatives. Exact expressions are given for the power functions for k = 3 and 4 and unequal sample sizes, both for the case of known and unknown population variances, and approximations are discussed for larger k. Also, Bartholomew’s conjectures concerning minimal and maximal powers are investigated for the case of equal and unequal sample sizes. The power formulas are used to compute powers for a numerical example.

10.
Tests of homogeneity of normal means with the alternative restricted by an ordering on the means are considered. The simply ordered case, μ1 ≤ μ2 ≤ ··· ≤ μk, and the simple tree ordering, μ1 ≤ μj for j = 2, 3, …, k, are emphasized. A modification of the likelihood-ratio test is proposed which is asymptotically equivalent to it but is more robust to violations of the hypothesized orderings. The new test has power at the points satisfying the hypothesized ordering which is similar to that of the likelihood-ratio test provided the degrees of freedom are not too small. The modified test is shown to be unbiased and consistent.

11.
Abstract. The strong Rayleigh property is a new and robust negative dependence property that implies negative association; in fact it implies conditional negative association closed under external fields (CNA+). Suppose that and are two families of 0‐1 random variables that satisfy the strong Rayleigh property and let . We show that {Zi} conditioned on is also strongly Rayleigh; this turns out to be an easy consequence of the results on preservation of stability of polynomials of Borcea & Brändén (Invent. Math., 177, 2009, 521–569). This entails that a number of important πps sampling algorithms, including Sampford sampling and Pareto sampling, are CNA+. As a consequence, statistics based on such samples automatically satisfy a version of the Central Limit Theorem for triangular arrays.

12.
We address the problem of robust model selection for finite memory stochastic processes. Consider m independent samples, most of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. We define the asymptotic breakdown point γ for a model selection procedure and devise such a procedure. We show that γ equals 0.5 when all the processes are Markovian. This result is valid for any family of finite order Markov models, but for simplicity we focus on the family of variable length Markov chains.

13.
Tree-based methods similar to CART have recently been utilized for problems in which the main goal is to estimate some set of interest. It is often the case that the boundary of the true set is smooth in some sense; however, tree-based estimates will not be smooth, as they will be a union of ‘boxes’. We propose a general methodology for smoothing such sets that automatically allows for varying levels of smoothness on the boundary. The method is similar to the idea underlying support vector machines: a computationally simple technique is applied to the data after a non-linear mapping to produce smooth estimates in the original space. In particular, we consider the problem of level-set estimation for regression functions and the dyadic tree-based method of Willett and Nowak [Minimax optimal level-set estimation, IEEE Trans. Image Process. 16 (2007), pp. 2965–2979].

14.
In this article, we study the strong law of large numbers (LLN) and the Shannon-McMillan theorem for an mth-order nonhomogeneous Markov chain indexed by an m-rooted Cayley tree. This article generalizes the related results on level mth-order nonhomogeneous Markov chains indexed by an m-rooted Cayley tree.

15.
This paper explores the performance of the local splitting criterion devised by Bremner & Taplin for classification and regression trees when multiple trees are averaged to improve performance. The criterion is compared with the deviance used in Clark & Pregibon's method, which is a global splitting criterion typically used to grow trees. The paper considers multiple trees generated by randomly selecting splits with probability proportional to the likelihood for the split, and by bagging, where bootstrap samples from the data are used to grow trees. For six datasets, the superiority of the localized splitting criterion often persists when multiple trees are grown and averaged. Tree averaging is known to be advantageous when the trees being averaged produce different predictions, and this can be achieved by choosing splits where the splitting criterion is locally optimal. The paper shows that use of locally optimal splits gives promising results in conjunction with both local and global splitting criteria, and with and without random selection of splits. The paper also extends the local splitting criterion to accommodate categorical predictors.

16.
Quantile regression (QR), proposed by Koenker and Bassett [Regression quantiles, Econometrica 46(1) (1978), pp. 33–50], is a statistical technique that estimates conditional quantiles. It has been widely studied and applied in economics. Meinshausen [Quantile regression forests, J. Mach. Learn. Res. 7 (2006), pp. 983–999] proposed quantile regression forests (QRF), a non-parametric method based on random forests. QRF performs well in terms of prediction accuracy, but it struggles with noisy data sets. This motivates us to propose a multi-step QR tree method using GUIDE (Generalized, Unbiased, Interaction Detection and Estimation), developed by Loh [Regression trees with unbiased variable selection and interaction detection, Statist. Sinica 12 (2002), pp. 361–386]. Our simulation study shows that the multi-step QR tree performs better than a single tree or QRF, especially when it deals with data sets having many irrelevant variables.
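
To make "estimating conditional quantiles" concrete, the sketch below shows the check (pinball) loss that quantile regression minimizes, applied to the simplest case of a constant prediction with no covariates. The function names `pinball_loss` and `best_constant` and the toy sample are assumptions for this example; it does not implement QRF, GUIDE or the multi-step QR tree of the abstract.

```python
def pinball_loss(y, q, tau):
    """Average check loss of predicting the constant q at quantile level tau."""
    total = 0.0
    for yi in y:
        residual = yi - q
        # Under-prediction is weighted by tau, over-prediction by 1 - tau.
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(y)


def best_constant(y, tau):
    """Search the observed values for the constant that minimises the loss."""
    return min(sorted(set(y)), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical sample with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(y, 0.5)  # minimiser at tau = 0.5
upper_fit = best_constant(y, 0.9)   # minimiser at tau = 0.9
```

At tau = 0.5 the minimizer is the sample median, which the outlier does not move, while at tau = 0.9 the minimizer shifts to the upper tail of the sample. Replacing the constant by a function of covariates gives the regression version.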

17.
In this article, we study the strong laws of large numbers and the asymptotic equipartition property (AEP) for mth-order asymptotic odd–even Markov chains indexed by an m-rooted Cayley tree. First, the definition of mth-order asymptotic odd–even Markov chains indexed by an m-rooted Cayley tree is introduced; then the strong limit theorem for these Markov chains is established. Next, the strong laws of large numbers for the frequencies of ordered couples of states for mth-order asymptotic odd–even Markov chains indexed by an m-rooted Cayley tree are obtained. Finally, we prove the AEP for these Markov chains.

18.
In order to accelerate object evaluation, some measurement systems commonly use an ordinal scale (e.g., stick results, quality estimation). This paper presents a way to analyze variation in ordinal data. As classical ANOVA does for continuous data, ORDANOVA for ordinal data splits the total variation into within and between components. This decomposition has various practical applications such as classification, cluster analysis and the identification of distinguishing features.

19.
In this work, we study D_s-optimal design for Kozak's tree taper model. The approximate D_s-optimal designs are found to be invariant to tree size and hence provide a basis for constructing a general replication-free D_s-optimal design. Even though the designs are found not to depend on the parameter value p of Kozak's model, they are sensitive to the values of the s×1 subset parameter vector of the model. The 12-point replication-free design (with 91% efficiency) suggested in this study is believed to reduce the cost and time of data collection and, more importantly, to estimate the subset parameters of interest precisely.

20.
The present investigation was undertaken to study the gillnet catch efficiency of sardines in the coastal waters of Sri Lanka using commercial catch and effort data. Commercial catch and effort data of the small mesh gillnet fishery were collected in five fisheries districts during the period May 1999–August 2002. Gillnet catch efficiency of sardines was investigated by developing predictive models of catch rates using data on commercial fisheries and environmental variables. Three statistical techniques [multiple linear regression, generalized additive model and regression tree model (RTM)] were employed to predict the catch rates of trenched sardine Amblygaster sirm (the key target species of the small mesh gillnet fishery) and other sardines (Sardinella longiceps, S. gibbosa, S. albella and S. sindensis). The data collection programme was continued for another six months and the models were tested on the new data. RTMs were found to be the strongest in terms of reliability and accuracy of the predictions. The two operational characteristics used here for model formulation (i.e. depth of fishing and number of gillnet pieces used per fishing operation) were more useful as predictor variables than the environmental variables. The study revealed that the catch rates of A. sirm increased rapidly with sea depth up to around 32 m.

