Similar Documents
20 similar documents retrieved.
1.
We present an algorithm for learning oblique decision trees, called HHCART(G). Our decision tree combines learning concepts from two classification trees, HHCART and the Geometric Decision Tree (GDT). HHCART(G) is a simplified HHCART algorithm that uses linear structure in the training examples, captured by a modified GDT angle bisector, to define splitting directions. At each node, we reflect the training examples with respect to the modified angle bisector to align this linear structure with the coordinate axes. Searching for axis-parallel splits in this reflected feature space provides an efficient and effective way of finding oblique splits in the original feature space. Our method is much simpler than HHCART because it considers only one reflected feature space per node split, whereas HHCART considers multiple reflected feature spaces, making it more computationally intensive to build. Experimental results show that HHCART(G) is an effective classifier, producing compact trees with results similar to or better than several other decision trees, including GDT and HHCART trees.
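For intuition, here is a minimal sketch of the reflect-then-split idea described above, not the authors' HHCART(G) implementation: the "modified GDT angle bisector" is replaced by a generic direction (the difference of the class means), a Householder matrix reflects the data so that this direction lines up with the first coordinate axis, and an ordinary axis-parallel split search (weighted Gini) runs in the reflected space, which corresponds to an oblique split in the original space. All data and parameter choices below are illustrative assumptions.

```python
import numpy as np

def householder_matrix(d):
    """Householder matrix H such that H @ d is parallel to the first axis e1."""
    d = d / np.linalg.norm(d)
    e1 = np.zeros_like(d); e1[0] = 1.0
    u = d - e1
    if np.linalg.norm(u) < 1e-12:          # d already aligned with e1
        return np.eye(len(d))
    u = u / np.linalg.norm(u)
    return np.eye(len(d)) - 2.0 * np.outer(u, u)

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_axis_parallel_split(X, y):
    """Exhaustive search for the axis-parallel split minimising weighted Gini."""
    best = (np.inf, None, None)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[0]:
                best = (score, j, t)
    return best

rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 0.5, size=(100, 2))
X1 = rng.normal([1, 1], 0.5, size=(100, 2))       # classes separated along an oblique direction
X = np.vstack([X0, X1]); y = np.repeat([0, 1], 100)

d = X1.mean(axis=0) - X0.mean(axis=0)             # stand-in for the modified angle bisector
H = householder_matrix(d)
score, j, t = best_axis_parallel_split(X @ H.T, y)  # axis-parallel search in the reflected space
print(f"oblique split: (H x)[{j}] <= {t:.3f}, weighted Gini {score:.3f}")
```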

2.
We study the convergence of the adjusted binomial tree method for lookback options in double-exponential jump diffusion models. Using the results of [Dai, M. (2000). A modified binomial tree method for currency lookback options. Acta Mathematica Sinica, 16, 445–454; Kou, S., & Wang, H. (2004). Option pricing under a double exponential jump diffusion model. Management Science, 50, 1178–1192] and [Park, H.S., Kim, K.I., & Qian, X. (2009). A mathematical modeling for the lookback option with jump diffusion using binomial tree method. Journal of Computational and Applied Mathematics, preprint], we show the equivalence between the adjusted binomial tree method and the explicit difference scheme. The convergence is also proved theoretically through the notion of viscosity solutions. Numerical results are consistent with the theoretical findings.
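As background, the following is a minimal sketch of how a path-dependent lookback payoff is priced on a plain Cox-Ross-Rubinstein binomial tree by brute-force path enumeration. It is not the adjusted binomial tree of the paper and contains no double-exponential jump component; all parameter values are illustrative assumptions.

```python
import itertools
import math

def lookback_put_crr(S0, r, sigma, T, n):
    """Floating-strike lookback put (payoff: running max minus terminal price) on a CRR tree."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)           # risk-neutral up probability
    price = 0.0
    for path in itertools.product((0, 1), repeat=n):   # 1 = up move, 0 = down move
        S, M, prob = S0, S0, 1.0
        for step in path:
            S *= u if step else d
            M = max(M, S)
            prob *= p if step else (1.0 - p)
        price += prob * (M - S)                    # lookback payoff for this path
    return math.exp(-r * T) * price

print(lookback_put_crr(S0=100.0, r=0.05, sigma=0.2, T=1.0, n=12))
```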

3.
4.
Stochastic Models, 2013, 29(3), 341–368
Abstract

We consider a flow of data packets from one source to many destinations in a communication network represented by a random oriented tree. Multicast transmission is characterized by the ability of some tree vertices to replicate received packets depending on the number of destinations downstream. We are interested in characteristics of multicast flows on Galton–Watson trees and trees generated by point aggregates of a Poisson process. Such stochastic settings are intended to represent tree shapes arising in the Internet and in some ad hoc networks. The main result in the branching process case is a functional equation for the joint probability generating function of the flow volumes through a given vertex and in the whole tree. We provide conditions for the existence and uniqueness of the solution and a method to compute it using Picard iterations. In the point process case, we provide bounds on flow volumes using the technique of stochastic comparison from the theory of continuous percolation. We use these results to derive a number of characteristics of random trees and discuss their applications to the analytical evaluation of the load induced on a network by a multicast session.
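A minimal simulation sketch of the multicast flow model described above (not taken from the paper): a Galton–Watson tree with Poisson offspring is grown, each leaf is a destination with some probability, and a vertex carries (and replicates) the packet exactly when at least one destination lies in its subtree. The offspring mean, destination probability and depth cap are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def multicast_flow(mu=0.9, q=0.5, depth=0, max_depth=30):
    """One Galton-Watson subtree with Poisson(mu) offspring.
    Returns (#destinations in subtree, flow volume = #vertices that forward the packet)."""
    n_children = rng.poisson(mu) if depth < max_depth else 0
    if n_children == 0:                        # leaf: destination with probability q
        dest = int(rng.random() < q)
        return dest, dest
    dests = flow = 0
    for _ in range(n_children):
        d, f = multicast_flow(mu, q, depth + 1, max_depth)
        dests += d
        flow += f
    return dests, flow + (dests > 0)           # internal vertex replicates iff a destination is downstream

samples = [multicast_flow() for _ in range(10000)]
vols = [v for _, v in samples]
print("mean flow volume per tree:", np.mean(vols))
```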

5.
Group testing is a method of pooling a number of units together and performing a single test on the resulting group. Group testing is an appealing option when few individual units are thought to be infected and the cost of testing is non-negligible. Overdispersion is the phenomenon of having greater variability than predicted by the random component of the model; this is common when group testing data are modeled with a binomial distribution. The purpose of this paper is to provide a comparison of several established methods of constructing confidence intervals after adjusting for overdispersion. We evaluate and investigate each method in six different cases of group testing. A method based on the score statistic with a correction for skewness is recommended. We illustrate the methods using two data sets, one from the detection of seed transmission and the other from serological testing for malaria.
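For concreteness, a minimal sketch of a plain (unadjusted) group-testing interval, not one of the overdispersion-adjusted methods compared in the paper: a Wilson score interval is computed for the pool-level positive probability and its endpoints are transformed to the individual-level prevalence through pi = 1 - (1 - p)^k. Pool counts and sizes below are illustrative.

```python
import math

def group_testing_ci(positives, n_pools, k, alpha=0.05):
    """Prevalence estimate and CI from 'positives' positive pools out of n_pools pools of size k."""
    z = 1.959963984540054                   # approx. standard normal 97.5% quantile
    phat = positives / n_pools              # pool-level positive proportion
    # Wilson score interval for the pool-level probability
    denom = 1 + z**2 / n_pools
    centre = (phat + z**2 / (2 * n_pools)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n_pools + z**2 / (4 * n_pools**2)) / denom
    lo, hi = max(0.0, centre - half), min(1.0, centre + half)
    # A pool is negative iff all k units are negative, so pi = 1 - (1 - p)^k;
    # invert this map to carry the limits back to the individual-level prevalence p.
    to_p = lambda pi: 1 - (1 - pi) ** (1 / k)
    return to_p(phat), (to_p(lo), to_p(hi))

est, (lower, upper) = group_testing_ci(positives=7, n_pools=50, k=10)
print(f"prevalence estimate {est:.4f}, 95% CI ({lower:.4f}, {upper:.4f})")
```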

6.
Mixture distribution survival trees are constructed by approximating different nodes in the tree by distinct types of mixture distributions to improve within-node homogeneity. Previously, we proposed a mixture distribution survival tree-based method for determining clinically meaningful patient groups from a given dataset of patients’ length of stay. This article extends that approach to examine the interrelationship between length of stay in hospital, outcome measures, and other covariates. We describe an application of the approach to patient pathways and examine the relationship between length of stay in hospital and treatment outcome using five years of retrospective data on stroke patients.
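As an illustration of the within-node mixture idea (not the paper's method), the sketch below fits a two-component exponential mixture to simulated length-of-stay data with a plain EM algorithm; the short-stay and long-stay means, the mixing weight and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
los = np.concatenate([rng.exponential(3.0, 700),      # short-stay subgroup (mean 3 days)
                      rng.exponential(20.0, 300)])    # long-stay subgroup (mean 20 days)

w, lam1, lam2 = 0.5, 1.0, 0.1                         # initial guesses
for _ in range(200):                                  # EM iterations
    f1 = lam1 * np.exp(-lam1 * los)
    f2 = lam2 * np.exp(-lam2 * los)
    r = w * f1 / (w * f1 + (1 - w) * f2)              # E-step: responsibilities for component 1
    w = r.mean()                                      # M-step: weight and rates
    lam1 = r.sum() / (r * los).sum()
    lam2 = (1 - r).sum() / ((1 - r) * los).sum()

print(f"weight {w:.2f}, estimated mean stays {1/lam1:.1f} and {1/lam2:.1f} days")
```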

7.
The usual assumptions for the average-case analysis of binary search trees (BSTs) are random insertions and random deletions. If a BST is built by n random insertions, the expected number of key comparisons necessary to access a node is 2 ln n + O(1). This well-known result is already contained in the first papers on such ‘random’ BSTs. However, if random insertions are intermixed with random deletions, the analysis of the resulting BST seems to become more intricate. At least this is the impression one gets from the related publications since 1962, and it is quite appropriate to speak of a story of errors in this context, as will be seen in the present survey paper, which gives an overview of this story.
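The insertion-only result quoted above is easy to check empirically; the minimal sketch below builds a BST by n random insertions and compares the average number of comparisons for a successful search with 2 ln n. Random deletions, where the story of errors begins, are not simulated.

```python
import math
import random

def average_search_cost(n, rng):
    """Average comparisons for a successful search in a BST built by n random insertions."""
    keys = list(range(n))
    rng.shuffle(keys)
    left, right = {}, {}                 # child pointers keyed by node value
    root = keys[0]
    total_depth = 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            depth += 1
            nxt = left if key < node else right
            if node in nxt:
                node = nxt[node]
            else:
                nxt[node] = key          # attach the new key here
                break
        total_depth += depth
    return 1 + total_depth / n           # comparisons = depth + 1, averaged over all n nodes

rng = random.Random(42)
n = 20000
print("simulated:", average_search_cost(n, rng), "  2 ln n:", 2 * math.log(n))
```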

8.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data coming from a multivariate binary distribution. We examine the random projection method and apply an Edgeworth expansion to improve it. Additionally, we propose new statistics that are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. Finally, we apply these tests to the 20 Newsgroup data, showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles with different topics.
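A minimal sketch of the random projection idea for two-sample testing of high-dimensional binary data (illustrative only; no Edgeworth correction and not the proposed sparse-data statistics): each observation is projected onto a random direction and a standard two-sample t-test is applied to the projections. The dimension, group sizes and effect below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = 500
x = rng.binomial(1, 0.10, size=(80, p))               # group 1
probs = np.full(p, 0.10); probs[:20] = 0.25           # group 2 differs in 20 coordinates
y = rng.binomial(1, probs, size=(80, p))

direction = rng.normal(size=p)
direction /= np.linalg.norm(direction)                # one random projection direction
t_stat, p_value = stats.ttest_ind(x @ direction, y @ direction, equal_var=False)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```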

9.
The Bayesian CART (classification and regression tree) approach proposed by Chipman, George and McCulloch (1998) entails putting a prior distribution on the set of all CART models and then using stochastic search to select a model. The main thrust of this paper is to propose a new class of hierarchical priors that enhance the potential of this Bayesian approach. These priors indicate a preference for smooth local mean structure, resulting in tree models that shrink predictions from adjacent terminal nodes towards each other. Past methods for tree shrinkage have searched for trees without shrinking, and applied shrinkage to the identified tree only after the search. By using hierarchical priors in the stochastic search, the proposed method searches for shrunk trees that fit well and improves the tree through shrinkage of predictions.
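For contrast with the hierarchical-prior approach, here is a minimal sketch of the "shrink only after the search" strategy mentioned above: a single split is fixed and the two terminal-node means are shrunk towards the parent-node mean. The split point, shrinkage weight and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.where(x < 0.5, 1.0, 1.4) + rng.normal(0, 1.0, 200)   # two nearby local means plus noise

split = 0.5
left, right = y[x < split], y[x >= split]
parent_mean = y.mean()
lam = 0.4                                    # shrinkage weight toward the parent node
pred_left = (1 - lam) * left.mean() + lam * parent_mean
pred_right = (1 - lam) * right.mean() + lam * parent_mean
print(f"raw leaf means:      {left.mean():.2f}, {right.mean():.2f}")
print(f"shrunk predictions:  {pred_left:.2f}, {pred_right:.2f}")
```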

10.
In this article, we present a procedure for approximate negative binomial tolerance intervals. We adapt a well-studied approach for approximating tolerance intervals in the binomial and Poisson settings, which is based on a confidence interval for the parameter of the respective distribution. A simulation study is performed to assess the coverage probabilities and expected widths of the tolerance intervals. The simulation study also compares eight different confidence interval approaches for the negative binomial proportion. We recommend for practical use the approaches that perform best in our simulation results. The method is also illustrated using two real data examples.
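A minimal sketch of the confidence-interval-based construction in the negative binomial case (an illustration, not one of the eight compared approaches): a Wald interval for the success probability p (with the size r assumed known) is computed and the tolerance limits are taken as negative binomial quantiles evaluated at its endpoints; all numbers are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
r = 5                                          # known number of successes
y = rng.negative_binomial(r, 0.3, size=60)     # observed failure counts

n = len(y)
p_hat = r * n / (r * n + y.sum())              # MLE of p when r is known
se = np.sqrt(p_hat**2 * (1 - p_hat) / (r * n)) # Wald standard error
z = stats.norm.ppf(0.975)
p_lo, p_hi = max(1e-9, p_hat - z * se), min(1 - 1e-9, p_hat + z * se)

content = 0.90                                 # desired coverage of the population
# Smaller p gives stochastically larger counts, so the conservative limits use
# p_hi for the lower quantile and p_lo for the upper quantile.
lower = stats.nbinom.ppf((1 - content) / 2, r, p_hi)
upper = stats.nbinom.ppf(1 - (1 - content) / 2, r, p_lo)
print(f"approximate 90%-content tolerance interval: [{lower:.0f}, {upper:.0f}]")
```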

11.
This paper investigates the design of accelerated life test (ALT) plans under progressive Type II interval censoring with random removals. Units' lifetimes are assumed to follow a Weibull distribution, and the number of random removals at each inspection is assumed to follow a binomial distribution. The optimal ALT plans, which minimize the asymptotic variance of an estimated quantile at the use condition, are determined. The expected duration of the test and the expected number of inspections at each stress level are calculated. A numerical study is conducted to investigate the properties of the derived ALT plans under different parameter values. For illustration purposes, a numerical example is also given.
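To make the censoring scheme concrete, the sketch below simulates Weibull lifetimes observed under progressive interval censoring with binomial random removals at each inspection; it only generates data and does not compute the optimal ALT plans. Sample size, inspection times, Weibull parameters and the removal probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
shape, scale = 1.5, 100.0
n, q = 50, 0.1
inspections = [20.0, 40.0, 60.0, 80.0, 100.0]

lifetimes = scale * rng.weibull(shape, size=n)
alive = np.sort(lifetimes)                    # units still on test (by lifetime)
prev_t = 0.0
for i, t in enumerate(inspections):
    failed = np.sum((alive > prev_t) & (alive <= t))   # failures observed in this interval
    alive = alive[alive > t]                           # survivors at the inspection time
    if i < len(inspections) - 1:
        removed = rng.binomial(len(alive), q)          # binomial random removals
        if removed:
            drop = rng.choice(len(alive), size=removed, replace=False)
            alive = np.delete(alive, drop)
    else:
        removed = len(alive)                           # all remaining units removed at the end
        alive = alive[:0]
    print(f"({prev_t:5.1f}, {t:5.1f}]: {failed} failures, {removed} removed")
    prev_t = t
```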

12.
The estimation of abundance from presence–absence data is an intriguing problem in applied statistics. The classical Poisson model makes strong independence and homogeneity assumptions and in practice generally underestimates the true abundance. A controversial ad hoc method based on negative-binomial counts (Am. Nat.) has been empirically successful but lacks theoretical justification. We first present an alternative estimator of abundance based on a paired negative binomial model that is consistent and asymptotically normally distributed. A quadruple negative binomial extension is also developed, which yields the previous ad hoc approach and resolves the controversy in the literature. We examine the performance of the estimators in a simulation study and estimate the abundance of 44 tree species in a permanent forest plot.
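For reference, a minimal sketch of the classical Poisson estimator mentioned above (the baseline that tends to underestimate abundance), not the paper's paired or quadruple negative binomial estimators: under a homogeneous Poisson model the occupancy probability of a quadrat is 1 - exp(-lambda), so lambda is back-calculated from the observed presence proportion. The simulated clustered counts are an illustrative assumption.

```python
import numpy as np

def poisson_abundance(presence):
    """Classical Poisson estimate of mean count per quadrat from 0/1 presence data."""
    p_hat = np.mean(presence)
    return -np.log(1.0 - p_hat)                 # lambda such that 1 - exp(-lambda) = p_hat

rng = np.random.default_rng(0)
counts = rng.negative_binomial(1, 0.5, size=400)    # clustered (overdispersed) counts, mean 1
presence = (counts > 0).astype(int)
print("true mean count:", counts.mean(),
      " Poisson-based estimate:", round(poisson_abundance(presence), 3))
```

With clustered counts the estimate falls below the true mean, which illustrates the underestimation noted above.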

13.
We propose a new generalized autoregressive conditional heteroscedastic (GARCH) model with tree-structured multiple thresholds for the estimation of volatility in financial time series. The approach relies on the idea of a binary tree in which every terminal node parameterizes a (local) GARCH model for a partition cell of the predictor space. The fitting of such trees is carried out within the likelihood framework for non-Gaussian observations: it is very different from the well-known regression tree procedure, which is based on residual sums of squares. Our strategy includes the classical GARCH model as a special case and allows us to increase model complexity in a systematic and flexible way. We derive a consistency result and conclude from simulations and real-data analysis that the new method has better predictive potential than other approaches.
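A minimal sketch of the simplest member of this model class: a threshold GARCH(1,1) whose two regimes correspond to a depth-one tree with two terminal nodes, here chosen by the sign of the previous return. The full tree-structured model and its likelihood-based fitting are not reproduced, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
omega = {0: 0.05, 1: 0.10}        # regime 0: previous return > 0, regime 1: previous return <= 0
alpha = {0: 0.05, 1: 0.15}
beta  = {0: 0.90, 1: 0.80}

r = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega[0] / (1 - alpha[0] - beta[0])       # start at regime-0 stationary variance
r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    j = 0 if r[t - 1] > 0.0 else 1                    # terminal node of the depth-one tree
    sigma2[t] = omega[j] + alpha[j] * r[t - 1] ** 2 + beta[j] * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

print("sample standard deviation of simulated returns:", r.std())
```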

14.
This paper discusses five methods for constructing approximate confidence intervals for the binomial parameter Θ, based on Y successes in n Bernoulli trials. In a recent paper, Chen (1990) discusses various approximate methods and suggests a new method based on a Bayes argument, which we call method I here. Methods II and III are based on the normal approximation without and with continuity correction. Method IV uses the Poisson approximation of the binomial distribution and then exploits the fact that the exact confidence limits for the parameter of the Poisson distribution can be found through the χ² distribution. The confidence limits of method IV are then provided by the Wilson-Hilferty approximation of the χ² distribution. Similarly, the exact confidence limits for the binomial parameter can be expressed through the F distribution. Method V approximates these limits through a suitable version of the Wilson-Hilferty approximation. We undertake a comparison of the five methods with respect to coverage probability and expected length. The results indicate that method V has an advantage over Chen's Bayes method as well as over the other three methods.
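For comparison with the methods above, a minimal sketch of three standard intervals: the normal approximation without and with continuity correction (methods II and III) and the exact Clopper-Pearson interval computed via beta quantiles. Methods I, IV and V of the paper are not reproduced here.

```python
from scipy import stats

def binomial_intervals(y, n, alpha=0.05):
    z = stats.norm.ppf(1 - alpha / 2)
    p = y / n
    se = (p * (1 - p) / n) ** 0.5
    wald = (p - z * se, p + z * se)                            # normal approximation
    wald_cc = (p - z * se - 0.5 / n, p + z * se + 0.5 / n)     # with continuity correction
    cp = (stats.beta.ppf(alpha / 2, y, n - y + 1) if y > 0 else 0.0,
          stats.beta.ppf(1 - alpha / 2, y + 1, n - y) if y < n else 1.0)
    return {"wald": wald, "wald_cc": wald_cc, "clopper_pearson": cp}

for name, (lo, hi) in binomial_intervals(y=7, n=50).items():
    print(f"{name:16s} ({lo:.3f}, {hi:.3f})")
```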

15.
In this article, we present a general method for deriving Stein-like identities and Chernoff-like inequalities based on orthogonal polynomials. To illustrate the method, applications are given for the normal, gamma, beta, Poisson, binomial, and negative binomial distributions, not only for random variables but also for random vectors, and the corresponding Stein-like identities and Chernoff-like inequalities are obtained. To the best of our knowledge, some of the matrix-version results are new in the literature. In addition, forward difference formulae for the Charlier, Krawtchouk, and Meixner polynomials, the Stein-like identity and Chernoff-like inequality for the beta distribution, and the Rodrigues formula for the Meixner polynomials also appear to be given here for the first time. Interestingly, for the normal, gamma, beta, Poisson, binomial, and negative binomial distributions, we find by examining their Rodrigues formulae that the Stein-like identities and the corresponding Chernoff-like inequalities are closely related.
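As a concrete special case, the sketch below numerically checks the classical Stein identity for the normal distribution, E[(X - mu) f(X)] = sigma^2 E[f'(X)], which is the prototype of the Stein-like identities discussed above; the orthogonal-polynomial derivations and matrix versions of the paper are not reproduced, and the test function f is an arbitrary smooth choice.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=2_000_000)

f = np.sin                       # test function
f_prime = np.cos

lhs = np.mean((x - mu) * f(x))                 # Monte Carlo estimate of E[(X - mu) f(X)]
rhs = sigma**2 * np.mean(f_prime(x))           # Monte Carlo estimate of sigma^2 E[f'(X)]
print(f"E[(X-mu) f(X)] = {lhs:.4f},  sigma^2 E[f'(X)] = {rhs:.4f}")
```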

16.
Summary. The data that are analysed are from a monitoring survey which was carried out in 1994 in the forests of Baden-Württemberg, a federal state in the south-western region of Germany. The survey is part of a large monitoring scheme that has been carried out since the 1980s at different spatial and temporal resolutions to observe the increase in forest damage. One indicator for tree vitality is tree defoliation, which is mainly caused by intrinsic factors, age and stand conditions, but also by biotic (e.g. insects) and abiotic stresses (e.g. industrial emissions). In the survey, needle loss of pine-trees and many potential covariates are recorded at about 580 grid points of a 4 km × 4 km grid. The aim is to identify a set of predictors for needle loss and to investigate the relationships between the needle loss and the predictors. The response variable needle loss is recorded as a percentage in 5% steps estimated by eye using binoculars and categorized into healthy trees (10% or less), intermediate trees (10–25%) and damaged trees (25% or more). We use a Bayesian cumulative threshold model with non-linear functions of continuous variables and a random effect for spatial heterogeneity. For both the non-linear functions and the spatial random effect we use Bayesian versions of P-splines as priors. Our method is novel in that it deals with several non-standard data requirements: the ordinal response variable (the categorized version of needle loss), non-linear effects of covariates, spatial heterogeneity and prediction with missing covariates. The model is a special case of models with a geoadditive or more generally structured additive predictor. Inference can be based on Markov chain Monte Carlo techniques or mixed model technology.
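A minimal sketch of the cumulative threshold model at the core of this analysis, reduced to an ordered probit with a single linear covariate and fitted by maximum likelihood; the Bayesian P-spline smooths, the spatial random effect and MCMC inference of the paper are not reproduced, and the simulated data and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 800
x = rng.normal(size=n)
latent = 0.8 * x + rng.normal(size=n)
y = np.digitize(latent, bins=[-0.5, 0.8])      # 0 = healthy, 1 = intermediate, 2 = damaged

def neg_log_lik(params):
    beta, t1, log_gap = params
    t2 = t1 + np.exp(log_gap)                  # enforce the ordering t1 < t2
    eta = beta * x
    cdf = np.column_stack([np.zeros(n),
                           stats.norm.cdf(t1 - eta),
                           stats.norm.cdf(t2 - eta),
                           np.ones(n)])
    probs = cdf[np.arange(n), y + 1] - cdf[np.arange(n), y]   # P(Y = observed category)
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

res = optimize.minimize(neg_log_lik, x0=[0.0, -0.5, 0.0], method="BFGS")
beta_hat, t1_hat, log_gap_hat = res.x
print("beta:", round(beta_hat, 3),
      "thresholds:", round(t1_hat, 3), round(t1_hat + np.exp(log_gap_hat), 3))
```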

17.
In this article, a novel technique, IRUSRT (inverse random under-sampling and random tree), is proposed to implement imbalanced learning by combining inverse random under-sampling with random trees. The main idea is to severely under-sample the majority class, thus creating multiple distinct training sets. With each training set, a random tree is trained to separate the minority class from the majority class. By combining these random trees through fusion, a composite classifier is constructed. The experimental analysis on 23 real-world datasets, assessed over the area under the ROC curve (AUC), F-measure, and G-mean, indicates that IRUSRT performs significantly better than many existing class-imbalance learning methods.
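A minimal sketch of the inverse random under-sampling idea (illustrative, not the authors' IRUSRT implementation): the majority class is severely under-sampled several times so that the minority class dominates each subset, a decision tree is trained on each subset, and the per-tree scores are fused by averaging. Subset sizes, the number of trees and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(2000, 5))          # majority class (label 0)
X_min = rng.normal(1.0, 1.0, size=(60, 5))            # minority class (label 1)

def irus_ensemble(X_maj, X_min, n_trees=30, maj_subset=40):
    """Train one tree per severely under-sampled majority subset."""
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X_maj), size=maj_subset, replace=False)
        X = np.vstack([X_maj[idx], X_min])
        y = np.r_[np.zeros(maj_subset), np.ones(len(X_min))]
        trees.append(DecisionTreeClassifier(max_depth=5).fit(X, y))
    return trees

def predict_proba(trees, X):
    """Fusion by averaging the per-tree minority-class scores."""
    return np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)

trees = irus_ensemble(X_maj, X_min)
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, 5)), rng.normal(1.0, 1.0, size=(10, 5))])
print(np.round(predict_proba(trees, X_test), 2))
```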

18.
Many methods are available for computing a confidence interval for the binomial parameter, and these methods differ in their operating characteristics. It has been suggested in the literature that the use of the exact likelihood ratio (LR) confidence interval for the binomial proportion should be considered. This paper provides an evaluation of the operating characteristics of the two-sided exact LR and exact score confidence intervals for the binomial proportion and compares these results to those for three other methods that also strictly maintain nominal coverage: Clopper-Pearson, Blaker, and Casella. In addition, the operating characteristics of the two-sided exact LR method and exact score method are compared with those of the corresponding asymptotic methods to investigate the adequacy of the asymptotic approximation.
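For orientation, a minimal sketch of the asymptotic likelihood ratio interval for a binomial proportion, obtained by inverting the LR statistic against the chi-square(1) quantile with a root finder; the exact LR and exact score intervals evaluated in the paper additionally calibrate the cut-off to guarantee nominal coverage, which is not reproduced here.

```python
from scipy import optimize, special, stats

def lr_interval(y, n, alpha=0.05):
    p_hat = y / n
    cut = stats.chi2.ppf(1 - alpha, df=1)

    def loglik(p):
        # xlogy handles the boundary cases y = 0 or y = n gracefully
        return special.xlogy(y, p) + special.xlogy(n - y, 1 - p)

    def g(p):                                   # LR statistic minus the chi-square cut-off
        return 2 * (loglik(p_hat) - loglik(p)) - cut

    eps = 1e-10
    lower = optimize.brentq(g, eps, p_hat) if y > 0 else 0.0
    upper = optimize.brentq(g, p_hat, 1 - eps) if y < n else 1.0
    return lower, upper

print(lr_interval(y=7, n=50))
```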

19.
Many algorithms originating from decision trees have been developed for classification problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy, namely high misclassification rates, when there are many irrelevant variables. We propose multi-step classification trees with adaptive variable selection (the multi-step GUIDE classification tree (MG) and the multi-step CRUISE classification tree (MC)) to handle this problem. The multi-step method comprises a variable selection step and a fitting step.

We compare the performance of classification trees in the presence of irrelevant variables. MG and MC perform better than Random Forest and C4.5 on an extremely noisy dataset. Furthermore, the prediction accuracy of our proposed algorithms is relatively stable even when the number of irrelevant variables increases, while that of the other algorithms worsens.
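A minimal sketch of a two-step "screen variables, then fit a tree" strategy in the spirit of the approach above (not the MG or MC algorithms themselves): variables are screened with a univariate ANOVA F statistic (sklearn's f_classif) inside a pipeline and a single decision tree is fitted on the retained ones. The screening rule, the number of retained variables and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p_relevant, p_noise = 500, 5, 200
X_rel = rng.normal(size=(n, p_relevant))
y = (X_rel.sum(axis=1) > 0).astype(int)
X = np.hstack([X_rel, rng.normal(size=(n, p_noise))])         # many irrelevant variables

pipe = make_pipeline(SelectKBest(f_classif, k=10),            # step 1: variable selection
                     DecisionTreeClassifier(max_depth=4, random_state=0))  # step 2: fitting
plain = DecisionTreeClassifier(max_depth=4, random_state=0)

print("CV accuracy, tree on all variables    :", cross_val_score(plain, X, y, cv=5).mean().round(3))
print("CV accuracy, screening + tree pipeline:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
```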

20.
Hall (2000) has described zero-inflated Poisson and binomial regression models that include random effects to account for excess zeros and additional sources of heterogeneity in the data. The authors of the present paper propose a general score test for the null hypothesis that the variance components associated with these random effects are zero. For a zero-inflated Poisson model with random intercept, the new test reduces to an alternative to the overdispersion test of Ridout, Demétrio & Hinde (2001). The authors also examine their general test in the special case of the zero-inflated binomial model with random intercept and propose an overdispersion test in that context, which is based on a beta-binomial alternative.
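A minimal sketch of the zero-inflated Poisson likelihood that underlies these models, fitted by maximum likelihood with a constant mean and a constant zero-inflation probability; the random effects and the proposed score test for their variance components are not reproduced, and the simulated data use illustrative parameter values.

```python
import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(0)
n, pi_true, lam_true = 1000, 0.3, 2.5
y = rng.poisson(lam_true, n) * (rng.random(n) > pi_true)     # structural zeros with probability pi

def neg_log_lik(params):
    logit_pi, log_lam = params
    pi = 1.0 / (1.0 + np.exp(-logit_pi))
    lam = np.exp(log_lam)
    log_pois = -lam + special.xlogy(y, lam) - special.gammaln(y + 1)
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(-lam)),       # mixture mass at zero
                  np.log(1 - pi) + log_pois)                  # Poisson part for positive counts
    return -ll.sum()

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], method="BFGS")
pi_hat = 1 / (1 + np.exp(-res.x[0]))
lam_hat = np.exp(res.x[1])
print(f"pi_hat = {pi_hat:.3f}, lambda_hat = {lam_hat:.3f}")
```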
