Similar articles
 Found 20 similar articles (search time: 62 ms)
1.
This article introduces BestClass, a set of SAS macros, available in the mainframe and workstation environments, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule use the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

2.
Logistic discrimination is a well-documented method for classifying observations into two or more groups. However, estimation of the discriminant rule can be seriously affected by outliers. To overcome this, Cox and Ferry produced a robust logistic discrimination technique. Although their method worked in practice, parameter estimation was sometimes prone to convergence problems. This paper proposes a simplified robust logistic model which does not have any such problems and which takes a generalized linear model form. Misclassification rates calculated in a simulation exercise are used to compare the new method with ordinary logistic discrimination. Model diagnostics are also presented. The newly proposed model is then used on data collected from pregnant women at two district general hospitals. A robust logistic discriminant is calculated which can be used to predict accurately which method of feeding a woman will eventually use: breast feeding or bottle feeding.

3.
This article proposes a sequential correction of two linear methods: linear discriminant analysis (LDA) and the perceptron. The correction relies on sequentially adding features on which the classifier is trained. These new features are posterior probabilities determined by a basic classification method such as LDA or the perceptron. In each step, we add the probabilities obtained on a slightly different data set, because the vector of added probabilities varies at each step. We therefore have many classifiers of the same type trained on slightly different data sets. Four different sequential correction methods are presented, based on different combining schemes (e.g. the mean rule and the product rule). Experimental results on different data sets demonstrate that the improvements are efficient, and that this approach outperforms classical linear methods, providing a significant reduction in the mean classification error rate.
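As a rough illustration of the two combining rules named in the abstract above, the sketch below applies the mean rule and the product rule to class posteriors from several classifiers. It does not reproduce the article's sequential feature-augmentation procedure; the function names and the example posteriors are hypothetical.

```python
from math import prod


def mean_rule(posteriors):
    """Average the class posteriors across classifiers."""
    n_classes = len(posteriors[0])
    return [sum(p[k] for p in posteriors) / len(posteriors)
            for k in range(n_classes)]


def product_rule(posteriors):
    """Multiply the class posteriors across classifiers, then renormalize."""
    n_classes = len(posteriors[0])
    raw = [prod(p[k] for p in posteriors) for k in range(n_classes)]
    total = sum(raw)
    return [r / total for r in raw]


def predict(combined):
    """Return the index of the class with the largest combined score."""
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical posteriors from three classifiers for one two-class observation.
example = [[0.999, 0.001], [0.2, 0.8], [0.2, 0.8]]
mean_choice = predict(mean_rule(example))
product_choice = predict(product_rule(example))
```

The two rules can disagree: under the product rule a single classifier that assigns a class near-zero probability effectively vetoes that class, whereas the mean rule lets the majority outweigh it.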

4.
Nonparametric approaches to classification have gained significant attention in the last two decades. In this paper, we propose a classification methodology based on multivariate rank functions and show that it is a Bayes rule for spherically symmetric distributions with a location shift. We show that a rank-based classifier is equivalent to the optimal Bayes rule under suitable conditions. We also present an affine invariant version of the classifier. To accommodate different covariance structures, we construct a classifier based on the central rank region. Asymptotic properties of these classification methods are studied. We illustrate the performance of our proposed methods in comparison to some other depth-based classifiers using simulated and real data sets.

5.
The article considers nonparametric inference for quantile regression models with time-varying coefficients. The errors and covariates of the regression are assumed to belong to a general class of locally stationary processes and are allowed to be cross-dependent. Simultaneous confidence tubes (SCTs) and integrated squared difference tests (ISDTs) are proposed for simultaneous nonparametric inference of the latter models with asymptotically correct coverage probabilities and Type I error rates. Our methodologies are shown to possess certain asymptotically optimal properties. Furthermore, we propose an information criterion that performs consistent model selection for nonparametric quantile regression models of nonstationary time series. For implementation, a wild bootstrap procedure is proposed, which is shown to be robust to the dependent and nonstationary data structure. Our method is applied to studying the asymmetric and time-varying dynamic structures of the U.S. unemployment rate since the 1940s. Supplementary materials for this article are available online.

6.
A nonparametric discriminant analysis procedure that is robust to deviations from the usual assumptions is proposed. The procedure uses the projection pursuit methodology where the projection index is the two-group transvariation probability. We use allocation based on the centrality of the new point measured using a smooth version of point-group transvariation. It is shown that the new procedure provides lower misclassification error rates than competing methods for data from skewed heavy-tailed and skewed distributions as well as unequal training data sizes.

7.
The choice of smoothing determines the properties of nonparametric estimates of probability densities. In the discrimination problem, the choice is often tied to loss functions. A framework for the cross-validatory choice of smoothing parameters based on general loss functions is given. Several loss functions are considered as special cases. In particular, a family of loss functions, which is connected to discrimination problems, is directly related to measures of performance used in discrimination. Consistency results are given for a general class of loss functions which comprise this family of discriminant loss functions.

8.
We propose tests for hypotheses on the parameters of the deterministic trend function of a univariate time series. The tests do not require knowledge of the form of serial correlation in the data, and they are robust to strong serial correlation. The data can contain a unit root, and the tests still have the correct size asymptotically. The tests that we analyze are standard heteroscedasticity- and autocorrelation-robust tests based on nonparametric kernel variance estimators. We analyze these tests using the fixed-b asymptotic framework recently proposed by Kiefer and Vogelsang. This framework allows us to analyze the power properties of the tests with regard to bandwidth and kernel choices. Our analysis shows that among popular kernels, specific kernel and bandwidth choices deliver tests with maximal power within a specific class of tests. Based on the theoretical results, we propose a data-dependent bandwidth rule that maximizes integrated power. Our recommended test is shown to have power that dominates a related test proposed by Vogelsang. We apply the recommended test to the logarithm of a net barter terms of trade series and we find that this series has a statistically significant negative slope. This finding is consistent with the well-known Prebisch–Singer hypothesis.

9.
In the change detection problem, the distribution of a series of observations can change at some unknown instant. The aim of an on-line change detection rule is to detect this change as rapidly as possible while ensuring a low false alarm rate. The most popular rule for this problem is Page's CUSUM rule. Using this rule presupposes that the two distributions, before and after the change, are known, which is often restrictive in practice. In this article, a nonparametric rule is proposed. Only two learning samples, characterizing the in-control and out-of-control functioning modes of the system, are needed to implement the rule. The new detection approach is based on a well-known nonparametric method, empirical likelihood. Numerical studies show the relevance of our approach, especially when the learning samples are quite small.
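For context, the classical CUSUM rule that the abstract above contrasts with can be sketched as follows. This is not the empirical-likelihood rule the article proposes. It assumes, purely for illustration, that the pre-change and post-change distributions are Gaussian with known means and a common known standard deviation; the function name, parameters, and threshold are hypothetical.

```python
def cusum_alarm(xs, mu0, mu1, sigma, threshold):
    """Return the index of the first CUSUM alarm, or None if none is raised.

    Assumes N(mu0, sigma^2) before the change and N(mu1, sigma^2) after it.
    """
    s = 0.0
    for t, x in enumerate(xs):
        # Log-likelihood ratio of the post-change vs. pre-change density at x.
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        # Accumulate evidence for a change, resetting at zero.
        s = max(0.0, s + llr)
        if s > threshold:
            return t
    return None


# Hypothetical series whose mean shifts from 0 to 2 at index 4.
series = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
alarm_index = cusum_alarm(series, mu0=0.0, mu1=2.0, sigma=1.0, threshold=3.0)
```

The log-likelihood ratio in the loop is exactly where knowledge of both distributions enters, which is the requirement the abstract describes as restrictive.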

10.
Nowadays airborne laser scanning is used in many territorial studies, providing point data which may contain strong discontinuities. Motivated by the need to interpolate such data and preserve their edges, this paper considers robust nonparametric smoothers. These estimators, when implemented with bounded loss functions, have suitable jump-preserving properties. Iterative algorithms are developed here, and are equivalent to nonlinear M-smoothers, but have the advantage of resembling the linear kernel regression. The selection of their coefficients is carried out by combining cross-validation and robust-tuning techniques. Two real case studies and a simulation experiment confirm the validity of the method; in particular, the performance in building recognition is excellent.

11.
This article proposes a discriminant function and an algorithm for analyzing positively skewed data. The performance of the algorithm based on the proposed discriminant function (LNDF) is compared with the conventional linear discriminant function (LDF) and quadratic discriminant function (QDF), as well as with the nonparametric support vector machine (SVM) and Random Forests (RFs) classifiers, using real and simulated datasets. A maximum reduction of approximately 81% in the error rates as compared to QDF for ten-variate data was noted. The overall results are indicative of better performance of the proposed discriminant function under certain circumstances.

12.
In recent years permutation testing methods have grown both in number of applications and in their use for solving complex multivariate problems. When available, permutation tests are essentially exact and nonparametric in a conditional context, where conditioning is on the pooled observed data set, which is often a set of sufficient statistics under the null hypothesis. By contrast, the reference null distribution of most parametric tests is known only asymptotically. Thus, for most sample sizes of practical interest, the possible lack of efficiency of permutation solutions may be compensated by the lack of approximation of parametric counterparts. There are many complex multivariate problems, quite common in empirical sciences, which are difficult to solve outside the conditional framework and in particular outside the method of nonparametric combination (NPC) of dependent permutation tests. In this paper we review such a method and its main properties along with some new results in experimental and observational situations (robust testing, multi-sided alternatives and testing for survival functions).
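The conditioning on the pooled data described in the abstract above can be illustrated with a single exact two-sample permutation test. This is only a minimal sketch of one permutation test, not the NPC method reviewed in the paper; the choice of test statistic (absolute difference in means) and all names are assumptions made for the example.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled observations into a first
    group of size len(x) and a second group holding the rest.
    """
    pooled = list(x) + list(y)
    n, m = len(x), len(pooled)
    total = sum(pooled)

    def stat(sum_first):
        # Absolute difference between the two group means.
        return abs(sum_first / n - (total - sum_first) / (m - n))

    observed = stat(sum(x))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(m), n):
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if stat(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Hypothetical samples: only the observed split and its mirror are this extreme.
p_value = exact_permutation_pvalue([1, 2, 3], [10, 11, 12])
```

Because the reference distribution is built from the pooled data themselves, the p-value is exact for any sample size; full enumeration is feasible only for small samples, and larger problems would typically sample random relabellings instead.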

13.
We study the influence of a single data case on the results of a statistical analysis. This problem has been addressed in several articles for linear discriminant analysis (LDA). Kernel Fisher discriminant analysis (KFDA) is a kernel-based extension of LDA. In this article, we study the effect of atypical data points on KFDA and develop criteria for identification of cases having a detrimental effect on the classification performance of the KFDA classifier. We find that the criteria are successful in identifying cases whose omission from the training data prior to obtaining the KFDA classifier results in reduced error rates.

14.
This paper considers robust variable selection in semiparametric modeling for longitudinal data with an unspecified dependence structure. First, by basis spline approximation and using a general formulation to treat mean, median, quantile and robust mean regressions in one setting, we propose a weighted M-type regression estimator, which achieves robustness against outliers in both the response and covariate directions and can accommodate heterogeneity. Its asymptotic properties are also established. Furthermore, a penalized weighted M-type estimator is proposed, which can perform estimation and select relevant nonparametric and parametric components simultaneously and robustly. Without any specification of the error distribution or the intra-subject dependence structure, the variable selection method performs well, achieving consistency in variable selection and the oracle property in estimation. Simulation studies also confirm our method and theories.

15.
Tang Qingguo. Statistics, 2015, 49(6): 1262-1278
This paper studies estimation in semi-functional linear regression. A general formulation is used to treat mean regression, median regression, quantile regression and robust mean regression in one setting. The linear slope function is estimated by the functional principal component basis and the nonparametric component is approximated by a B-spline function. The global convergence rates of the estimators of the unknown slope function and nonparametric component are established under a suitable norm. The convergence rate of the mean-squared prediction error for the proposed estimators is also established. Finite sample properties of our procedures are studied through Monte Carlo simulations. A real-data example based on the Berkeley growth data is used to illustrate our proposed methodology.

16.
The area under the ROC curve (AUC) can be interpreted as the probability that the classification score of a diseased subject is larger than that of a non-diseased subject for a randomly sampled pair of subjects. From the perspective of classification, we want to find a way to separate two groups as distinctly as possible via AUC. When the difference of the scores of a marker is small, its impact on classification is less important. Thus, a new diagnostic/classification measure based on a modified area under the ROC curve (mAUC) is proposed, which is defined as a weighted sum of two AUCs, where the AUC with the smaller difference is assigned a lower weight, and vice versa. Using mAUC is robust in the sense that mAUC gets larger as AUC gets larger as long as they are not equal. Moreover, in many diagnostic situations, only a specific range of specificity is of interest. Under normal distributions, we show that if the AUCs of two markers are within similar ranges, the larger mAUC implies the larger partial AUC for a given specificity. This property of mAUC will help to identify the marker with the higher partial AUC, even when the AUCs are similar. Two nonparametric estimates of an mAUC and their variances are given. We also suggest the use of mAUC as the objective function for classification, and the use of the gradient Lasso algorithm for classifier construction and marker selection. Applications to simulation datasets and real microarray gene expression datasets show that our method finds a linear classifier with a higher ROC curve than some other existing linear classifiers, especially in the range of low false positive rates.
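The probabilistic interpretation of the ordinary AUC stated at the start of the abstract above can be checked directly by counting pairs. The sketch below is the standard empirical (Mann-Whitney type) estimate, not the mAUC proposed in the article; counting ties as one half is a common convention assumed here, and the function name and scores are hypothetical.

```python
def empirical_auc(diseased, non_diseased):
    """Fraction of (diseased, non-diseased) pairs ranked correctly.

    A pair counts 1 if the diseased score is larger and 0.5 if the scores tie.
    """
    wins = 0.0
    for d in diseased:
        for h in non_diseased:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(non_diseased))


# Hypothetical marker scores with one tied pair (3 vs. 3).
auc = empirical_auc([3, 4, 5], [1, 2, 3])
```

Of the nine pairs in this example, eight are ranked correctly and one is tied, giving 8.5/9.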

17.
Using the techniques developed by Subrahmaniam and Ching’anda (1978), we study the robustness of linear discriminant functions to nonnormality. It is seen that the LDF procedure is quite robust against the likelihood ratio rule. The latter yields in all cases much smaller overall error rates; however, the disparity between the error rates of the LDF and LR procedures is not large enough to warrant the recommendation to use the more complicated LR procedure.

18.
We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high-dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach in order to efficiently optimize it when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high-dimensional setting, and explore their relationships with our proposal.
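The singularity claim in the abstract above follows from a short rank argument, sketched here. The notation (class index sets C_k, class means \hat{\mu}_k, class sizes n_k) is an assumption introduced for illustration and is not taken from the article.

```latex
% Standard pooled within-class covariance estimate, for x_i in R^p:
\hat{\Sigma}_w \;=\; \frac{1}{n-K} \sum_{k=1}^{K} \sum_{i \in C_k}
    (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^{\top},
\qquad
\hat{\mu}_k \;=\; \frac{1}{n_k} \sum_{i \in C_k} x_i .

% Within each class the centered vectors sum to zero, so they span at most
% n_k - 1 dimensions. Summing over the K classes:
\operatorname{rank}\bigl(\hat{\Sigma}_w\bigr)
    \;\le\; \sum_{k=1}^{K} (n_k - 1) \;=\; n - K .

% Hence, whenever p > n - K, the p x p matrix \hat{\Sigma}_w is singular
% and cannot be inverted in the usual discriminant rule.
```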

19.
In this paper, we discuss the problem of constructing designs in order to maximize the accuracy of nonparametric curve estimation in the possible presence of heteroscedastic errors. Our approach is to exploit the flexibility of wavelet approximations to approximate the unknown response curve by its wavelet expansion, thereby eliminating the mathematical difficulty associated with the unknown structure. It is expected that only finitely many parameters in the resulting wavelet response can be estimated by weighted least squares. The bias arising from this compounds the natural variation of the estimates. Robust minimax designs and weights are then constructed to minimize mean-squared-error-based loss functions of the estimates. We find the periodic and symmetric properties of the Euclidean norm of the multiwavelet system useful in eliminating some of the mathematical difficulties involved. These properties lead us to restrict the search for robust minimax designs to a specific class of symmetric designs. We also construct minimum variance unbiased designs and weights which minimize the loss functions subject to a side condition of unbiasedness. We discuss an example from the nonparametric literature.

20.
A classifier is developed which uses information from all pixels in a neighbourhood to classify the pixel at the center of the neighbourhood. It is not a smoother in that it tries to recognize boundaries, and it makes explicit use of the relative positions of pixels in the neighbourhood. It is based on a geometric probability model for the distribution of the classes in the plane. The neighbourhood-based classifier is shown to outperform linear discriminant analysis on some LANDSAT data.
