Similar documents
20 similar documents retrieved.
1.
In this paper, we describe some results of an ESPRIT project known as StatLog whose purpose is the comparison of classification algorithms. We give a brief summary of some of the algorithms in the project: discriminant analysis; nearest neighbours; decision trees; neural net methods; SMART; kernel methods and other Bayesian approaches. We focus on data sets derived from images, ranging from raw pixel data to features and summaries extracted from such data.
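A minimal sketch of the kind of comparison described above, using scikit-learn estimators as stand-ins for the listed classifier families and synthetic data in place of the image-derived feature sets; it is not the StatLog code itself.

```python
# StatLog-style comparison of classifier families on a generic feature matrix.
# The estimators and synthetic data are placeholders, not the original project code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

classifiers = {
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)          # cross-validated accuracy
    print(f"{name:22s} error rate = {1 - scores.mean():.3f}")
```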

3.
Kernel discriminant analysis translates the original classification problem into feature space and solves the problem with dimension and sample size interchanged. In high-dimension low sample size (HDLSS) settings, this reduces the 'dimension' to that of the sample size. For HDLSS two-class problems we modify Mika's kernel Fisher discriminant function which – in general – remains ill-posed even in a kernel setting; see Mika et al. (1999). We propose a kernel naive Bayes discriminant function and its smoothed version, using first- and second-degree polynomial kernels. For fixed sample size and increasing dimension, we present asymptotic expressions for the kernel discriminant functions, discriminant directions and for the error probability of our kernel discriminant functions. The theoretical calculations are complemented by simulations which show the convergence of the estimators to the population quantities as the dimension grows. We illustrate the performance of the new discriminant rules, which are easy to implement, on real HDLSS data. For such data, our results clearly demonstrate the superior performance of the new discriminant rules, and especially their smoothed versions, over Mika's kernel Fisher version, and typically also over the commonly used naive Bayes discriminant rule.
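For reference, a hedged sketch of the Mika-style kernel Fisher discriminant that the abstract takes as its starting point, with a ridge term and a second-degree polynomial kernel; the authors' kernel naive Bayes rule and its smoothed version are not reproduced here.

```python
# Textbook kernel Fisher discriminant (Mika et al., 1999) with ridge regularization.
# This is a generic two-class version, not the paper's proposed discriminant rules.
import numpy as np

def poly_kernel(A, B, degree=2, c=1.0):
    return (A @ B.T + c) ** degree

def kfd_fit(X, y, degree=2, reg=1e-3):
    K = poly_kernel(X, X, degree)
    n = X.shape[0]
    means, N = [], np.zeros((n, n))
    for c in (0, 1):
        Kc = K[:, y == c]                                # kernel columns of class c
        nc = Kc.shape[1]
        means.append(Kc.mean(axis=1))
        H = np.eye(nc) - np.ones((nc, nc)) / nc          # within-class centring
        N += Kc @ H @ Kc.T
    alpha = np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])
    threshold = 0.5 * (means[0] @ alpha + means[1] @ alpha)
    return alpha, threshold

def kfd_predict(X_train, alpha, threshold, X_new, degree=2):
    scores = poly_kernel(X_new, X_train, degree) @ alpha
    return (scores > threshold).astype(int)
```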

4.
The construction of kernel discriminant coordinates reduces to the solution of a generalized eigenvalue problem in which both matrices are nonnegative definite. Six different algorithms for solving that problem are described, and the performance of these algorithms is tested on 26 different datasets. The percentage of misclassifications using a linear discriminant function is noted, and the algorithms’ running times are ascertained. Classification is also performed in the space of classical discriminant coordinates.
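As a sketch of what solving this problem involves, one standard approach is shown below: since both matrices may be singular, a small ridge is added before calling a Cholesky-based generalized eigensolver. This is just one possible algorithm, not one of the six compared in the paper.

```python
# Solve A v = lambda B v with A, B symmetric nonnegative definite,
# by lightly regularizing B so scipy's generalized solver applies.
import numpy as np
from scipy.linalg import eigh

def discriminant_coordinates(A, B, n_coords=2, ridge=1e-8):
    B_reg = B + ridge * np.trace(B) / B.shape[0] * np.eye(B.shape[0])
    eigvals, eigvecs = eigh(A, B_reg)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # keep the largest eigenvalues
    return eigvals[order[:n_coords]], eigvecs[:, order[:n_coords]]
```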

5.
The purpose of this paper is to examine the multiple group (>2) discrimination problem in which the group sizes are unequal and the variables used in the classification are correlated with skewed distributions. Using statistical simulation based on data from a clinical study, we compare the performances, in terms of misclassification rates, of nine statistical discrimination methods. These methods are linear and quadratic discriminant analysis applied to untransformed data, rank transformed data, and inverse normal scores data, as well as fixed kernel discriminant analysis, variable kernel discriminant analysis, and variable kernel discriminant analysis applied to inverse normal scores data. It is found that the parametric methods with transformed data generally outperform the other methods, and the parametric methods applied to inverse normal scores usually outperform the parametric methods applied to rank transformed data. Although the kernel methods often have very biased estimates, the variable kernel method applied to inverse normal scores data provides considerable improvement in terms of total nonerror rate.
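A small sketch of the two data transformations compared here, applied column-wise before running LDA/QDA. The tie handling and the Blom-type offset are common conventions, not details taken from the paper.

```python
# Rank transform and inverse normal scores transform of a data matrix.
import numpy as np
from scipy.stats import rankdata, norm

def rank_transform(X):
    return np.apply_along_axis(rankdata, 0, X)

def inverse_normal_scores(X):
    n = X.shape[0]
    ranks = rank_transform(X)
    return norm.ppf((ranks - 0.375) / (n + 0.25))   # Blom plotting positions
```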

6.
We introduce a technique for extending the classical method of linear discriminant analysis (LDA) to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one- or two-dimensional pictorial representation of a set of curves. We also extend this procedure to provide generalizations of quadratic and regularized discriminant analysis.

7.
Measures of association between two sets of random variables have long been of interest to statisticians. Classical canonical correlation analysis (LCCA) can characterize, but is also limited to, linear association. This article introduces a nonlinear, nonparametric kernel method for studying association and proposes a new independence test for two sets of variables. This nonlinear kernel canonical correlation analysis (KCCA) can also be applied to nonlinear discriminant analysis. Implementation issues are discussed. We place the implementation of KCCA in the framework of classical LCCA via a sequence of independent systems in the kernel-associated Hilbert spaces. Such a placement provides an easy way to carry out the KCCA. Numerical experiments and comparisons with other nonparametric methods are presented.
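A hedged sketch of regularized KCCA in one common textbook formulation, solving a generalized eigenvalue problem on centred kernel matrices; the article's own route via a sequence of independent systems may differ.

```python
# Regularized kernel CCA: leading canonical correlation from centred kernels Kx, Ky.
import numpy as np
from scipy.linalg import eigh

def center_kernel(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_first_correlation(Kx, Ky, reg=1e-2):
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    # maximize a' Kx Ky b subject to a'(Kx^2 + reg Kx) a = b'(Ky^2 + reg Ky) b = 1
    A = np.block([[np.zeros((n, n)), Kx @ Ky],
                  [Ky @ Kx, np.zeros((n, n))]])
    B = np.block([[Kx @ Kx + reg * Kx, np.zeros((n, n))],
                  [np.zeros((n, n)), Ky @ Ky + reg * Ky]])
    B += 1e-8 * np.eye(2 * n)                    # keep B positive definite
    eigvals, eigvecs = eigh(A, B)
    rho = eigvals[-1]                            # leading canonical correlation
    a, b = eigvecs[:n, -1], eigvecs[n:, -1]
    return rho, a, b
```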

8.
We study the influence of a single data case on the results of a statistical analysis. This problem has been addressed in several articles for linear discriminant analysis (LDA). Kernel Fisher discriminant analysis (KFDA) is a kernel-based extension of LDA. In this article, we study the effect of atypical data points on KFDA and develop criteria for identification of cases having a detrimental effect on the classification performance of the KFDA classifier. We find that the criteria are successful in identifying cases whose omission from the training data prior to obtaining the KFDA classifier results in reduced error rates.

9.
10.
Comprehensive evaluation of quality of life: an approach based on the functional nature of the data   (Total citations: 1; self-citations: 1; citations by others: 0)
The evaluation and analysis of quality of life is a core problem in quality-of-life research. Existing methods for the comprehensive evaluation of quality of life share a common shortcoming: the data they handle are either cross-sectional data or time-series data. In practice, however, the available sample data are often functional data in which cross-sectional and time-series information are fused together. To remedy this deficiency of the existing methods, a new method for the comprehensive evaluation and analysis of quality of life is introduced, based on the functional nature of the data. In terms of the degree to which the information in the data is used, this method is clearly superior to the existing ones.

11.
This work focuses on the linear regression model with functional covariate and scalar response. We compare the performance of two (parametric) linear regression estimators and a nonparametric (kernel) estimator via a Monte Carlo simulation study and the analysis of two real data sets. The first linear estimator expands the predictor and the regression weight function in terms of the trigonometric basis, while the second one uses functional principal components. The choice of the regularization degree in the linear estimators is addressed.
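A sketch of the first linear estimator's structure: each predictor curve is expanded on a trigonometric (Fourier) basis and the scalar response is regressed on the resulting coefficients. The toy data, the ridge penalty (standing in for the choice of regularization degree), and the basis size are all placeholders.

```python
# Scalar-on-function linear regression via a Fourier basis expansion of the curves.
import numpy as np
from sklearn.linear_model import Ridge

def fourier_design(curves, t, n_basis=7):
    # curves: (n_samples, len(t)) discretized functional predictors on a grid t in [0, 1]
    basis = [np.ones_like(t)]
    for j in range(1, (n_basis - 1) // 2 + 1):
        basis.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
        basis.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
    Phi = np.vstack(basis).T                       # (len(t), n_basis)
    dt = t[1] - t[0]
    return curves @ Phi * dt                       # approximate L2 inner products

t = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(0)
curves = rng.standard_normal((80, t.size)).cumsum(axis=1) * 0.1   # toy curves
y = curves @ np.sin(2 * np.pi * t) * (t[1] - t[0]) + rng.normal(0, 0.05, 80)

Z = fourier_design(curves, t)
model = Ridge(alpha=1.0).fit(Z, y)
print("in-sample R^2:", round(model.score(Z, y), 3))
```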

12.
We consider the problem of clustering gamma-ray bursts (from the “BATSE” catalogue) through kernel principal component analysis, in which our proposed kernel outperforms other competing kernels in terms of clustering accuracy, and we obtain three physically interpretable groups of gamma-ray bursts. The effectiveness of the suggested kernel, in combination with kernel principal component analysis, in revealing natural clusters in noisy and nonlinear data while reducing the dimension of the data is also explored on two simulated data sets.
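A sketch of the overall pipeline (kernel PCA followed by clustering of the component scores); the RBF kernel used here is purely a placeholder for the article's proposed kernel.

```python
# Kernel PCA followed by k-means on the leading kernel principal component scores.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def kpca_cluster(X, n_components=2, n_clusters=3, gamma=0.5):
    Xs = StandardScaler().fit_transform(X)
    scores = KernelPCA(n_components=n_components, kernel="rbf",
                       gamma=gamma).fit_transform(Xs)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(scores)
    return scores, labels
```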

13.
We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high-dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach in order to efficiently optimize it when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study and on three gene expression data sets. We also survey past methods for extending LDA to the high-dimensional setting, and explore their relationships with our proposal.
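A tiny numerical illustration of the first point above: when p ≫ n the pooled within-class covariance estimate is singular, so the classical rule cannot be inverted; a diagonal surrogate is one simple way out. This only illustrates the problem, not the penalized LDA algorithm itself.

```python
# Demonstrate rank deficiency of the pooled within-class covariance when p >> n.
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 30, 100, 2                          # p much larger than n
X = rng.standard_normal((n, p))
y = rng.integers(0, K, size=n)

Xc = np.vstack([X[y == k] - X[y == k].mean(axis=0) for k in range(K)])
Sw = Xc.T @ Xc / (n - K)                      # pooled within-class covariance
print("rank of Sw:", np.linalg.matrix_rank(Sw), "out of", p)   # far below p
Sw_diag = np.diag(np.diag(Sw))                # invertible diagonal surrogate
```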

14.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that the global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used for separating the curves into two classes. The proposed method, called functional logistic regression with fused lasso penalty, combines the functional logistic regression as a classifier and the fused lasso for selecting discriminant segments. The proposed method automatically selects the most informative segments of functional data for classification by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using the functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.
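A small sketch of the fused lasso penalty on a discretized coefficient function: the sparsity term zeroes out uninformative grid points and the fusion term ties neighbouring coefficients together, which is what yields whole selected intervals. Only the penalty is shown; the full penalized functional logistic fit needs a dedicated solver.

```python
# Fused lasso penalty on a discretized functional coefficient.
import numpy as np

def fused_lasso_penalty(beta, lam_sparse, lam_fuse):
    return (lam_sparse * np.abs(beta).sum()
            + lam_fuse * np.abs(np.diff(beta)).sum())

beta = np.array([0.0, 0.0, 0.8, 0.8, 0.8, 0.0, 0.0])   # piecewise-constant weights
print(fused_lasso_penalty(beta, lam_sparse=0.1, lam_fuse=1.0))
```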

15.
A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived in the small-sample, high-dimensional setting. This method occupies a middle position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first; then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is examined, alongside other classical methods, through application to data.
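A sketch of the two ingredients named above: the Aitchison-Aitken discrete kernel (smoothing parameter) and the convex combination, governed by the complexity parameter, of full-multinomial and first-order-independence class-conditional probabilities. The cross-validated choice of the two parameters is not reproduced.

```python
# Aitchison-Aitken (1976) product kernel for categorical data, and the
# complexity-parameter mixture of class-conditional probability estimates.
import numpy as np

def aitchison_aitken(x, y, n_categories, lam):
    # x, y: integer-coded categorical vectors; lam in [1/c_j, 1] per variable
    same = (x == y)
    per_var = np.where(same, lam, (1.0 - lam) / (n_categories - 1))
    return per_var.prod()

def combined_class_probs(p_multinomial, p_independence, complexity):
    # complexity = 0 -> full multinomial model, 1 -> first-order independence model
    return (1.0 - complexity) * p_multinomial + complexity * p_independence
```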

16.
We propose a method for the analysis of a spatial point pattern, which is assumed to arise as a set of observations from a spatial nonhomogeneous Poisson process. The spatial point pattern is observed in a bounded region, which, for most applications, is taken to be a rectangle in the space where the process is defined. The method is based on modeling a density function, defined on this bounded region, that is directly related with the intensity function of the Poisson process. We develop a flexible nonparametric mixture model for this density using a bivariate Beta distribution for the mixture kernel and a Dirichlet process prior for the mixing distribution. Using posterior simulation methods, we obtain full inference for the intensity function and any other functional of the process that might be of interest. We discuss applications to problems where inference for clustering in the spatial point pattern is of interest. Moreover, we consider applications of the methodology to extreme value analysis problems. We illustrate the modeling approach with three previously published data sets. Two of the data sets are from forestry and consist of locations of trees. The third data set consists of extremes from the Dow Jones index over a period of 1303 days.

17.
In order to explore and compare a finite number T of data sets by applying functional principal component analysis (FPCA) to the T associated probability density functions, we estimate these density functions by using the multivariate kernel method. The data set sizes being fixed, we study the behaviour of this FPCA under the assumption that all the bandwidth matrices used in the estimation of densities are proportional to a common parameter h and proportional to either the variance matrices or the identity matrix. In this context, we propose a selection criterion of the parameter h which depends only on the data and the FPCA method. Then, on simulated examples, we compare the quality of approximation of the FPCA when the bandwidth matrices are selected using either the previous criterion or two other classical bandwidth selection methods, that is, a plug-in or a cross-validation method.
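A rough sketch of this pipeline under one of the two bandwidth assumptions (bandwidth matrices proportional to the sample variance matrices): estimate each data set's density with a Gaussian kernel, evaluate the T densities on a common grid, and run a PCA on the discretized densities. The paper's data-driven criterion for h is not reproduced; h is simply fixed here.

```python
# FPCA of kernel density estimates with a common bandwidth parameter h.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA

def fpca_of_densities(data_sets, grid, h=0.5):
    # data_sets: list of (n_i, d) arrays; grid: (m, d) common evaluation points
    densities = []
    for X in data_sets:
        kde = gaussian_kde(X.T, bw_method=np.sqrt(h))   # kernel covariance = h * sample covariance
        densities.append(kde(grid.T))
    D = np.vstack(densities)                            # (T, m) discretized densities
    pca = PCA(n_components=min(2, len(data_sets) - 1)).fit(D)
    return pca, D
```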

18.
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

19.
The problem of constructing classification methods based on both labeled and unlabeled data sets is considered for analyzing data with complex structures. We introduce a semi-supervised logistic discriminant model with Gaussian basis expansions. Unknown parameters included in the logistic model are estimated by a regularization method together with the EM algorithm. For the selection of tuning parameters, we derive a model selection criterion from a Bayesian viewpoint. Numerical studies are conducted to investigate the effectiveness of our proposed modeling procedures.

20.
In this article, a variable selection procedure, called surrogate selection, is proposed which can be applied when a support vector machine or kernel Fisher discriminant analysis is used in a binary classification problem. Surrogate selection applies the lasso after substituting the kernel discriminant scores for the binary group labels, with the observed values of the input variables serving as predictors. Empirical results are reported, showing that surrogate selection performs well.
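A hedged sketch of the surrogate selection idea: fit a kernel classifier, take its continuous discriminant scores on the training inputs, and apply the lasso with those scores as the response and the original input variables as predictors; variables with nonzero lasso coefficients are selected. The RBF kernel and the fixed lasso penalty are placeholders, not the article's settings.

```python
# Surrogate selection: lasso on kernel discriminant scores instead of class labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def surrogate_selection(X, y, alpha=0.05, gamma="scale"):
    Xs = StandardScaler().fit_transform(X)
    svm = SVC(kernel="rbf", gamma=gamma).fit(Xs, y)
    scores = svm.decision_function(Xs)           # kernel discriminant scores
    lasso = Lasso(alpha=alpha).fit(Xs, scores)   # lasso on the surrogate response
    return np.flatnonzero(lasso.coef_)           # indices of selected variables
```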
