Similar documents (20 results found)
1.
We propose new multivariate control charts that can effectively deal with massive amounts of complex data through their integration with classification algorithms. We call the proposed control chart the 'Probability of Class (PoC) chart' because the values of PoC, obtained from classification algorithms, are used as monitoring statistics. The control limits of PoC charts are established and adjusted by the bootstrap method. Experimental results with simulated and real data showed that PoC charts outperform Hotelling's T² control charts. Further, a simulation study revealed that a small proportion of out-of-control observations is sufficient for PoC charts to achieve the desired performance.
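A minimal sketch of the idea in Python (assuming scikit-learn and NumPy; the random-forest classifier, simulated data and 99% bootstrap limit are illustrative choices, not the authors' exact settings): the classifier's class-probability output serves as the monitoring statistic and a bootstrap quantile of the in-control values gives the control limit.

    # Sketch: use a classifier's class-probability output as a monitoring
    # statistic and set a control limit by bootstrapping (illustrative only).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X_ref = rng.normal(size=(500, 5))            # in-control reference data
    y_ref = np.zeros(500)                        # label 0 = in control
    X_ooc = rng.normal(loc=1.0, size=(50, 5))    # hypothetical out-of-control data
    X_train = np.vstack([X_ref, X_ooc])
    y_train = np.concatenate([y_ref, np.ones(50)])

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    # Monitoring statistic: probability of the out-of-control class.
    poc_ref = clf.predict_proba(X_ref)[:, 1]

    # Bootstrap the reference PoC values to set an upper control limit.
    B, alpha = 2000, 0.01
    boot_q = [np.quantile(rng.choice(poc_ref, size=len(poc_ref), replace=True),
                          1 - alpha) for _ in range(B)]
    ucl = np.mean(boot_q)                        # control limit for new observations

    new_obs = rng.normal(loc=1.0, size=(1, 5))
    signal = clf.predict_proba(new_obs)[0, 1] > ucl
    print(f"UCL = {ucl:.3f}, signal = {signal}")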

2.
Bagging and Boosting are two main ensemble approaches that consolidate the decisions of several hypotheses. The diversity of the ensemble members is considered a significant factor in reducing generalization error. Here, a method called EBAGTS (ensemble-based artificially generated training samples) is proposed for generating ensembles. It manipulates the training examples in three ways in order to build diverse hypotheses directly: drawing a sub-sample from the training set, reducing or increasing the number of error-prone training instances, and reducing or increasing the number of local instances around error-prone regions. The proposed method is a straightforward, generic framework that can use any base classifier as its ensemble members to assemble a powerful combined classifier. Decision-tree and multilayer-perceptron base classifiers are employed in the experiments to show that the proposed method achieves higher predictive accuracy than meta-learning algorithms such as Boosting and Bagging. Furthermore, the advantage of EBAGTS over Boosting grows as the training data set gets larger. The results illustrate that EBAGTS can outperform the state of the art.
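The core mechanism, building ensemble members from artificially modified training sets, can be sketched as follows (a hypothetical Python/scikit-learn illustration showing only sub-sampling plus duplication of error-prone instances, not the full EBAGTS scheme):

    # Sketch: build ensemble members from modified training sets, here a
    # sub-sample plus duplicated error-prone instances (illustrative only).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=600, n_features=10, random_state=0)
    rng = np.random.default_rng(0)

    # Preliminary fit to locate error-prone instances.
    pre = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    hard = np.where(pre.predict(X) != y)[0]

    ensemble = []
    for m in range(25):
        sub = rng.choice(len(X), size=int(0.7 * len(X)), replace=False)
        extra = (rng.choice(hard, size=hard.size, replace=True)
                 if hard.size else np.array([], dtype=int))
        idx = np.concatenate([sub, extra])
        ensemble.append(DecisionTreeClassifier(random_state=m).fit(X[idx], y[idx]))

    # Majority vote of the ensemble.
    votes = np.array([m.predict(X) for m in ensemble])
    pred = (votes.mean(axis=0) > 0.5).astype(int)
    print("training accuracy:", (pred == y).mean())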

3.
A model-based classification technique is developed, based on mixtures of multivariate t-factor analyzers. Specifically, two related mixture models are developed and their classification efficacy studied. An AECM algorithm is used for parameter estimation, and convergence of these algorithms is determined using Aitken's acceleration. Two different techniques are proposed for model selection: the BIC and the ICL. Our classification technique is applied to data on red wine samples from Italy and to fatty acid measurements on Italian olive oils. These results are discussed and compared to more established classification techniques; under this comparison, our mixture models give excellent classification performance.

4.
Concept drift in data stream classification is a frontier and difficult problem in data mining: the classification may drift as the data sequence shifts. Although algorithms that estimate dynamic drift and adjust the classifier accordingly have been proposed, existing algorithms do not estimate concept drift well because examples from the target distribution are scarce, and the number of such examples strongly affects estimation performance. In view of this, a new parameter estimation method, called transfer estimation, is proposed; it uses target-distribution data together with similar-distribution theory to improve existing algorithms so that concept drift in data stream classification can be correctly detected and estimated. Simulation experiments on synthetic and real data sets show that the improved algorithm outperforms existing algorithms in estimating concept drift in data stream classification.

5.
This paper presents a novel ensemble classifier generation method by integrating the ideas of bootstrap aggregation and Principal Component Analysis (PCA). To create each individual member of an ensemble classifier, PCA is applied to every out-of-bag sample and the computed coefficients of all principal components are stored, and then the principal components calculated on the corresponding bootstrap sample are taken as additional elements of the original feature set. A classifier is trained with the bootstrap sample and some features randomly selected from the new feature set. The final ensemble classifier is constructed by majority voting of the trained base classifiers. The results obtained by empirical experiments and statistical tests demonstrate that the proposed method performs better than or as well as several other ensemble methods on some benchmark data sets publicly available from the UCI repository. Furthermore, the diversity-accuracy patterns of the ensemble classifiers are investigated by kappa-error diagrams.
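A rough Python sketch of the construction (assuming scikit-learn; the ensemble size, base learner and feature counts are illustrative, not the paper's settings): PCA is fitted on each out-of-bag sample, its scores are appended to the bootstrap sample's features, and each tree is trained on a random subset of the augmented features.

    # Sketch of the bagging + PCA feature-augmentation idea (illustrative choices).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=400, n_features=12, random_state=1)
    rng = np.random.default_rng(1)
    members = []

    for m in range(25):
        boot = rng.choice(len(X), size=len(X), replace=True)
        oob = np.setdiff1d(np.arange(len(X)), boot)
        pca = PCA().fit(X[oob])                                # PCA on the out-of-bag sample
        X_aug = np.hstack([X[boot], pca.transform(X[boot])])   # original features + PC scores
        feats = rng.choice(X_aug.shape[1], size=10, replace=False)
        tree = DecisionTreeClassifier(random_state=m).fit(X_aug[:, feats], y[boot])
        members.append((pca, feats, tree))

    def ensemble_predict(X_new):
        votes = []
        for pca, feats, tree in members:
            X_aug = np.hstack([X_new, pca.transform(X_new)])
            votes.append(tree.predict(X_aug[:, feats]))
        return (np.mean(votes, axis=0) > 0.5).astype(int)      # majority vote

    print("training accuracy:", (ensemble_predict(X) == y).mean())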

6.
This paper discusses a supervised classification approach for the differential diagnosis of Raynaud's phenomenon (RP). The classification of data from healthy subjects and from patients suffering from primary or secondary RP is obtained by means of a set of classifiers derived within the framework of linear discriminant analysis. A set of functional variables and shape measures extracted from rewarming/reperfusion curves is proposed as discriminant features. Since the prediction of group membership is based on a large number of these features, the high-dimension/small-sample-size problem is addressed to overcome the singularity of the within-group covariance matrix. Results on a data set of 72 subjects demonstrate that a satisfactory classification of the subjects can be achieved with the proposed methodology.

7.
Bayesian analysis of dynamic magnetic resonance breast images
We describe an integrated methodology for analysing dynamic magnetic resonance images of the breast. The problems that motivate this methodology arise from a collaborative study with a tumour institute. The methods are developed within the Bayesian framework and comprise image restoration and classification steps. Two different approaches are proposed for the restoration. Bayesian inference is performed by means of Markov chain Monte Carlo algorithms. We make use of a Metropolis algorithm with a specially chosen proposal distribution that performs better than more commonly used proposals. The classification step is based on a few attribute images yielded by the restoration step that describe the essential features of the contrast agent variation over time. Procedures for hyperparameter estimation are provided, so making our method automatic. The results show the potential of the methodology to extract useful information from acquired dynamic magnetic resonance imaging data about tumour morphology and internal pathophysiological features.

8.
SubBag is a technique that combines bagging and random subspace methods to generate ensemble classifiers with good generalization capability. In practice, a hyperparameter K of SubBag—the number of randomly selected features used to create each base classifier—should be specified beforehand. In this article, we propose to employ the out-of-bag instances to determine the optimal value of K in SubBag. Experiments conducted with some UCI real-world data sets show that the proposed method makes SubBag achieve optimal performance in nearly all the considered cases. Meanwhile, it requires fewer computational resources than a cross-validation procedure.
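A minimal sketch of the tuning idea (assuming Python with scikit-learn, whose BaggingClassifier exposes max_features and an out-of-bag score; the candidate grid and data set are illustrative):

    # Sketch: pick the subspace size K for a bagging + random-subspace ensemble
    # using the out-of-bag score, with scikit-learn's default tree base learner.
    from sklearn.ensemble import BaggingClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    candidates = [3, 5, 8, 12, 20, X.shape[1]]

    oob = {}
    for K in candidates:
        model = BaggingClassifier(n_estimators=100, max_features=K,
                                  oob_score=True, random_state=0).fit(X, y)
        oob[K] = model.oob_score_

    best_K = max(oob, key=oob.get)
    print("OOB scores:", oob, "-> chosen K =", best_K)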

9.
As no single classification method outperforms other classification methods under all circumstances, decision-makers may solve a classification problem using several classification methods and examine their performance for classification purposes in the learning set. Based on this performance, better classification methods might be adopted and poor methods might be avoided. However, which single classification method is the best to predict the classification of new observations is still not clear, especially when some methods offer similar classification performance in the learning set. In this article we present various regression and classical methods, which combine several classification methods to predict the classification of new observations. The quality of the combined classifiers is examined on some real data. Nonparametric regression is the best method of combining classifiers.
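The general idea of combining several classifiers through a second-stage model can be sketched with scikit-learn's stacking ensemble (an illustration only: a logistic-regression combiner stands in for the article's nonparametric-regression combiner, and the base learners and data set are arbitrary choices):

    # Sketch: combine several classification methods by feeding their outputs
    # to a second-stage model (stacking with a logistic-regression combiner).
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    base = [("lda", LinearDiscriminantAnalysis()),
            ("knn", KNeighborsClassifier()),
            ("tree", DecisionTreeClassifier(random_state=0))]
    combined = StackingClassifier(estimators=base,
                                  final_estimator=LogisticRegression(max_iter=1000))
    print("CV accuracy:", cross_val_score(combined, X, y, cv=5).mean())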

10.
Methods are proposed to combine several individual classifiers in order to develop more accurate classification rules. The proposed algorithm uses Rademacher–Walsh polynomials to combine M (≥2) individual classifiers in a nonlinear way. The resulting classifier is optimal in the sense that its misclassification error rate is always less than, or equal to, that of each constituent classifier. A number of numerical examples (based on both real and simulated data) are also given. These examples demonstrate some new, and far-reaching, benefits of working with combined classifiers.

11.
In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. This comparison is performed by using 14 data sets that are publicly available and that were used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from different perspectives. More precisely, we look at the effects of noise and of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for RF, the effect of the number of variables from which to choose the best split at each given node. Moreover, we compare our results with those obtained by Lim et al. (2000). In this study, the best overall results are obtained with RF. In particular, RF are the most robust against noise. The effect of allowing linear combinations and the differences between splitting criteria are small on average, but can be substantial for some data sets.
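For the RF-specific question, the number of variables from which the best split is chosen at each node, a simple cross-validated comparison can be sketched as follows (Python/scikit-learn, with an illustrative grid and data set rather than those of the study):

    # Sketch: effect of the number of candidate split variables (max_features)
    # in a random forest, assessed by cross-validation.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    for mtry in [1, 3, 5, "sqrt", None]:   # None = consider all features at each split
        rf = RandomForestClassifier(n_estimators=300, max_features=mtry,
                                    random_state=0)
        print(mtry, cross_val_score(rf, X, y, cv=5).mean().round(3))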

12.
Fundamental frequency (F0) patterns, which indicate the vibration frequency of the vocal cords, reflect developmental changes in infant spoken language. In previous studies in developmental psychology, however, F0 patterns were manually classified into subjectively specified categories. Furthermore, since F0 contains sequential missing values and exhibits mean nonstationarity, classification that partitions the series and applies conventional discriminant analysis based on stationary and locally stationary processes is considered inadequate. Consequently, we propose a classification method based on discriminant analysis of time series data with mean nonstationarity and sequential missing values, together with a measurement technique for investigating configuration similarities for classification. Using the proposed procedures, we analyse a longitudinal database of recorded conversations between infants and parents over a five-year period. Various F0 patterns were automatically classified into appropriate pattern groups, and the classification similarities were calculated. These similarities gradually decreased with the infant's age in months until a large change occurred around 20 months. The results suggest that our proposed methods are useful for analysing large-scale data and can contribute to studies of infant spoken language acquisition.

13.
Statistical learning is emerging as a promising field in which a number of algorithms from machine learning are interpreted as statistical methods and vice versa. Owing to its good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification. They are generated by using traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators, whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed, with regard both to the standard kernel density classification problem and to our 'boosted' kernel methods. Extensive experiments, using real and simulated data, show the encouraging practical relevance of the findings. Standard kernel methods are often outperformed by the first boosting iterations over a range of bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, the competitors being tree-based methods, including AdaBoost with trees.
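The non-boosted baseline, standard kernel density classification with data-driven bandwidth selection, can be sketched as follows (Python/scikit-learn; the boosted variant is not reproduced here, and the bandwidth grid and data set are illustrative):

    # Sketch: standard kernel density classification, one KDE per class,
    # with the bandwidth chosen by cross-validated log-likelihood.
    import numpy as np
    from sklearn.neighbors import KernelDensity
    from sklearn.model_selection import GridSearchCV
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    classes = np.unique(y)

    kdes, priors = {}, {}
    for c in classes:
        grid = GridSearchCV(KernelDensity(),
                            {"bandwidth": np.logspace(-1, 1, 20)}, cv=5)
        grid.fit(X[y == c])
        kdes[c] = grid.best_estimator_
        priors[c] = np.mean(y == c)

    def classify(X_new):
        # Assign each point to the class with the highest log density + log prior.
        scores = np.column_stack([kdes[c].score_samples(X_new) + np.log(priors[c])
                                  for c in classes])
        return classes[np.argmax(scores, axis=1)]

    print("training accuracy:", (classify(X) == y).mean())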

14.
Testing for periodicity in microarray time series encounters the challenges of short series length, missing values, and the presence of non-Fourier frequencies. In this article, a test method for such series is proposed. The method is completely simulation based and finds p-values for the test of periodicity by fitting a Pearson Type VI distribution. The simulation results reveal the superiority of this method over Fisher's g test for varying series lengths, frequencies, and error variances. The approach is applied to Caulobacter crescentus cell cycle data to demonstrate its practical performance.
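A simulation-based periodicity test in this spirit can be sketched as follows (Python with NumPy/SciPy; missing-value handling is omitted, the series length, simulation size and test series are illustrative, and SciPy's betaprime plays the role of the Pearson Type VI family):

    # Sketch: Fisher's g statistic with a Monte Carlo null distribution; a
    # Pearson Type VI (beta-prime) fit to the simulated statistics is shown
    # alongside the empirical p-value for comparison.
    import numpy as np
    from scipy import stats

    def fisher_g(x):
        x = x - x.mean()
        spec = np.abs(np.fft.rfft(x))[1:] ** 2     # periodogram ordinates (excl. freq 0)
        return spec.max() / spec.sum()

    rng = np.random.default_rng(0)
    n, n_sim = 24, 5000                            # short series, as in microarray data
    null_g = np.array([fisher_g(rng.normal(size=n)) for _ in range(n_sim)])

    t = np.arange(n)
    x = np.sin(2 * np.pi * t / 8) + rng.normal(scale=0.7, size=n)   # test series
    g_obs = fisher_g(x)

    p_empirical = (null_g >= g_obs).mean()
    a, b, loc, scale = stats.betaprime.fit(null_g, floc=0)          # Pearson Type VI fit
    p_fitted = stats.betaprime.sf(g_obs, a, b, loc=loc, scale=scale)
    print(f"g = {g_obs:.3f}, empirical p = {p_empirical:.4f}, fitted p = {p_fitted:.4f}")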

15.
Selective ensemble algorithms are currently one of the hot topics in machine learning. Based on a case study of algal growth, a fast selective Bagging Trees ensemble algorithm based on k-means clustering is proposed. Compared with traditional statistical methods and several commonly used machine learning methods, the proposed algorithm is found to have a smaller generalization error and higher predictive accuracy, and its running efficiency is also greatly improved.
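One way such a k-means-based selective bagging scheme could look (a hypothetical Python/scikit-learn sketch, not the paper's exact algorithm): bagged trees are clustered by their validation-set predictions and one representative per cluster is kept. For brevity, accuracy is reported on the same validation split used for selection.

    # Sketch: selective ensembling via k-means clustering of the bagged trees'
    # prediction vectors (illustrative pool size, cluster count and data).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=800, n_features=15, random_state=2)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=2)
    rng = np.random.default_rng(2)

    # Pool of bagged trees.
    pool = []
    for m in range(100):
        idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
        pool.append(DecisionTreeClassifier(random_state=m).fit(X_tr[idx], y_tr[idx]))

    # Describe each tree by its prediction vector on the validation set, then cluster.
    P = np.array([t.predict(X_val) for t in pool])
    km = KMeans(n_clusters=10, n_init=10, random_state=2).fit(P)

    # Keep the tree closest to each cluster centre and vote with the reduced ensemble.
    selected = [pool[np.argmin(np.linalg.norm(P - c, axis=1))] for c in km.cluster_centers_]
    votes = np.array([t.predict(X_val) for t in selected])
    pred = (votes.mean(axis=0) > 0.5).astype(int)
    print("selected", len(selected), "of", len(pool), "trees; val accuracy:",
          (pred == y_val).mean())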

16.
Recently, a new ensemble classification method named Canonical Forest (CF) has been proposed by Chen et al. [Canonical forest. Comput Stat. 2014;29:849–867]. CF has been shown to give consistently good results on many data sets, comparable to other widely used ensemble classification methods. However, CF requires adopting a feature reduction method before classifying high-dimensional data. Here, we extend CF to a high-dimensional classifier by incorporating a random feature subspace algorithm [Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell. 1998;20:832–844]. This extended algorithm is called HDCF (high-dimensional CF) as it is specifically designed for high-dimensional data. We conducted an experiment using three data sets – gene imprinting, oestrogen, and leukaemia – to compare the performance of HDCF with several popular and successful classification methods for high-dimensional data sets, including Random Forest [Breiman L. Random forest. Mach Learn. 2001;45:5–32], CERP [Ahn H, et al. Classification by ensembles from random partitions of high-dimensional data. Comput Stat Data Anal. 2007;51:6166–6179], and support vector machines [Vapnik V. The nature of statistical learning theory. New York: Springer; 1995]. Besides classification accuracy, we also investigated the balance between sensitivity and specificity for all four classification methods.

17.
For the sign testing problem about normal variances, we develop a heuristic testing procedure based on the concepts of the generalized test variable and the generalized p-value. A detailed simulation study is conducted to empirically investigate the performance of the proposed method. In the simulation study, especially for small sample sizes, the proposed test not only adequately controls the empirical size at the nominal level, but is also uniformly more powerful than the likelihood ratio test, Gutmann's test, Li and Sinha's test and Liu and Chan's test, showing that the proposed method can be recommended in practice. The proposed method is illustrated with published data.

18.
Most classification models exhibit biased learning when dealing with imbalanced datasets. This article proposes a novel approach for learning from imbalanced datasets, based on an improved SMOTE (Synthetic Minority Over-sampling Technique) algorithm. By organically combining over-sampling and under-sampling, the approach chooses neighbours in a targeted way and synthesizes samples with different strategies. Experiments show that most classifiers achieve good performance on the classification of the positive and negative classes after the imbalanced datasets are processed with our algorithm.
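A plain combination of over-sampling and under-sampling can be sketched with the imbalanced-learn package (assuming it is installed; the sampling ratios and data set are illustrative, and the paper's improved, targeted SMOTE variant is not reproduced):

    # Sketch: SMOTE over-sampling plus random under-sampling on an imbalanced
    # data set, combined in an imbalanced-learn pipeline.
    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=3)
    print("class counts:", Counter(y))

    resample_and_fit = Pipeline([
        ("smote", SMOTE(sampling_strategy=0.5, random_state=3)),              # grow minority
        ("under", RandomUnderSampler(sampling_strategy=0.8, random_state=3)), # trim majority
        ("clf", DecisionTreeClassifier(random_state=3)),
    ])
    print("balanced accuracy:",
          cross_val_score(resample_and_fit, X, y, cv=5,
                          scoring="balanced_accuracy").mean())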

19.
The Bayes classification rule offers the optimal classifier, minimizing the classification error rate, whereas the Neyman–Pearson lemma offers the optimal family of classifiers to maximize the detection rate for any given false alarm rate. These results motivate studies that compare classifiers based on their similarity to the optimal ones. In this article, we define partial order relations on classifiers and families of classifiers, based on rankings of rate function values and rankings of test function values, respectively. Each partial order relation provides a sufficient condition that yields better classification error rates or better performance on the receiver operating characteristic analysis. Various examples and applications of the partial order theorems are discussed to provide comparisons of classifiers and families of classifiers, including the comparison of cross-validation methods, training data that contain outliers, and labelling errors in training data. The Canadian Journal of Statistics 48: 152–166; 2020 © 2019 Statistical Society of Canada

20.
This paper proposes an algorithm for the classification of multi-dimensional datasets based on conjugate Bayesian Multiple Kernel Grouping Learning (BMKGL). Using a conjugate Bayesian framework improves computational efficiency. Using multiple kernels instead of a single kernel avoids the kernel selection problem, which is also computationally expensive. Through grouping parameter learning, BMKGL can simultaneously integrate information from different dimensions and find the dimensions that contribute most to the variation of the outcome, which aids interpretability. Meanwhile, BMKGL can select the most suitable combination of kernels for different dimensions so as to extract the most appropriate measure for each dimension and improve the accuracy of the classification results. The simulation results illustrate that our learning process gives better prediction results and stability than some popular classifiers, such as the k-nearest neighbours algorithm, the support vector machine and the naive Bayes classifier. BMKGL also outperforms previous methods in terms of accuracy and interpretation on the heart disease and EEG datasets.
