Similar articles
Found 20 similar articles (search time: 46 ms)
1.
This article modifies two internal validity measures and applies them to evaluate the quality of clustering for probability density functions (pdfs). Based on these measures, we propose a new modified genetic algorithm, called GA-CDF, to establish suitable clusters for pdfs. The proposed algorithm is tested on four numerical examples, comprising two synthetic data sets and two real data sets. These examples illustrate the superiority of the proposed algorithm over some existing algorithms in terms of the internal and external validity measures, and they demonstrate the feasibility and applicability of GA-CDF for practical problems in data mining.

2.
In this article, we propose a new criterion to evaluate the similarity of probability density functions (pdfs). We call this criterion the similar coefficient of cluster (SCC) and use it as a tool to deal with overlap coefficients of pdfs standardized to [0, 1]. With the support of a self-updating algorithm for determining a suitable number of clusters, SCC then becomes a criterion for establishing the corresponding clusters of pdfs. Moreover, we present results on determining the SCC for two pdfs and for more than two pdfs, as well as relations between different SCCs and other measures. Numerical examples on both synthetic and real data illustrate the suitability of the proposed theory and algorithms and demonstrate the applicability and novelty of the proposed algorithm.
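
The overlap coefficient mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition of overlap as the integral of the pointwise minimum of the densities; the SCC itself is not reproduced here, and the function names (`normal_pdf`, `overlap_coefficient`) and the integration grid are illustrative choices, not taken from the article.

```python
import math

def normal_pdf(mu, sigma):
    """Return the density function of a normal distribution N(mu, sigma^2)."""
    def pdf(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf

def overlap_coefficient(pdfs, lo, hi, n=20000):
    """Trapezoidal estimate of the integral of min(f_1, ..., f_k) over [lo, hi].

    Assumption: overlap is defined as the area under the pointwise minimum.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * min(f(x) for f in pdfs)
    return total * h

f = normal_pdf(0.0, 1.0)
g = normal_pdf(1.0, 1.0)
ovl = overlap_coefficient([f, g], -10.0, 11.0)
```

Under this definition the value lies in [0, 1]: identical pdfs give an overlap close to 1, and well-separated pdfs give a value close to 0.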

3.
This study develops a robust automatic algorithm for clustering probability density functions, building on previous research. Unlike existing methods, which often require the number of clusters to be fixed in advance, this method self-organizes the data into groups based on the original data structure. The proposed clustering method is also robust to noise. Three synthetic examples and the real-world COREL dataset are used to illustrate the accuracy and effectiveness of the proposed approach.

4.
In this paper, we present an algorithm for clustering based on univariate kernel density estimation, named ClusterKDE. It consists of an iterative procedure in which, at each step, a new cluster is obtained by minimizing a smooth kernel function. Although we have used the univariate Gaussian kernel in our applications, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, the ClusterKDE algorithm is very simple, easy to implement, well defined, and stops in a finite number of steps; that is, it always converges regardless of the initial point. We also illustrate our findings with numerical experiments in which the algorithm is implemented in Matlab and applied to practical problems. The results indicate that the ClusterKDE algorithm is competitive and fast compared with the well-known Clusterdata and K-means algorithms used by Matlab to cluster data.

5.
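Clustering by partitioning on data distribution density is one of the main approaches among data-mining clustering algorithms. To address the shortcomings of traditional density-partitioning clustering algorithms, such as computational complexity and low running efficiency, a multi-partition clustering algorithm based on high-dimensional stepwise projection is designed. Using the high-dimensional distribution projection density as the basis, the data set is partitioned multiple times to produce sub-cluster spaces, and the sub-clusters are then merged to form the desired clustering result. Experiments with the algorithm show that it offers simple computation and high running efficiency.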

6.
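Gao Haiyan et al. Statistical Research (《统计研究》), 2020, 37(8): 91-103
Functional clustering algorithms involve two basic elements: projection and clustering. In general, the optimal projection does not necessarily preserve class information effectively, which degrades the subsequent clustering. To address this, the paper reviews the constituent elements and operating process of functional clustering. Drawing on the clustering property of non-negative matrix factorization, it proposes a functional clustering algorithm based on non-negative matrix factorization, constructs a framework in which projection and clustering are carried out in parallel, solves it by alternating iterative updates, and analyzes the computational time complexity of the algorithm. Validation on randomly simulated data and an empirical test on speech recognition data show that the algorithm helps improve clustering performance. An application to hourly nitrogen dioxide (NO2) concentration data for Beijing shows that the algorithm's discrimination of air-quality monitoring station types adequately identifies the spatial pattern of the station layout, so the algorithm has good practical value.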

7.
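Ma Shaopei et al. Statistical Research (《统计研究》), 2021, 38(2): 114-134
In the era of big data, fields such as finance, genomics, and image processing generate large amounts of tensor data. Zhong et al. (2015) proposed a tensor sufficient dimension reduction method and gave a sequential iterative algorithm for second-order tensors. Given the wide use of higher-order tensors in practice, this paper extends the algorithm of Zhong et al. (2015) to higher orders and, taking third-order tensors as an example, proposes two different algorithms: a structure-transforming algorithm and a structure-preserving algorithm. Both algorithms preserve the original structural information of the tensor to varying degrees while effectively reducing the variable dimension and computational complexity and avoiding the problem of a singular covariance matrix. The two algorithms are applied to the classification and recognition of color portrait images, and the classification results are displayed visually in forms such as two- and three-dimensional scatter plots. The structure-preserving algorithm is compared with five methods: K-means clustering, t-SNE nonlinear dimension reduction, multidimensional principal component analysis, multidimensional discriminant analysis, and tensor sliced inverse regression. The results show that the proposed method has a clear advantage in classification accuracy and therefore has broad prospects in image recognition and related applications.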

8.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as learning proceeds, rather than in a two-step scheme as in standard model-based clustering, where model selection criteria are applied afterward, and (ii) unlike the standard EM algorithm for regression mixtures, it does not require accurate initialization. The approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and they confirm its benefit for practical applications.

9.
In this paper, we present a new algorithm for clustering a proximity-relation matrix that does not require the transitivity property. The proposed algorithm is inspired by the idea of Yang and Wu [16] and is then turned into a self-organizing process built upon the intuition behind clustering. At the end of the process, subjects belonging to the same cluster should converge to the same point, which represents the cluster center. However, the performance of Yang and Wu's algorithm depends on parameter selection. In this paper, we use the partition entropy (PE) index to select the parameter. Numerical results illustrate that the proposed method not only solves the parameter selection problem but also obtains an optimal clustering result. Finally, we apply the proposed algorithm to three problems: evaluating the performance of higher education in Taiwan, machine-parts grouping in cellular manufacturing systems, and clustering probability density functions.
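
As a rough illustration of the partition entropy (PE) index named in this abstract, the sketch below assumes Bezdek's usual definition, PE = -(1/n) Σ_k Σ_i u_ik ln u_ik, computed from a fuzzy membership matrix. How the article links PE to its parameter is not stated in the abstract, so the function name and the input layout are hypothetical.

```python
import math

def partition_entropy(memberships):
    """Bezdek's partition entropy: PE = -(1/n) * sum_k sum_i u_ik * ln(u_ik).

    `memberships` is a list of n rows; row k holds the membership degrees
    of object k in each of the c clusters and is expected to sum to 1.
    """
    n = len(memberships)
    total = 0.0
    for row in memberships:
        for u in row:
            if u > 0.0:  # the term 0 * ln(0) is taken as 0
                total += u * math.log(u)
    return -total / n

crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
pe_crisp = partition_entropy(crisp)
pe_fuzzy = partition_entropy(fuzzy)
```

PE is 0 for a crisp partition and reaches ln(c) when every object is spread evenly over c clusters, so one plausible use is to compare candidate parameter values and prefer the one with the lowest PE.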

10.
The EM algorithm is the standard method for estimating the parameters of finite mixture models. Yang and Pan [25] proposed a generalized classification maximum likelihood procedure, called the fuzzy c-directions (FCD) clustering algorithm, for estimating the parameters of mixtures of von Mises distributions. Two main drawbacks of the EM algorithm are its slow convergence and the dependence of the solution on the initial values. The choice of initial values is of great importance because it can heavily influence both the speed of convergence and the ability of the algorithm to locate the global maximum. The algorithmic frameworks of EM and FCD are closely related, so FCD shares the drawbacks of the EM algorithm. To resolve these problems, this paper proposes another clustering algorithm, which can self-organize locally optimal cluster numbers without using cluster validity functions. Numerical results clearly indicate that the proposed algorithm outperforms the EM and FCD algorithms. Finally, we apply the proposed algorithm to two real data sets.

11.
Effectively solving the label switching problem is critical for both Bayesian and frequentist mixture model analyses. In this article, a new relabeling method is proposed by extending a recently developed modal clustering algorithm. First, the posterior distribution is estimated by a kernel density estimate (KDE) from permuted MCMC or bootstrap samples of the parameters. Second, a modal EM algorithm is used to find the m! symmetric modes of the KDE. Finally, samples that ascend to the same mode are assigned the same label. Simulations and real data applications demonstrate that the new method provides more accurate estimates than many existing relabeling methods.

12.
Model-based clustering is a method that clusters data under the assumption of a statistical model structure. In this paper, we propose a novel model-based hierarchical clustering method for a finite statistical mixture model based on the Fisher distribution. The main aims of the proposed method are to (a) provide an efficient way to estimate the parameters of a Fisher mixture model (FMM), (b) generate a hierarchy of FMMs, and (c) select the optimal model. To this end, we develop a Bregman soft clustering method for FMMs. Our model estimation strategy exploits Bregman divergence and hierarchical agglomerative clustering, while our model selection strategy comprises a parsimony-based approach and an evaluation graph-based approach. We empirically validate the proposed method by applying it to simulated data. Next, we apply the method to real data to perform depth image analysis. We demonstrate that the proposed clustering method can serve as a tool for unsupervised depth image analysis.

13.
We propose two probability-like measures of individual cluster-membership certainty that can be applied to a hard partition of the sample, such as that obtained from the partitioning around medoids (PAM) algorithm, hierarchical clustering, or k-means clustering. One measure extends the individual silhouette widths, and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual's tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition using these measures. We evaluate the performance of both measures for individuals with ambiguous cluster membership, using simulated binary datasets partitioned by the PAM algorithm and continuous datasets partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft-clustering algorithms such as fuzzy analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher's classic dataset on irises.
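
For context, the classic individual silhouette width that one of the proposed measures extends can be sketched as follows. This shows only the standard silhouette s(i) = (b - a) / max(a, b), not the probability-like measures of the article; the function name and the toy data are illustrative, and at least two clusters are assumed.

```python
def silhouette_widths(dist, labels):
    """Classic silhouette width for each individual in a hard partition.

    dist   : n x n matrix of pairwise dissimilarities
    labels : cluster label of each individual (at least two distinct labels)
    """
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the same cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths

points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
widths = silhouette_widths(dist, [0, 0, 1, 1])
```

Silhouette widths range over [-1, 1], which is why they cannot be read directly as membership probabilities.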

14.
Mixture model-based clustering is widely used in many applications. In certain real-time applications, the rapid growth of the data over time makes classical clustering algorithms too slow. An online clustering algorithm based on mixture models is presented in the context of a real-time flaw-diagnosis application for pressurized containers that uses data from acoustic emission signals. The proposed algorithm is a stochastic gradient algorithm derived from the classification version of the EM algorithm (CEM). It provides a model-based generalization of the well-known online k-means algorithm and is able to handle non-spherical clusters. Using synthetic and real data sets, the proposed algorithm is compared with the batch CEM algorithm and the online EM algorithm. The three approaches generate comparable solutions in terms of the resulting partition when clusters are relatively well separated, but the online algorithms become faster as the number of available observations increases.
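
The online k-means algorithm that this abstract takes as its starting point can be sketched in a few lines. This is plain online k-means with a 1/count step size (a MacQueen-style update), shown under that assumption; it is not the CEM-based algorithm of the article, and the function name and the initial state are hypothetical.

```python
def online_kmeans_step(centers, counts, x):
    """Assign x to its nearest center and move that center toward x.

    centers : list of center coordinates (each a list of floats), updated in place
    counts  : number of points absorbed by each center so far, updated in place
    Returns the index of the center that received x.
    """
    nearest = min(
        range(len(centers)),
        key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[k])),
    )
    counts[nearest] += 1
    rate = 1.0 / counts[nearest]  # step size shrinks as the cluster grows
    centers[nearest] = [
        ci + rate * (xi - ci) for xi, ci in zip(x, centers[nearest])
    ]
    return nearest

centers = [[0.0], [10.0]]
counts = [1, 1]  # each center is assumed to start from one observed point
first = online_kmeans_step(centers, counts, [2.0])
second = online_kmeans_step(centers, counts, [12.0])
```

With this step size each center stays equal to the running mean of the points assigned to it, and each new observation costs one pass over the centers rather than a pass over all the data.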

15.
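Projection pursuit clustering based on a genetic algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
The traditional projection pursuit clustering algorithm PROCLUS is an effective algorithm for clustering high-dimensional data. However, it uses hill climbing to iterate over the cluster center points and select the best ones. Because hill climbing is a local search method, the optimum it finds may be only a local optimum. To address this shortcoming, an improved projection pursuit clustering algorithm is proposed that uses a genetic algorithm to iterate over the cluster center points in search of the global optimum. Simulation results demonstrate the feasibility and effectiveness of the new algorithm.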

16.
Given that the Euclidean distance between the parameter estimates of the autoregressive expansions of autoregressive moving average models can be used to classify stationary time series into groups, a hypothesis test is proposed to determine whether two stationary series in a particular group have significantly different generating processes. Based on this test, a new clustering algorithm is also proposed. The results of Monte Carlo simulations are given.
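
The distance underlying this abstract can be illustrated with a small sketch. It assumes ARMA(1,1) models with known parameters, written as X_t = φ X_{t-1} + e_t - θ e_{t-1}, and truncates the autoregressive expansion at a fixed number of terms. In the article the weights would be estimated from data, and the hypothesis test itself is not shown; all names below are illustrative.

```python
import math

def arma11_pi_weights(phi, theta, n_terms=50):
    """AR(infinity) weights of X_t = phi*X_{t-1} + e_t - theta*e_{t-1}.

    Writing the model as X_t = sum_j pi_j * X_{t-j} + e_t gives
    pi_j = (phi - theta) * theta**(j - 1) for j >= 1 (requires |theta| < 1).
    """
    return [(phi - theta) * theta ** (j - 1) for j in range(1, n_terms + 1)]

def ar_distance(pi_a, pi_b):
    """Euclidean distance between two truncated sequences of AR weights."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pi_a, pi_b)))

# Two AR(1) processes (theta = 0) differ only in their first weight.
d = ar_distance(arma11_pi_weights(0.5, 0.0), arma11_pi_weights(0.2, 0.0))
```

Series whose generating processes have similar autoregressive expansions end up close together under this distance, which is what makes it usable for grouping stationary series.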

17.
The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward, because projections are smoothing operations: structure can be obscured by projection but never enhanced. In this paper, an alternative procedure for finding interesting features is proposed that is based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects, such as clustering; to this end, the data are first sphered to remove all of the location, scale, and correlational structure. A set of simulated bivariate data and data on the artistic qualities of painters are used as examples.

18.
An important problem in network analysis is to identify significant communities. Most real-world data sets exhibit a certain topological structure between nodes as well as attributes describing them. In this paper, we propose a new community detection criterion that considers both structural similarities and attribute similarities. The clustering method integrates the cost of clustering node attributes with the cost of clustering the structural information via the normalized modularity. We show that the joint clustering problem can be formulated as a spectral relaxation problem. The proposed algorithm is capable of learning the degree of contribution of individual node attributes. A number of numerical studies involving simulated and real data sets demonstrate the effectiveness of the proposed method.

19.
Detecting local spatial clusters for count data is an important task in spatial epidemiology. Two broad approaches to finding clusters, moving window methods and disease mapping methods, have been suggested in the literature. However, the existing methods employ somewhat arbitrarily chosen tuning parameters, and the local clustering results are sensitive to these choices. In this paper, we propose a penalized likelihood method to overcome the limitations of existing local spatial clustering approaches for count data. We start with a Poisson regression model to accommodate any type of covariates, and we formulate the clustering problem as a penalized likelihood estimation problem to find change points of intercepts in two-dimensional space. The cost of developing a new algorithm is minimized by modifying an existing least absolute shrinkage and selection operator (LASSO) algorithm. The computational details of the modifications are shown, and the proposed method is illustrated with Seoul tuberculosis data.

20.
A wavelet method is proposed for recovering damaged images. The proposed method combines wavelet shrinkage with preprocessing based on a binning process and an imputation procedure, which is designed to extend the scope of wavelet shrinkage to data with missing values and perturbed locations. The proposed algorithm, termed the BTW algorithm, is simple to implement and efficient for recovering an image. Furthermore, the algorithm can easily be applied to wavelet regression for one-dimensional (1-D) signal estimation with irregularly spaced data. Simulation studies and real examples show that the proposed method can produce effective results.
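
The shrinkage step at the core of wavelet shrinkage can be sketched as below. The abstract does not say which thresholding rule the BTW algorithm uses, so soft thresholding is assumed here as a common choice. The wavelet transform, the binning, and the imputation steps are not shown, and the function name and threshold value are illustrative.

```python
def soft_threshold(coeffs, lam):
    """Shrink each coefficient toward zero by lam.

    Coefficients with absolute value at most lam are set to 0; larger ones
    keep their sign and lose lam in magnitude (assumed rule: soft thresholding).
    """
    out = []
    for c in coeffs:
        if c > lam:
            out.append(c - lam)
        elif c < -lam:
            out.append(c + lam)
        else:
            out.append(0.0)
    return out

detail_coeffs = [3.0, -0.5, -2.0, 0.2]
shrunk = soft_threshold(detail_coeffs, 1.0)
```

In a typical pipeline this function would be applied to the detail coefficients of a wavelet transform before inverting the transform, so that small coefficients attributed to noise are removed.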
