Similar literature
 18 similar articles found (search time: 46 ms)
1.
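This article introduces the expectation-maximization (EM) clustering algorithm based on the Gaussian mixture model and simplifies the model. A case study illustrates the model's application in economics and management, and graphical visualization is used to display the probability density of the study sample.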
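As a rough illustration of the EM procedure this abstract refers to, the sketch below fits a one-dimensional Gaussian mixture in plain Python. It is not the article's simplified model, which the abstract does not specify. The function names `normal_pdf` and `em_gmm_1d`, the quantile-based initialization, and the variance floor are all assumptions made for the example.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, k=2, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch).

    Returns (weights, means, variances).
    """
    n = len(data)
    srt = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles,
    # every component starts with the overall sample variance.
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(data) / n
    var0 = sum((x - mean) ** 2 for x in data) / n
    variances = [var0] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, mus, variances)]
            s = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # assumed floor to avoid collapse
    return weights, mus, variances
```

Each point's cluster membership is then read off as the component with the largest posterior probability, which is what makes the clustering probabilistic rather than hard.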

2.
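For panel data whose variables are nonlinearly correlated, existing panel-data clustering methods based on linear algorithms cannot accurately measure the similarity between samples, and their clustering results are hard to interpret. Considering both the nonlinear correlation among variables and the interpretability of clustering results, this paper proposes a clustering method for nonlinear panel data: sample similarity is measured with nonlinear kernel principal component analysis, and the samples are then clustered probabilistically with a Gaussian mixture model. An empirical study shows that the method is effective and improves the interpretability of the clustering results.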

3.
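Given that modern engineering projects involve large investments, long construction periods, and complex technology, this article constructs a risk evaluation index system for large engineering projects. On this basis, the analytic network process (ANP) is used to analyze the interactions among the indices and derive the index weights. Grey clustering analysis is then used to convert the scattered risk evaluation information into evaluation values for different grey classes and to obtain a comprehensive risk evaluation value. Finally, an example demonstrates the effectiveness and soundness of the evaluation method.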

4.
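The Dirichlet process is a typical Bayesian model with a variable number of parameters. Cluster analysis based on it does not require the number of clusters to be fixed in advance; the number of clusters is a model parameter inferred from the model and the data. It has therefore become a research focus in machine learning and can be used for cluster analysis of massive data. This article builds a Dirichlet process infinite mixture model to cluster DNA gene expression data. Experiments on a simulated test data set and on an acute leukemia DNA gene expression test data set show that the Dirichlet process infinite mixture model can accurately estimate the number of clusters in the data.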

5.
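The traditional K-Prototypes clustering algorithm clusters mixed data by partitioning, but as the dimensionality of the mixed data grows, the dissimilarities between objects become nearly equal, which makes the algorithm hard to apply. To address this shortcoming, this article proposes an improved K-Prototypes clustering algorithm that removes the irrelevant dimensions of each cluster before clustering, so that the high-dimensional mixed data are projected to a lower dimension and then clustered. A worked example on the Heart Disease Databases verifies the effectiveness of the algorithm.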

6.
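A Verification of the Rules by Which the Two-Step Clustering Method Automatically Determines the Number of Clusters   Total citations: 1, self-citations: 0, citations by others: 1
Starting from the principle of the two-step clustering method (TwoStep Cluster, TSC) and based on the joint maximized log-likelihood, this article derives the formulas for the core indicators: BIC, change in BIC, rate of change in BIC, and rate of change in distance. It then systematically explains the rules by which TSC determines the number of clusters in two steps and demonstrates the rules with an example.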

7.
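Fisher's optimal partition method, the traditional approach to clustering ordered samples, places high demands on computer storage and is unsuitable when the sample is long. The optimal bisection method commonly used in practice can only find a local optimum. This article proposes a new genetic-algorithm-based method for clustering ordered samples. The algorithm works with a variety of clustering distances, is suitable for large samples, and can solve directional clustering problems.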
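For context, the baseline that this abstract compares against can be sketched as a dynamic program. The code below is a minimal version of Fisher's optimal partition for a one-dimensional ordered sample, assuming the within-segment sum of squared deviations as the segment "diameter"; it is not the genetic algorithm proposed in the article. The function name `fisher_optimal_partition` and the prefix-sum shortcut for segment costs are choices made for this illustration.

```python
def fisher_optimal_partition(xs, k):
    """Split the ordered sequence xs into k contiguous segments that
    minimize the total within-segment sum of squared deviations.

    Returns (total_loss, segments), where segments is a list of
    half-open index ranges (start, end).
    """
    n = len(xs)
    # Prefix sums let each segment cost be computed in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations of xs[i:j] from its mean (j > i)."""
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    # best[g][j]: minimal loss when xs[:j] is split into g segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j] = c
                    cut[g][j] = i  # start index of the last segment

    # Backtrack from the full sequence to recover the segment boundaries.
    segments = []
    j = n
    for g in range(k, 0, -1):
        i = cut[g][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[k][n], segments
```

The two tables `best` and `cut` each hold (k + 1) × (n + 1) entries and the triple loop costs O(k·n²) time, which gives a sense of why exhaustive exact search becomes burdensome for long samples.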

8.
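This paper first introduces a method for constructing composite scores of cluster variables using the variable clustering procedure VARCLUS, and then uses a concrete example to illustrate how these composite scores are applied to ranking and evaluation problems in multi-indicator (multi-variable) systems.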

9.
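Clustering algorithms proposed in China and abroad now number in the hundreds or thousands. This paper proposes a classification framework based on the constituent elements of clustering algorithms, reviews the literature, and identifies four research hotspots.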

10.
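Projection Pursuit Clustering Based on a Genetic Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
The traditional projection pursuit clustering algorithm PROCLUS is effective for clustering high-dimensional data, but it uses hill climbing to iteratively select the best center point for each cluster. Because hill climbing is a local search method, the optimum it finds may be only a local one. To address this shortcoming, an improved projection pursuit clustering algorithm is proposed that uses a genetic algorithm to iterate over the cluster center points in search of the global optimum. Simulation results demonstrate the feasibility and effectiveness of the new algorithm.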

11.
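Analysis of Agricultural Risk Zoning Based on an Improved Adaptive Propagation Model   Total citations: 1, self-citations: 0, citations by others: 1
The core problem in pricing agricultural insurance is agricultural risk zoning. To reflect the dynamic development of individual indicators in agricultural zoning, an adaptive affinity propagation clustering method, improved from affinity propagation, is used to optimize the data. The best cluster centers and geometric cluster centers are obtained from the silhouette coefficient, availability, and responsibility, and the clustering task is converted into a clustering problem on a new data set. Cotton is taken as a representative crop for an empirical study: indicators covering production, sales, income, and public finance are computed and the optimal cotton risk zoning is derived. The results show that, for data with dynamic features, the model is effective, practical, and interpretable.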

12.
It is well known that adaptive sequential nonparametric estimation of differentiable functions with assigned mean integrated squared error and minimax expected stopping time is impossible. In other words, no sequential estimator can compete with an oracle estimator that knows how many derivatives an estimated curve has. Differentiable functions are typical in probability density and regression models but not in spectral density models, where considered functions are typically smoother. This paper shows that for a large class of spectral densities, which includes spectral densities of classical autoregressive moving average processes, an adaptive minimax sequential estimation with assigned mean integrated squared error is possible. Furthermore, a two‐stage sequential procedure is proposed, which is minimax and adaptive to smoothness of an underlying spectral density.

13.
Consider a Gaussian random field model on , observed on a rectangular region. Suppose it is desired to estimate a set of parameters in the covariance function. Spectral and circulant approximations to the likelihood are often used to facilitate estimation of the parameters. The purpose of the paper is to give a careful treatment of the quality of these approximations. A spectral approximation for the likelihood was given by Guyon (Biometrika 69 (1982) 95–105) but without proof. The results given here generalize those of Guyon, and fill in the details of the proof. In addition some matrix results are derived which may be of independent interest. Applications are made to Fisher information and bias calculations for maximum likelihood estimates.

14.
For time series data with obvious periodicity (e.g., electric motor systems and cardiac monitors) or vague periodicity (e.g., earthquake and explosion, speech, and stock data), frequency-based techniques using spectral analysis can usually capture the features of the series. With this approach, we are able not only to reduce the dimension of the data by moving to the frequency domain but also to use these frequencies with general classification methods such as linear discriminant analysis (LDA) and k-nearest-neighbor (KNN) to classify the time series. This is a combination of two classical approaches. However, LDA and KNN are difficult to use in the frequency domain because of the excessive dimension of the data. We overcome this obstacle by using Singular Value Decomposition to select essential frequencies. Two data sets are used to illustrate our approach. The classification error rates of our simple approach are comparable to those of several more complicated methods.

15.
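Gao Haiyan et al. Statistical Research (《统计研究》), 2020, 37(8): 91-103
Functional clustering algorithms involve two basic elements: projection and clustering. The optimal projection does not necessarily preserve class information well, which degrades the subsequent clustering. This paper therefore reviews the constituent elements and workflow of functional clustering and, drawing on the clustering property of non-negative matrix factorization, proposes a functional clustering algorithm based on non-negative matrix factorization. It builds a framework in which projection and clustering are carried out in parallel, solves the problem with alternating iterative updates, and analyzes the computational time complexity of the algorithm. Validation on randomly simulated data and tests on speech recognition data show that the algorithm helps improve clustering performance. An application to hourly nitrogen dioxide (NO2) concentration data for Beijing shows that the algorithm's classification of air quality monitoring site types fully identifies the spatial pattern of the site layout, indicating good practical value.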

16.
The circulant embedding method for generating statistically exact simulations of time series from certain Gaussian distributed stationary processes is attractive because of its advantage in computational speed over a competitive method based upon the modified Cholesky decomposition. We demonstrate that the circulant embedding method can be used to generate simulations from stationary processes whose spectral density functions are dictated by a number of popular nonparametric estimators, including all direct spectral estimators (a special case being the periodogram), certain lag window spectral estimators, all forms of Welch's overlapped segment averaging spectral estimator and all basic multitaper spectral estimators. One application for this technique is to generate time series for bootstrapping various statistics. When used with bootstrapping, our proposed technique avoids some – but not all – of the pitfalls of previously proposed frequency domain methods for simulating time series.
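To make the basic circulant embedding idea concrete, here is a minimal NumPy sketch that simulates a Gaussian series from a given autocovariance sequence. It shows only the generic textbook construction, not the paper's extension to spectral estimators. The function name `circulant_embedding_sample`, the minimal embedding size 2(n - 1), and the tolerance used to reject invalid embeddings are assumptions for this illustration.

```python
import numpy as np


def circulant_embedding_sample(acvf, rng):
    """Draw one zero-mean Gaussian series of length n = len(acvf) whose
    autocovariance at lags 0..n-1 is acvf (requires n >= 2)."""
    acvf = np.asarray(acvf, dtype=float)
    n = len(acvf)
    # First row of the circulant matrix: r0, r1, ..., r_{n-1}, r_{n-2}, ..., r1.
    c = np.concatenate([acvf, acvf[-2:0:-1]])
    m = len(c)  # m = 2 * (n - 1)
    # Eigenvalues of a circulant matrix are the DFT of its first row.
    lam = np.fft.fft(c).real
    if lam.min() < -1e-10:
        raise ValueError("embedding is not nonnegative definite for this acvf")
    lam = np.clip(lam, 0.0, None)
    # Complex white noise with independent N(0, 1) real and imaginary parts.
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    y = np.fft.fft(np.sqrt(lam / m) * z)
    # The real part has covariance equal to the circulant matrix; its first
    # n entries therefore have the requested autocovariance.
    return y.real[:n]
```

A call such as `circulant_embedding_sample(0.6 ** np.arange(50), np.random.default_rng(0))` would draw a length-50 series with AR(1)-type autocovariance. The cost is dominated by two FFTs of length 2(n - 1), which is the speed advantage over Cholesky-based simulation that the abstract mentions.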

17.
The customary approach to spatial data modeling in the presence of censored data is to assume that the underlying random field is Gaussian. In practice, however, we often face data whose exploratory analysis reveals skewness, which violates the normality assumption. In such settings, the skew Gaussian (SG) spatial model is used to overcome this issue. In this article, the SG model is fitted to censored observations. For this purpose, we adopt the Bayesian approach and use Markov chain Monte Carlo algorithms and data augmentation to carry out the calculations. A numerical example illustrates the methodology.

18.
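This paper studies a semi-supervised kernel K-means clustering algorithm with a gravitational influence factor and applies it to a multi-factor stock selection model. The study shows that, compared with traditional clustering models, the improved model generalizes better, has clear advantages on problems such as linearly inseparable samples and non-spherical clusters, and can select better stock portfolios.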
