Similar literature
 Found 18 similar articles (search time: 578 ms)
1.
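孙旭 (Sun Xu). 《统计教育》 (Statistical Education), 2009, (3): 55-59
This paper points out the problems of measuring time series similarity with point-to-point distances and proposes a new similarity measure, global features: features are extracted from three aspects of a time series (its statistical distribution characteristics, its nonlinearity, and its Fourier spectral transform) to build a feature vector, which is then used for cluster analysis. Using the clustering of per-capita GDP time series for China's regions as an example, the paper evaluates the clustering results of the distance-based similarity method and the global-feature method. The results confirm that the global-feature method can cluster time series of different lengths and with missing values, and that it reduces the computational complexity of clustering large time series datasets.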
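As a rough illustration of the global-feature idea, the sketch below maps series of different lengths to fixed-length vectors that can then be compared directly. The particular features (mean, standard deviation, skewness, lag-1 autocorrelation) and the function name `global_features` are assumptions chosen for illustration, not the feature set used in the paper; missing values are represented as `None` and simply dropped.

```python
import math

def global_features(series):
    """Map a series (None marks a missing value) to a fixed-length feature vector.

    Hypothetical feature set for illustration only.
    """
    x = [v for v in series if v is not None]   # drop missing observations
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    # Skewness (third standardized moment); 0.0 for a constant series.
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3) if std > 0 else 0.0
    # Lag-1 autocorrelation of the remaining observations.
    acf1 = (sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / (n * var)
            if var > 0 else 0.0)
    return [mean, std, skew, acf1]

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, None, 4.0, 5.0]          # shorter series with a missing value
fa, fb = global_features(a), global_features(b)
dist = euclidean(fa, fb)           # comparable despite the different lengths
```

Because every series is reduced to the same small number of features, the cost of each pairwise comparison no longer depends on the series length.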

2.
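For panel data whose variables are nonlinearly correlated, existing panel data clustering methods based on linear algorithms cannot accurately measure the similarity between samples, and their clustering results are hard to interpret. Taking both the nonlinear correlation among variables and the interpretability of the clustering results into account, a clustering method for nonlinear panel data is proposed: sample similarity is measured with a nonlinear kernel principal component algorithm, and samples are clustered probabilistically with a Gaussian mixture model. Empirical results show that the method is effective and improves the interpretability of the clustering results.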

3.
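Time series clustering is important in statistical analysis. However, the similarity search methods on which high-dimensional time series data mining heavily depends still suffer from heavy computation and low accuracy. To improve the accuracy and efficiency of high-dimensional time series data mining tasks, a time series similarity search algorithm based on fluctuation features is proposed. The algorithm first introduces a local high-frequency discrete wavelet transform (LHFDWT), which reduces the dimension of a series through suitable decomposition and reconstruction. It then computes similarity from the Euclidean distance (ED), the fluctuation amplitude, and the rank correlation coefficient, in terms of the relative deviation and the trend consistency of the series' shape fluctuations. Finally, it proposes a similarity search algorithm and a new fluctuation-feature-based time series clustering method, and performs cluster analysis with k-medoids clustering. Experiments on the UCR benchmark time series datasets show that, compared with dynamic time warping (DTW) and longest common subsequence (LCSS), the new method achieves better clustering accuracy at a 99% confidence level; it performs better in correctly predicting the number of clusters and in search efficiency, and its clustering results are more stable. Its 1-NN classification accuracy is also higher, at a confidence level of at least 85%, which indicates that it determines better cluster centers. These results demonstrate the superiority of the proposed similarity search algorithm.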

4.
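This article studies similarity measures for time series data from the stock market, compares the characteristics of the measures currently in use, proposes a similarity measure aimed at common patterns, and examines the characteristics of the method on closing-price data for several selected stocks.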

5.
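The Box-Pierce Q test uses an approximate chi-square distribution to analyze the stationarity of a time series, and the choice of parameters for its test statistic affects the test result. This article uses multiple Q values to extract stationarity features and, on that basis, establishes a new stationarity criterion, which is a sufficient condition for the convergence of the autocorrelation function sequence. It uses the Euclidean function as the similarity measure for the stationarity features and builds a stationarity classification method with k-means clustering. The method takes full account of the association between samples during stationarity analysis and avoids the traditional Box-Pierce Q test's over-reliance on the statistical distribution and on critical-value tables. Experimental results show that the new method handles massive time series data effectively and is more accurate than the Q test and the ADF test.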
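For reference, the Box-Pierce statistic at maximum lag m is Q(m) = n * sum of r_k^2 for k = 1..m, where r_k is the lag-k sample autocorrelation. The sketch below computes Q at several lags and collects the values into a feature vector. The lag set `(2, 4, 6)` and the helper name `q_features` are assumptions for illustration; the abstract does not say which Q values the authors use.

```python
def autocorr(x, k):
    """Lag-k sample autocorrelation r_k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom

def box_pierce_q(x, m):
    """Box-Pierce statistic Q(m) = n * sum_{k=1..m} r_k^2."""
    n = len(x)
    return n * sum(autocorr(x, k) ** 2 for k in range(1, m + 1))

def q_features(x, lags=(2, 4, 6)):
    """Hypothetical feature vector built from Q at several maximum lags."""
    return [box_pierce_q(x, m) for m in lags]

x = [1.0, -1.0] * 10          # strongly autocorrelated alternating series
feats = q_features(x)
```

Feature vectors of this kind could then be compared with a Euclidean distance and grouped with k-means, as the abstract describes.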

6.
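In studying the clustering of panel data, a Logistic regression model is used on the similarity-measure side to construct similarity coefficients and an asymmetric similarity matrix. On the algorithm side, current clustering algorithms apply only to symmetric similarity matrices. To cluster on an asymmetric similarity matrix, the DBSCAN clustering method is improved with best-first search and the silhouette coefficient, which yields the BF-DBSCAN method. A case study compares the clustering results of BF-DBSCAN and DBSCAN, as well as the effect of different parameter settings on the BF-DBSCAN results, and verifies the effectiveness and practicality of the method.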

7.
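This article proposes an improved K-Modes clustering method based on mutual information, using sample mutual information to characterize the relationships among the attributes of data objects. On this basis it proposes a new distance measure that considers both the difference in an object's value for a given attribute and the influence of the object's other attributes on that value, which makes it better suited to practical problems. Experimental results show that the clustering method effectively improves clustering accuracy.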
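The sample mutual information between two categorical attributes can be estimated from co-occurrence counts. The sketch below shows only that building block; how the paper folds it into the K-Modes distance is not stated in the abstract, and the function name and toy attributes are assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Sample mutual information (in nats) between two categorical attributes."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

color = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]   # fully determined by color
size = ["s", "l", "s", "l"]                      # independent of color

mi_dependent = mutual_information(color, shape)    # equals log(2)
mi_independent = mutual_information(color, size)   # equals 0
```

A high value signals that one attribute carries information about the other, which is the kind of relationship the proposed distance is meant to take into account.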

8.
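周勇 (Zhou Yong), 林旬 (Lin Xun). 《统计与决策》 (Statistics and Decision), 2007, (10): 28-30
This paper describes the shortcomings of using the Euclidean distance and the time warping distance to judge time series similarity, and gives a method for judging the similarity of two time series based on the theory of similar figures in Euclidean geometry. It first gives a method for judging the similarity of two polylines. Because time series and polylines can be converted into each other, the polyline method is then applied to judging the similarity of two time series. Finally, the method is applied to cluster analysis, with good results.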
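For readers unfamiliar with the two baseline distances the paper criticizes, the sketch below implements the Euclidean distance and a basic dynamic time warping (time warping) distance. It is a textbook version with an absolute-difference local cost, which is an assumption; the paper's own polyline-similarity method is not specified in the abstract and is not reproduced here.

```python
import math

def euclidean(a, b):
    """Point-to-point Euclidean distance; requires equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Basic dynamic time warping distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]        # same shape, shifted by one time step

ed = euclidean(a, b)          # penalizes the shift
dw = dtw(a, b)                # warps the shift away
```

The Euclidean distance treats a pure time shift as dissimilarity, while time warping removes it entirely at quadratic cost in the series length.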

9.
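Functional data clustering and its application to financial time series analysis   Total citations: 1, self-citations: 0, citations by others: 1
Functional data analysis has become a research hotspot in recent years. This article studies cluster analysis methods for functional data. It first constructs a dissimilarity measure between functional data in Lp space, smooths the functional data with basis functions, and proposes a cluster analysis method for functional data. It points out that clustering on the orthogonal basis function coefficients obtained by least squares estimation gives results close to those of clustering the raw data directly. Applied to pattern mining in time series, the method gives good results.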
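The link between clustering on basis coefficients and clustering on the curves themselves can be seen from the standard Lp distance. The notation below (curves x_i on an interval T, orthonormal basis functions \phi_k, coefficients c_{ik}) is assumed for illustration and is not taken from the article.

```latex
% Standard L^p dissimilarity between two curves (assumed notation).
d_p(x_i, x_j) = \left( \int_T \lvert x_i(t) - x_j(t) \rvert^p \, dt \right)^{1/p}

% For p = 2 and x_i(t) = \sum_{k=1}^{K} c_{ik} \phi_k(t) with orthonormal \phi_k,
% the cross terms vanish and the distance reduces to one between coefficients:
d_2(x_i, x_j)^2
  = \int_T \Bigl( \sum_{k=1}^{K} (c_{ik} - c_{jk}) \, \phi_k(t) \Bigr)^2 dt
  = \sum_{k=1}^{K} (c_{ik} - c_{jk})^2
```

Under these assumptions the L2 distance between smoothed curves equals the Euclidean distance between their coefficient vectors, which is consistent with the article's observation that the two clusterings are close.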

10.
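Shape-feature-based clustering method for multi-indicator panel data and its application   Total citations: 1, self-citations: 0, citations by others: 1
For the problem of classifying samples in multi-indicator panel data, a cluster analysis method is proposed from a feature-extraction perspective. The method combines the local variation characteristics of a time series with the overall distance relationship, incorporating local variation information into the similarity measure. An adaptive sliding-window segmentation method is proposed to extract the local variation features of the time series, and a clustering method is proposed based on a redefined composite distance. Empirical analysis shows that the new method can solve the multi-indicator panel data clustering problem and classifies well.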

11.
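Panel data clustering method and its application   Total citations: 7, self-citations: 0, citations by others: 7
Based on the time-series and cross-sectional characteristics of panel data, and considering the "absolute indicator", "incremental indicator", and "time-series fluctuation" characteristics together, a panel data clustering method is proposed by reconstructing the distance function for the panel data similarity measure and the Ward clustering algorithm. Using fiscal and financial panel data for 2003-2007 as an example, a cluster analysis of China's 14 open coastal cities is carried out, showing good applicability.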
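One way to picture a distance that combines the three characteristics named above is to add a level term, an increment term, and a fluctuation term. The sketch below does this for a single indicator with equal weights. The specific formulas, the equal weighting, and all function names are assumptions for illustration; the abstract does not give the authors' distance function.

```python
import math
import statistics

def absolute_distance(x, y):
    """Distance between the indicator levels at each time point."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def increment_distance(x, y):
    """Distance between period-to-period increments."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dx, dy)))

def fluctuation_distance(x, y):
    """Difference in variability over time (population standard deviation)."""
    return abs(statistics.pstdev(x) - statistics.pstdev(y))

def panel_distance(x, y):
    """Hypothetical composite distance: equal-weight sum of the three terms."""
    return (absolute_distance(x, y)
            + increment_distance(x, y)
            + fluctuation_distance(x, y))

city_a = [10.0, 12.0, 14.0, 16.0, 18.0]
city_b = [11.0, 13.0, 15.0, 17.0, 19.0]   # same trend, slightly higher level
city_c = [10.0, 18.0, 12.0, 16.0, 14.0]   # similar level, erratic path

d_ab = panel_distance(city_a, city_b)
d_ac = panel_distance(city_a, city_c)
```

In this toy example the increment term separates `city_c` from `city_a`, even though their overall variability is the same.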

12.
Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, which combines the SIR method with the geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that the isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.

13.
Spectral clustering uses eigenvectors of the Laplacian of the similarity matrix. It is convenient for solving binary clustering problems. When applied to multi-way clustering, either the binary spectral clustering is applied recursively, or an embedding into spectral space is computed and some other method, such as K-means clustering, is used to cluster the points. Here we propose and study a K-way clustering algorithm, spectral modular transformation, based on the fact that the graph Laplacian has an equivalent representation with a diagonal modular structure. The method first transforms the original similarity matrix into a new one, which is nearly disconnected and reveals the cluster structure clearly; we then apply a linearized cluster assignment algorithm to split the clusters. In this way, we can find some samples for each cluster recursively using the divide-and-conquer method. To get the overall clustering results, we use the cluster assignment obtained in the previous step as the initialization of the multiplicative update method for spectral clustering. Examples show that our method outperforms spectral clustering using other initializations.

14.
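In research on cluster analysis of panel data, the Euclidean distance function is improved on the basis that panel data has both a cross-sectional and a time dimension. Indicator weights and time weights are taken into account during clustering, and a "weighted distance function" suited to panel data cluster analysis is proposed together with the corresponding Ward.D clustering method. First, Euclidean distance functions are defined on the absolute indicator values, the growth rates between adjacent time points, and the degree of fluctuation variation; then the indicator weights and time weights are aggregated through a linear model into a composite weighted distance, which completes the weighted clustering of the panel data. Empirical results show that weighted cluster analysis that accounts for indicator and time weights has better discriminating power and improves the accuracy of sample clustering.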

15.
The problem of modelling multivariate time series of vehicle counts in traffic networks is considered. It is proposed to use a model called the linear multiregression dynamic model (LMDM). The LMDM is a multivariate Bayesian dynamic model which uses any conditional independence and causal structure across the time series to break down the complex multivariate model into simpler univariate dynamic linear models. The conditional independence and causal structure in the time series can be represented by a directed acyclic graph (DAG). The DAG not only gives a useful pictorial representation of the multivariate structure, but it is also used to build the LMDM. Therefore, eliciting a DAG which gives a realistic representation of the series is a crucial part of the modelling process. A DAG is elicited for the multivariate time series of hourly vehicle counts at the junction of three major roads in the UK. A flow diagram is introduced to give a pictorial representation of the possible vehicle routes through the network. It is shown how this flow diagram, together with a map of the network, can suggest a DAG for the time series suitable for use with an LMDM.

16.
Given that the Euclidean distance between the parameter estimates of autoregressive expansions of autoregressive moving average models can be used to classify stationary time series into groups, a test of hypothesis is proposed to determine whether two stationary series in a particular group have significantly different generating processes. Based on this test a new clustering algorithm is also proposed. The results of Monte Carlo simulations are given.
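The distance referred to in the first sentence can be written compactly. The notation is assumed for illustration: \hat{\pi}_{j,x} denotes the j-th estimated coefficient of the autoregressive expansion of series x, and k is a truncation order that the abstract does not specify.

```latex
% Euclidean distance between estimated AR-expansion coefficients
% of two stationary series x and y (assumed notation, truncation order k).
d(x, y) = \left( \sum_{j=1}^{k} \bigl( \hat{\pi}_{j,x} - \hat{\pi}_{j,y} \bigr)^2 \right)^{1/2}
```

Two series with similar fitted autoregressive structure therefore have a small distance regardless of how their observed values compare point by point.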

17.
This paper suggests an evolving possibilistic approach for fuzzy modelling of time-varying processes. The approach is based on an extension of the well-known possibilistic fuzzy c-means (FCM) clustering and functional fuzzy rule-based modelling. Evolving possibilistic fuzzy modelling (ePFM) employs memberships and typicalities to recursively cluster data, and uses participatory learning to adapt the model structure as stream data are input. The idea of possibilistic clustering plays a key role when the data are noisy and contain outliers, because it relaxes the restriction in the FCM clustering algorithm that membership degrees add up to unity. To show the usefulness of ePFM, the approach is applied to system identification using the Box & Jenkins gas furnace data, as well as to time series forecasting with the chaotic Mackey–Glass series and with data produced by a synthetic time-varying process with parameter drift. The results show that ePFM is a potential candidate for nonlinear time-varying systems modelling, with comparable or better performance than alternative approaches, mainly when noise and outliers affect the data available.
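The membership constraint mentioned above is easy to see in the standard FCM membership formula, u_i = 1 / sum over j of (d_i / d_j)^(2/(m-1)). The sketch below evaluates it for one-dimensional points; the one-dimensional setting, the fuzzifier default `m=2.0`, and the function name are assumptions for illustration, and the ePFM algorithm itself is not reproduced.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means memberships of a scalar point x in each cluster."""
    d = [abs(x - c) for c in centers]
    if any(v == 0 for v in d):
        # The point coincides with one or more centers: split membership among them.
        hits = [1.0 if v == 0 else 0.0 for v in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

centers = [0.0, 10.0]
u_near = fcm_memberships(1.0, centers)        # close to the first center
u_outlier = fcm_memberships(1000.0, centers)  # far from both centers
```

Because the memberships must sum to one, the distant outlier still receives a membership of roughly 0.5 in each cluster, which is the behaviour that possibilistic typicalities are meant to avoid.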

18.
The most common measure of dependence between two time series is the cross-correlation function. This measure gives a complete characterization of dependence for two linear and jointly Gaussian time series, but it often fails for nonlinear and non-Gaussian time series models, such as the ARCH-type models used in finance. The cross-correlation function is a global measure of dependence. In this article, we apply to bivariate time series the nonlinear local measure of dependence called local Gaussian correlation. It generally also works well for nonlinear models, and it can distinguish between positive and negative local dependence. We construct confidence intervals for the local Gaussian correlation and develop a test based on this measure of dependence. Asymptotic properties are derived for the parameter estimates, for the test functional and for a block bootstrap procedure. For both simulated and financial index data, we construct confidence intervals and we compare the proposed test with one based on the ordinary correlation and with one based on the Brownian distance correlation. Financial indexes are examined over a long time period and their local joint behavior, including tail behavior, is analyzed prior to, during and after the financial crisis. Supplementary material for this article is available online.
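A small example shows how the cross-correlation function can miss nonlinear dependence. The sketch below computes the sample cross-correlation at a non-negative lag; the function name and toy data are assumptions for illustration, and the local Gaussian correlation proposed in the article is not implemented here.

```python
import math

def cross_correlation(x, y, lag=0):
    """Sample cross-correlation between x[t] and y[t + lag], for lag >= 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    num = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag))
    return num / (sx * sy)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_linear = [2 * v + 1 for v in x]   # linear function of x
y_square = [v * v for v in x]       # nonlinear but fully determined by x

cc_linear = cross_correlation(x, y_linear)   # close to 1
cc_square = cross_correlation(x, y_square)   # close to 0 despite full dependence
```

The second series is a deterministic function of the first, yet the global measure reports no dependence, which is the kind of failure a local measure is designed to detect.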
