Similar literature
Found 18 similar articles (search time: 218 ms)
1.
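Gao Haiyan (高海燕) et al., Statistical Research (《统计研究》), 2020, 37(8): 91-103
Functional clustering algorithms involve two basic elements: projection and clustering. An optimal projection does not necessarily preserve class information, which can impair the subsequent clustering. This paper therefore reviews the components and workflow of functional clustering. Drawing on the clustering properties of non-negative matrix factorization (NMF), it proposes an NMF-based functional clustering algorithm, builds a framework in which "projection and clustering" run in parallel, solves it by alternating iterative updates, and analyzes the algorithm's computational time complexity. Validation on randomly simulated data and a case test on speech recognition data show that the algorithm helps improve clustering performance. An application to hourly nitrogen dioxide (NO2) concentration data for Beijing shows that the algorithm's classification of air quality monitoring station types fully identifies the spatial pattern of the station layout, giving it good practical value.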

2.
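Stock return volatility has typical continuous-function characteristics; analyzing it as a continuous dynamic function can uncover deeper information that existing discrete methods cannot reveal. From the perspective of continuous dynamic functions, this paper studies the category patterns and time-period characteristics of return volatility for the constituent stocks of the SSE 50 Index. First, the intrinsic return volatility functions implicit in the discrete observations are reconstructed in a data-driven way. Next, functional principal component orthogonal decomposition is used to extract the main trends of return function volatility; on the basis of principal component dimension reduction with no loss of core information, adaptive-weight clustering is introduced to partition the volatility patterns of the return functions objectively. Finally, functional analysis of variance tests the significance and robustness of the volatility differences between categories, and, based on a periodic division of the volatility functions into time segments, graphical displays and visual analysis show the patterns of potential-energy conversion for each category of return function in different periods. The findings are that the dominant trend of stock return volatility for the SSE Composite Index can be decomposed into four sub-patterns, that the 50 stocks exhibit five significant categories of volatility pattern, and that the differences among the five categories appear mainly in the initial stage of the study period. The paper broadens the research perspective on classifying stock return volatility patterns and analyzing the factors behind their differences, and can provide empirical support for regulatory strategy and for portfolio allocation in the securities market.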

3.
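Xie Jing (谢晶), Li Di (李迪), Statistics & Decision (《统计与决策》), 2024, (3): 151-156
Using 2019-2020 import and export data for China's main trading countries (regions), and taking the total value of imported and exported goods as a proxy for foreign trade conditions, the article applies kernel density estimation to examine overall changes in China's foreign trade pattern. On that basis, taking spatial disequilibrium and spatial polarization into account, it uses the change in the Dagum Gini coefficient and the change in the LU polarization index to measure the stability of China's foreign trade pattern along two dimensions: country (region) groups and commodity categories. The results show that: (1) China's foreign trade pattern exhibits significant spatial disequilibrium and spatial polarization; (2) after the outbreak of COVID-19, the overall stability of the foreign trade pattern was high, with stability between country (region) groups contributing the most; (3) a two-dimensional matrix method divides import and export commodity categories into three types (tight, oscillating, and loose), among which the Category 2, Category 4, and Category 9 commodities of the tight type have a positive effect on improving trade conditions.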

4.
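Wang Zhihao (王芝皓) et al., Statistical Research (《统计研究》), 2021, 38(7): 127-139
In practical data analysis, zero-inflated count data often arise as a response variable associated with functional random variables and random vectors as predictors. This paper considers the functional partially varying-coefficient zero-inflated model (FPVCZIM), in which the infinite-dimensional slope function is approximated with a functional principal component basis and the coefficient functions are fitted with B-splines. Estimators are obtained via the EM algorithm and their theoretical properties are discussed; under some regularity conditions, convergence rates are obtained for the estimators of the slope function and the coefficient functions. A finite-sample Monte Carlo simulation study and a real data analysis illustrate the proposed method.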
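As a point of reference for the zero-inflated count response mentioned in this abstract, the following is a minimal sketch of a zero-inflated Poisson probability mass function. It assumes a Poisson count component, which the abstract does not state, and it ignores the functional and varying-coefficient parts of the FPVCZIM entirely. The names `zip_pmf`, `lam`, and `pi_zero` are illustrative, not taken from the paper.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson pmf (illustrative assumption, not the paper's model).

    With probability pi_zero the outcome is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = pi_zero if k == 0 else 0.0
    return structural_zero + (1.0 - pi_zero) * poisson


# Zero is more likely than under a plain Poisson with the same mean parameter.
p_zero_inflated = zip_pmf(0, lam=2.0, pi_zero=0.3)
p_zero_poisson = math.exp(-2.0)
```

The extra mass at zero is what an EM algorithm would treat as a latent "structural zero" indicator; that step is not shown here.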

5.
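The article studies, within a general framework, the problem of clustering functional data by basis function expansion. Under this framework, many traditional clustering methods can be applied directly to functional data analysis. In addition, we introduce the Pearson similarity coefficient into functional data clustering, which addresses the inability of Euclidean distance to capture differences in shape between curves.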
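The contrast between Euclidean distance and Pearson similarity can be sketched on sampled curves. This example works on raw sampled values rather than on basis coefficients, which is a simplification; the curves `base`, `shifted`, and `flat` are made up for illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equally sampled curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient between two equally sampled curves."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


t = [i / 20 for i in range(21)]
base = [math.sin(2 * math.pi * s) for s in t]
shifted = [v + 5.0 for v in base]   # same shape, different level
flat = [0.1 * s for s in t]         # different shape, similar level
```

Here `shifted` has exactly the shape of `base` but is far away in Euclidean terms, while `flat` is close in Euclidean terms but has a different shape; the Pearson coefficient ranks the two pairs the other way round.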

6.
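Functional data clustering and its application to financial time series analysis   Total citations: 1, self-citations: 0, citations by others: 1
Functional data analysis has become a research hotspot in recent years. Focusing on clustering methods for functional data, the article first constructs a dissimilarity measure between functional data in L^p space, smooths the functional data using basis functions, and proposes a clustering method for functional data. It points out that clustering on the orthogonal basis function coefficients obtained by least squares estimation gives results close to clustering the raw data directly. Applied to pattern mining in time series, the method achieved good results.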
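An L^p dissimilarity between two functions, $d_p(f, g) = \left(\int |f(t) - g(t)|^p \, dt\right)^{1/p}$, can be approximated from sampled values. The sketch below uses the trapezoid rule on a shared grid; the function name `lp_distance` and the choice of quadrature are assumptions for illustration, not the article's construction.

```python
def lp_distance(f_vals, g_vals, t, p=2.0):
    """Approximate the L^p distance between two curves sampled on grid t.

    Uses the trapezoid rule for the integral of |f - g|^p.
    """
    diffs = [abs(a - b) ** p for a, b in zip(f_vals, g_vals)]
    integral = sum(
        0.5 * (diffs[i] + diffs[i + 1]) * (t[i + 1] - t[i])
        for i in range(len(t) - 1)
    )
    return integral ** (1.0 / p)


grid = [i / 100 for i in range(101)]
line = list(grid)              # f(t) = t on [0, 1]
zero = [0.0] * len(grid)       # g(t) = 0
d1 = lp_distance(line, zero, grid, p=1.0)   # exact value is 1/2
d2 = lp_distance(line, zero, grid, p=2.0)   # exact value is sqrt(1/3)
```

A matrix of such pairwise distances could then be passed to any distance-based clustering method.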

7.
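The paper proposes an approach to dynamically measuring income inequality based on sequences of functions: B-splines are used to fit a sequence of Lorenz curves, and, having generated functional data in this way, functional principal component analysis is applied to the Lorenz curve sequence. Using functional data modeling, an exploratory analysis of how the Lorenz curve sequence of Chinese urban residents' income evolved over 1990-2010 shows that dividing income groups by population quintiles is reasonable, and that the contributions of the income groups to changes in income inequality over the years are 2.30% for the low-income group, 80.33% for the middle-income group, and 17.36% for the high-income group.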

8.
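The article introduces the concept of combined chaotic maps and explains the infinite-dimensional idea, and on that basis proposes an infinite-dimensional pseudo-random number generation method based on combined chaotic maps. Simulation experiments confirm that the uniformly distributed pseudo-random numbers produced by the method have good statistical properties and security performance.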
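The abstract does not specify which maps are combined or how the infinite-dimensional construction works, so the following is only a sketch of the general idea using a single logistic map, $x_{n+1} = 4x_n(1 - x_n)$. Its iterates follow an arcsine law on [0, 1], and the transform $u = \frac{2}{\pi}\arcsin\sqrt{x}$ maps them to an approximately uniform sequence. The function name `logistic_uniform` and the `burn_in` parameter are hypothetical, and this toy generator should not be assumed to have any security properties.

```python
import math


def logistic_uniform(seed, n, burn_in=100):
    """Return n values in [0, 1] derived from the logistic map x -> 4x(1-x).

    Illustrative single-map sketch, not the combined-map method of the article.
    seed should lie strictly between 0 and 1.
    """
    x = seed
    out = []
    for i in range(burn_in + n):
        # Clamp to guard against rounding pushing x outside [0, 1].
        x = min(1.0, max(0.0, 4.0 * x * (1.0 - x)))
        if i >= burn_in:
            out.append(2.0 / math.pi * math.asin(math.sqrt(x)))
    return out


samples = logistic_uniform(0.123456, 5000)
```

The sequence is fully determined by the seed, which is what makes it pseudo-random rather than random.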

9.
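An exploration of clustering methods for functional data   Total citations: 3, self-citations: 0, citations by others: 3
Functional data are a newly emerging data type in data analysis. They combine features of time series and cross-sectional data, can usually be described as the graph of a function of some variable, and are highly useful in practice. The paper first briefly analyzes some basic features of functional data and some existing clustering methods for functional data, such as uniformly corrected K-means clustering and hierarchical clustering for functional data. On this basis, it discusses functional data clustering from the perspective of function feature analysis, proposes an interval clustering method for functional data based on derivative analysis, and applies the method empirically to employment population data for the six provinces of central China, obtaining clustering results.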

10.
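The article considers variable selection for the Cox model. It introduces the adaptive Lasso into the Cox model and proposes an adaptive Lasso estimation procedure based on the penalized partial likelihood function. The partial likelihood function is approximated by a second-order Taylor expansion, the model is solved by cyclic coordinate descent, and Newton-Raphson iteration completes the overall variable selection and estimation process. Simulation results on random data show that the method has excellent variable selection performance and is suitable for high-dimensional data.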
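The core step of coordinate descent on a quadratic approximation with an adaptive Lasso penalty is a weighted soft-threshold. A minimal sketch follows; it assumes the per-coordinate problem has been reduced to minimizing $\tfrac{1}{2} h_j b^2 - z_j b + \lambda w_j |b|$, where $h_j$ and $z_j$ would come from the Taylor expansion of the partial likelihood (not computed here). All function and parameter names are illustrative.

```python
def soft_threshold(z, thresh):
    """Soft-thresholding operator S(z, thresh) = sign(z) * max(|z| - thresh, 0)."""
    if z > thresh:
        return z - thresh
    if z < -thresh:
        return z + thresh
    return 0.0


def adaptive_weights(initial_beta, gamma=1.0, eps=1e-8):
    """Adaptive Lasso weights w_j = 1 / |beta_j|^gamma from an initial estimate.

    eps avoids division by zero; it is an implementation convenience.
    """
    return [1.0 / (abs(b) ** gamma + eps) for b in initial_beta]


def coordinate_update(z_j, h_j, lam, w_j):
    """Minimizer of 0.5*h_j*b^2 - z_j*b + lam*w_j*|b| over b (h_j > 0)."""
    return soft_threshold(z_j, lam * w_j) / h_j


weights = adaptive_weights([2.0, 0.01])
```

Coefficients with a small initial estimate get a large weight and are therefore thresholded to zero more readily, which is how the adaptive penalty performs variable selection.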

11.
The self-updating process (SUP) is a clustering algorithm that takes the viewpoint of the data points and simulates how they move and cluster themselves. It is an iterative process on the sample space and allows for both time-varying and time-invariant operators. By simulations and comparisons, this paper shows that SUP is particularly competitive in clustering (i) data with noise, (ii) data with a large number of clusters, and (iii) unbalanced data. When noise is present in the data, SUP is able to isolate the noise data points while performing clustering simultaneously. The property of the local updating enables SUP to handle data with a large number of clusters and data of various structures. In this paper, we show that the blurring mean-shift is a static SUP. Therefore, our discussions on the strengths of SUP also apply to the blurring mean-shift.
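Since the abstract identifies blurring mean-shift as a static SUP, a small one-dimensional sketch of blurring mean-shift may help make the "points move toward each other" idea concrete. The Gaussian kernel, the fixed bandwidth, and the fixed iteration count are assumptions for illustration; the paper's general operators are not reproduced.

```python
import math


def blurring_mean_shift(points, bandwidth, n_iter=50):
    """Blurring mean-shift in 1-D with a Gaussian kernel (illustrative sketch).

    At every iteration each point is replaced by the kernel-weighted mean of
    all current points, so the data set itself is updated.
    """
    pts = list(points)
    for _ in range(n_iter):
        new_pts = []
        for x in pts:
            weights = [math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in pts]
            new_pts.append(sum(w * y for w, y in zip(weights, pts)) / sum(weights))
        pts = new_pts
    return pts


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
moved = blurring_mean_shift(data, bandwidth=0.5)
```

After the updates, points that started near each other sit at essentially the same location, and cluster labels can be read off by grouping coincident points.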

12.
Functional data analysis (FDA), the analysis of data that can be considered a set of observed continuous functions, is an increasingly common class of statistical analysis. One of the most widely used FDA methods is the cluster analysis of functional data; however, little work has been done to compare the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, nonperiodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, clustering methods were also compared when the number of clusters has been misspecified. To illustrate the results, a real set of functional data was clustered where the true clustering structure is believed to be known. Comparing the clustering methods for the real data set confirmed the findings of the simulation. This study yields concrete suggestions to future researchers to determine the best method for clustering their functional data.
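The Rand index used in this study is the fraction of point pairs on which two partitions agree, that is, pairs placed together in both or apart in both. A minimal sketch follows; the function name is illustrative.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index between two labelings of the same items.

    Counts pairs that are either in the same cluster in both labelings or in
    different clusters in both, divided by the total number of pairs.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


identical_up_to_renaming = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on the grouping, not on the label names, which is why the first comparison above scores a perfect match.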

13.
The forward search is a method of robust data analysis in which outlier free subsets of the data of increasing size are used in model fitting; the data are then ordered by closeness to the model. Here the forward search, with many random starts, is used to cluster multivariate data. These random starts lead to the diagnostic identification of tentative clusters. Application of the forward search to the proposed individual clusters leads to the establishment of cluster membership through the identification of non-cluster members as outlying. The method requires no prior information on the number of clusters and does not seek to classify all observations. These properties are illustrated by the analysis of 200 six-dimensional observations on Swiss banknotes. The importance of linked plots and brushing in elucidating data structures is illustrated. We also provide an automatic method for determining cluster centres and compare the behaviour of our method with model-based clustering. In a simulated example with eight clusters our method provides more stable and accurate solutions than model-based clustering. We consider the computational requirements of both procedures.

14.
This paper addresses the problem of identifying groups that satisfy the specific conditions for the means of feature variables. In this study, we refer to the identified groups as “target clusters” (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by using the maximum-likelihood method. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters, which would be difficult to achieve with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising in the identification.

15.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follows a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but focusing on the problem of clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

16.
In this paper, we present an algorithm for clustering based on univariate kernel density estimation, named ClusterKDE. It consists of an iterative procedure in which, at each step, a new cluster is obtained by minimizing a smooth kernel function. Although in our applications we have used the univariate Gaussian kernel, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, the ClusterKDE algorithm is very simple, easy to implement, well-defined and stops in a finite number of steps, namely, it always converges independently of the initial point. We also illustrate our findings by numerical experiments which are obtained when our algorithm is implemented in the software Matlab and applied to practical applications. The results indicate that the ClusterKDE algorithm is competitive and fast when compared with the well-known Clusterdata and K-means algorithms, used by Matlab to cluster data.

17.
Silhouette information evaluates the quality of the partition detected by a clustering technique. Since it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a suitable modification of the Silhouette information aimed at evaluating the quality of clusters in a density-based framework. It is based on the estimation of the data posterior probabilities of belonging to the clusters and may be used to measure our confidence about data allocation to the clusters as well as to choose the best partition among different ones.
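For context, the standard distance-based silhouette that this abstract sets out to modify can be sketched as follows. For each point, $a$ is the mean distance to the other members of its own cluster, $b$ is the smallest mean distance to any other cluster, and the silhouette is $(b - a)/\max(a, b)$. The sketch is one-dimensional, assumes at least two clusters, and does not implement the density-based variant proposed in the paper.

```python
from statistics import mean


def silhouette_values(points, labels):
    """Standard distance-based silhouette for 1-D points (needs >= 2 clusters)."""
    values = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster label.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(abs(p - q))
        if lab not in dists:
            values.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(dists[lab])
        b = min(mean(d) for k, d in dists.items() if k != lab)
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


pts = [0.0, 1.0, 10.0, 11.0]
good = silhouette_values(pts, [0, 0, 1, 1])
bad = silhouette_values(pts, [0, 1, 0, 1])
```

Values near 1 indicate a point well inside its cluster, and negative values indicate a point that sits closer to another cluster than to its own.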

18.
Clustering gene expression data is an important step in providing information to biologists. A Bayesian clustering procedure using Fourier series with a Dirichlet process prior for clusters was developed. As an optimal computational tool for this Bayesian approach, Gibbs sampling of a normal mixture with a Dirichlet process was implemented to calculate the posterior probabilities when the number of clusters was unknown. Monte Carlo study results showed that the model was useful for suitable clustering. The proposed method was applied to the budding yeast Saccharomyces cerevisiae and provided biologically interpretable results.
