Similar articles
1.
Clustering algorithms are widely used for mining data streams because they can handle potentially infinite data flows. Although these algorithms perform well at uncovering latent relationships in data streams, most of them lose cluster purity and become unstable when the input stream contains many noisy variables. In this article, we propose an algorithm for clustering data streams with noisy variables. Simulation results show that our method, which adds a variable-selection step as a component of the clustering algorithm, outperforms previous approaches. Two experiments indicate that clustering data streams with variable selection yields more stable results and higher purity than clustering without it. A further experiment on the KDD-CUP99 dataset also shows that our algorithm produces more stable results.
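
The abstract compares methods by cluster purity. A minimal sketch of the standard purity measure follows, assuming every point has a known ground-truth class: each cluster is credited with the number of points that belong to its most frequent class, and the total is divided by the number of points. The function name `cluster_purity` is hypothetical; this is an illustration of the metric, not the article's code.

```python
from collections import Counter


def cluster_purity(cluster_labels, true_labels):
    """Purity = (1/N) * sum over clusters of the size of the cluster's majority class.

    Illustrative helper (hypothetical name), not taken from the article.
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have equal length")
    if not cluster_labels:
        raise ValueError("need at least one point")
    # Group the ground-truth classes of the points by their assigned cluster.
    members = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        members.setdefault(cluster, []).append(truth)
    # For each cluster, count the points that belong to its most frequent class.
    majority_total = sum(
        Counter(classes).most_common(1)[0][1] for classes in members.values()
    )
    return majority_total / len(true_labels)


# Cluster 0 holds classes a, a, b (majority 2); cluster 1 holds b, b, b (majority 3).
print(cluster_purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))  # 5/6
```

Purity equals 1.0 only when every cluster is class-pure, so noisy variables that mix classes inside clusters pull it down, which is the degradation the abstract describes.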

2.
This article introduces a spatio-temporal distance that extends the spatial cluster detection methods of Demattei et al. (2007) and Cucala (2009) to the spatio-temporal setting. After reviewing these methods, we define the spatio-temporal distance and use it to detect spatio-temporal clusters. These ordering-based methods are compared with the scan statistic in a simulation study: the scan procedure is more powerful, but it detects fewer true positives because of its lack of flexibility. The techniques are then applied to a seismic data set. The article highlights two advantages of the ordering-based methods: their flexibility and their low computational cost.

References:
Demattei, C., Molinari, N., Daures, J. P. (2007). Arbitrarily shaped multiple spatial cluster detection for case event data. Computat. Statist. Data Anal. 51(8): 3931–3945.
Cucala, L. (2009). A flexible spatial scan test for case event data. Computat. Statist. Data Anal. 53(8): 2843–2850.

3.
In multiple linear regression analysis, each lower-dimensional subspace $L$ of a known linear subspace $M$ of $\mathbb{R}^n$ corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace $L$, the $C_p$ statistic is an unbiased estimator of the mean square error when the projection of the response vector onto $L$ is used to estimate the expected response. In this article, we consider two truncated versions of the $C_p$ statistic that can also be used to estimate this mean square error. The $C_p$ statistic and its truncated versions are compared on two example data sets, illustrating that the truncated versions may select models different from those selected by the standard $C_p$.

