
1.

Cluster Data Streams with Noisy Variables

Hu Yang, Chenqun Yu. Communications in Statistics - Simulation and Computation, 2016, 45(4): 1381-1396

Clustering algorithms are widely used in mining data streams because they can handle infinite data flows. Although these algorithms perform well at uncovering latent relationships in data streams, most of them lose cluster purity and become unstable when the input streams contain too many noisy variables. In this article, we propose an algorithm for clustering data streams with noisy variables. Simulation results show that the proposed method, which adds variable selection as a component of the clustering algorithm, outperforms previous methods. The results of two experiments indicate that clustering data streams with variable selection is more stable and yields better purity than clustering without it. A further experiment on the KDD-CUP99 dataset also shows that our algorithm produces more stable results.

2.

Multiple Spatio-Temporal Cluster Detection for Case Event Data: An Ordering-Based Approach

C. Demattei, L. Cucala. Communications in Statistics - Theory and Methods, 2013, 42(2): 358-372

This article introduces a spatio-temporal distance that extends the spatial cluster detection methods of Demattei et al. (2007) and Cucala (2009). A review of these methods is given before we define the spatio-temporal distance, which is then used to detect spatio-temporal clusters. These ordering-based methods are compared with the scan statistic in a simulation study. The scan procedure is more powerful, but it detects fewer true positives because of its lack of flexibility. The techniques are applied to a seismic data set. This article highlights two advantages of the ordering-based methods: their flexibility and their low computational demand.

3.

Using a Truncated C_p Statistic for Variable Selection in Multiple Linear Regression

D. W. Uys, S. J. Steel. Communications in Statistics - Simulation and Computation, 2013, 42(2): 420-432

In multiple linear regression analysis, each lower-dimensional subspace L of a known linear subspace M of ℝⁿ corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace L, the C_p statistic is an unbiased estimator of the mean square error if the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the C_p statistic that can also be used to estimate this mean square error. The C_p statistic and its truncated versions are compared in two example data sets, illustrating that use of the truncated versions may result in models different from those selected by standard C_p.