Similar Articles
20 similar articles found (search time: 453 ms)
1.
Taking the green resource and environment development of the 12 leagues and cities of the Inner Mongolia Autonomous Region as the research object, this paper combines grey dynamic clustering with rough sets to construct a green resource and environment indicator system for Inner Mongolia containing 11 indicators, including total annual water supply. The main points are: first, a grey relational matrix between samples is built through grey relational analysis and used for grey clustering of the samples, reflecting the redundancy of information across samples; second, a dynamic clustering procedure removes one indicator at a time and re-clusters the samples using the grey relational matrix rebuilt by grey relational analysis, providing the information data for rough-set reduction; third, rough-set reduction theory judges whether each candidate indicator significantly affects the clustering result: each round's clustering result is compared with the original one, and candidate indicators for which the two results differ, and which thus significantly affect the classification of the evaluation samples, are retained; fourth, the P-value of the non-parametric Kruskal-Wallis test is used to demonstrate that the constructed indicator system is reasonable. Comparative analysis shows that the proposed grey dynamic clustering-rough set indicator screening model outperforms the clustering-grey relational screening models in the existing literature.
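Step one of the abstract above — building a grey relational matrix between samples — can be sketched as follows. This is a simplified pairwise variant of Deng's grey relational analysis (the deviation extrema are taken per sample pair rather than over all comparisons, which differs from some formulations); `rho` is the usual distinguishing coefficient.

```python
import numpy as np

def grey_relational_matrix(X, rho=0.5):
    """Pairwise grey relational grades between samples (rows of X).

    X is assumed pre-normalised; rho is the distinguishing coefficient.
    High G[i, j] means samples i and j carry largely redundant information.
    """
    n = X.shape[0]
    G = np.eye(n)
    for i in range(n):
        for j in range(n):
            delta = np.abs(X[i] - X[j])          # pointwise deviations
            dmin, dmax = delta.min(), delta.max()
            if dmax == 0:                        # identical rows
                G[i, j] = 1.0
                continue
            xi = (dmin + rho * dmax) / (delta + rho * dmax)
            G[i, j] = xi.mean()                  # grade = mean coefficient
    return G
```

The resulting matrix can then feed any standard clustering routine over the samples.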

2.
Nonlinear Optimization AHP for Comprehensive Evaluation   (Total citations: 3, self-citations: 0, other citations: 3)
A key issue in multi-indicator comprehensive evaluation is determining the weight of each evaluation indicator. Methods currently used to determine weight coefficients include the analytic hierarchy process (AHP), the Delphi method, fuzzy clustering, grey relational analysis, the entropy method, artificial neural network weighting, factor analysis, and path analysis. Among these, AHP is one of the most commonly used; it is a classic systems-engineering method that integrates qualitative and quantitative analysis.
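As a concrete illustration of AHP weighting, the sketch below derives priority weights from a pairwise comparison matrix via the principal eigenvector (Saaty's eigenvector method). The 3×3 matrix is a made-up, perfectly consistent example, not data from the paper.

```python
import numpy as np

def ahp_weights(A):
    """Priority weights from a pairwise comparison matrix A via the
    principal eigenvector, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)              # index of principal eigenvalue
    w = np.abs(vecs[:, k].real)
    return w / w.sum()

# Consistent example: criterion 1 is twice as important as criterion 2
# and four times as important as criterion 3.
A = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])
w = ahp_weights(A)   # expected ~ [4/7, 2/7, 1/7]
```

In practice one would also compute the consistency ratio before trusting the weights.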

3.
Research on a Grey Comprehensive Clustering Evaluation Model   (Total citations: 10, self-citations: 1, other citations: 9)
1. Problem statement. Since Professor Deng Julong founded grey system theory in 1982, it has developed rapidly, and grey clustering evaluation has remained one of its most widely discussed techniques. Professor Deng Julong created the variable-weight clustering method; Professor Liu Sifeng proposed fixed-weight grey clustering evaluation; Xiao Xinping proposed grey optimal clustering; Xu Xiuli discussed improvements to grey clustering analysis; and Liu Sifeng further proposed grey clustering evaluation based on triangular whitenization weight functions. These studies examine grey clustering evaluation from different angles, but all of them ultimately assign a clustered object to a grey class by comparing the magnitudes of the components of its grey clustering coefficient vector. In practice, however, the clustering coefficients often show no significant differences, and in that case these methods cannot determine which grey class the object belongs to.

4.
Application of the Grey Clustering Method to Comprehensive Evaluation of University Libraries   (Total citations: 2, self-citations: 0, other citations: 2)
Based on the requirements of comprehensive evaluation of university libraries and the characteristics of the statistical data, this paper applies the grey clustering method to compute weights for multiple indicators and to perform whitenization processing. Combined with SPSS and Matlab, it achieves scientific weighting of the evaluation indicators and a classified ranking of the libraries, offering a useful exploration of computational methods for comprehensive evaluation of university libraries.
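The whitenization step can be illustrated with a triangular whitenization weight function, one common choice in grey clustering (the class boundaries `a`, `b`, `c` below are illustrative, not taken from the paper):

```python
def triangular_whitenization(x, a, b, c):
    """Degree to which observation x belongs to the grey class
    supported on [a, c] with peak at b (triangular whitenization)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def clustering_coefficient(xs, weights, a, b, c):
    """Grey clustering coefficient of one object for one class:
    the weighted sum of whitenization values over its indicators."""
    return sum(w * triangular_whitenization(x, a, b, c)
               for x, w in zip(xs, weights))
```

An object is then assigned to the grey class with the largest clustering coefficient.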

5.
When clustering multivariate data composed of multiple indicators, increasing dimensionality, varying degrees of correlation between individual indicators and the overall clustering, and differing distributions across indicators all complicate the clustering and affect its accuracy, so a suitable method is needed. To address this, an improved sticky Hierarchical Dirichlet Process method is proposed for dimension-reducing clustering of multivariate data: it accommodates indicators that follow different distributions, and a stickiness parameter reflects the correlation between each indicator and the overall clustering. Model parameters are estimated by MCMC. Clustering analyses of simulated data and the IRIS data set confirm the method's effectiveness and show that the stronger an indicator's correlation with the overall clustering, the larger its stickiness parameter, reflecting that indicator's greater importance in the overall clustering. Moreover, when some indicators have large stickiness, the sticky Hierarchical Dirichlet Process method clearly outperforms other clustering methods and significantly improves classification accuracy.

6.
This paper introduces relative membership degrees into grey clustering evaluation and studies the grey optimal clustering evaluation problem. The similarity between an evaluated object and each class centre is measured by the relational coefficients between the sample's observed values and the standard characteristic values. On this basis, an optimized grey clustering evaluation model built on the concept of relative membership is established; the relative membership degrees are obtained by constructing and solving a Lagrangian function, and objects are then classified according to the magnitude of their relative membership degrees, effectively incorporating fuzzy membership information into the grey clustering evaluation process. A numerical example verifies the model's effectiveness and practicality.

7.
This paper studies sparse clustering, a feature-selection method for clustering high-dimensional data. Sparse clustering assigns weights to the feature variables and adds a lasso penalty to shrink them, yielding a ranking of the variables by importance, so that redundant variables are automatically discarded while classification proceeds; this provides feature selection when clustering high-dimensional data. Applying the method to China's environmental protection problem, the 31 provinces of China are divided into 3 classes according to their environmental situation, and 20 important indicators are selected from the existing 104 environmental indicators.
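The weight-shrinkage step described above follows the sparse clustering idea of Witten and Tibshirani: feature weights maximise the weighted between-cluster contribution subject to an L2 bound and a lasso (L1) bound, which is solved by soft-thresholding. A sketch of that weight update, with the per-feature contributions `a` assumed given:

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def sparse_feature_weights(a, s):
    """Feature weights w maximising w.a subject to ||w||2 <= 1,
    ||w||1 <= s, w >= 0 (the sparse k-means weight update).
    a holds each feature's non-negative between-cluster contribution."""
    w = soft_threshold(a, 0.0)
    w = w / np.linalg.norm(w)
    if np.linalg.norm(w, 1) <= s:          # L1 constraint already slack
        return w
    lo, hi = 0.0, float(np.abs(a).max())   # binary-search the threshold
    for _ in range(60):
        mid = (lo + hi) / 2.0
        w = soft_threshold(a, mid)
        w = w / np.linalg.norm(w)
        if np.linalg.norm(w, 1) > s:
            lo = mid
        else:
            hi = mid
    return w
```

Small `s` drives weak features to exactly zero, which is what removes the redundant variables.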

8.
Building on existing research on grey decision-making, this paper introduces the entropy method to determine the weights for a grey relational decision method whose attribute values are interval numbers, thereby also correcting the weight uncertainty of the decision method. Three weighting schemes are proposed: upper-bound indicator weights, lower-bound indicator weights, and comprehensive indicator weights. The corrected grey relational decision method is tested on an emergency-response example; the results show that all three weighting schemes effectively solve the weighting problem in grey relational decision-making and can reduce the randomness of the overall decision.
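The entropy-weighting step can be sketched with the standard entropy-weight method; the interval-number extension in the paper would apply this separately to the lower-bound, upper-bound, and combined decision matrices. The example matrix is illustrative only.

```python
import numpy as np

def entropy_weights(X):
    """Entropy-weight method: columns of X are positive criterion values
    (rows are alternatives); returns one weight per criterion.
    Criteria with more dispersion carry more information, hence weight."""
    P = X / X.sum(axis=0)                      # column-wise proportions
    n = X.shape[0]
    with np.errstate(divide='ignore', invalid='ignore'):
        E = -np.nansum(P * np.log(P), axis=0) / np.log(n)  # entropies in [0, 1]
    d = 1.0 - E                                # degree of divergence
    return d / d.sum()

X = np.array([[1.0, 10.0],
              [1.0, 20.0],
              [1.0, 70.0]])
w = entropy_weights(X)   # first criterion has no dispersion, so near-zero weight
```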

9.
In view of the large investment, long construction period, and technical complexity of modern engineering projects, this paper constructs a risk evaluation indicator system for large engineering projects. The Analytic Network Process is then used to analyse the mutual influences among the indicators and obtain indicator weights, after which grey clustering analysis processes the scattered risk-evaluation information into evaluation measures of different grey classes and yields a comprehensive risk evaluation value. Finally, an example demonstrates the effectiveness and scientific validity of the evaluation method.

10.
An Improved Method for Grey Clustering Evaluation   (Total citations: 2, self-citations: 1, other citations: 1)
Grey clustering evaluation was the earliest technique in the grey-theory family to be developed and widely applied; together with grey system analysis, grey modelling, grey prediction, grey decision-making, grey control, and grey optimization, it forms a relatively mature grey-theory toolkit. Common grey clustering methods include grey relational clustering, Professor Deng Julong's variable-weight clustering method, and Professor Liu Sifeng's fixed-weight grey…

11.
In this paper, we consider a Bayesian mixture model that allows us to integrate out the weights of the mixture in order to obtain a procedure in which the number of clusters is an unknown quantity. To determine clusters and estimate parameters of interest, we develop an MCMC algorithm termed the sequential data-driven allocation sampler. In this algorithm, a single observation has a non-null probability of creating a new cluster, and a set of observations may create a new cluster through split-merge movements. The split-merge movements are developed using a sequential allocation procedure based on allocation probabilities calculated from the Kullback–Leibler divergence between the posterior distribution using the observations previously allocated and the posterior distribution including a 'new' observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.

12.
Clustering algorithms are used in the analysis of gene expression data to identify groups of genes with similar expression patterns. These algorithms group genes with respect to a predefined dissimilarity measure, without using any prior classification of the data. Most clustering algorithms require the number of clusters as input, and every object in the dataset is usually assigned to one of the clusters. We propose a clustering algorithm that finds clusters sequentially and allows for sporadic objects, i.e. objects that are not assigned to any cluster. The proposed sequential clustering algorithm has two steps. First, it finds candidates for cluster centers; multiple candidates are used to make the search for clusters more efficient. Second, it conducts a local search around the candidate centers to find the set of objects that defines a cluster. The candidate clusters are compared using a predefined score, the best cluster is removed from the data, and the procedure is repeated. We investigate the performance of this algorithm using simulated data and apply the method to analyze gene expression profiles in a study on the plasticity of dendritic cells.
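A toy sketch of the sequential idea — candidate centres, a local neighbourhood search, removal of the best cluster, and unassigned "sporadic" objects — under simplifying assumptions that are not the paper's: fixed-radius balls as the local search and cluster size as the score.

```python
import numpy as np

def sequential_clusters(X, radius, min_size=5, n_candidates=10, seed=0):
    """Sequential clustering sketch. Repeatedly: draw candidate centres,
    score the ball of given radius around each by its population, keep
    the best-populated ball as a cluster, remove it, and stop when no
    candidate ball reaches min_size. Points left over are labelled -1
    ('sporadic', assigned to no cluster)."""
    rng = np.random.default_rng(seed)
    labels = -np.ones(len(X), dtype=int)
    active = np.arange(len(X))              # indices not yet clustered
    k = 0
    while len(active) >= min_size:
        cands = rng.choice(active, size=min(n_candidates, len(active)),
                           replace=False)
        best = None
        for c in cands:
            members = active[np.linalg.norm(X[active] - X[c], axis=1) <= radius]
            if best is None or len(members) > len(best):
                best = members
        if len(best) < min_size:            # no acceptable cluster remains
            break
        labels[best] = k
        active = np.setdiff1d(active, best)
        k += 1
    return labels
```

On two tight blobs plus one outlier, this recovers both blobs and leaves the outlier unassigned.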

13.
The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic that gives reasonable clustering results when the component clusters are ball-shaped; currently, there are no analytical results for this algorithm when the component distributions deviate from the ball shape. This paper analytically studies how the K-means algorithm changes its classification rule as the normal component distributions become more elongated under the homoscedastic assumption, and compares this rule with the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but the slopes of the two classification lines change in opposite directions as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared to that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently performs poorly as the component distributions become more elongated, while the mixture model method can potentially, but not necessarily, take advantage of this change and provide much better classification performance.
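A small scikit-learn simulation in the same spirit (not the paper's exact design): two homoscedastic Gaussian components elongated along one axis, clustered by k-means and by a tied-covariance Gaussian mixture. At this moderate separation both methods do well; the paper's analysis concerns how they diverge as elongation grows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
cov = np.array([[4.0, 0.0], [0.0, 0.25]])     # elongated along x
X = np.vstack([rng.multivariate_normal([0, 0], cov, 500),
               rng.multivariate_normal([4, 4], cov, 500)])
y = np.repeat([0, 1], 500)

def accuracy(pred):
    acc = (pred == y).mean()
    return max(acc, 1 - acc)                  # cluster labels are arbitrary

km_acc = accuracy(KMeans(n_clusters=2, n_init=10,
                         random_state=0).fit_predict(X))
gm_acc = accuracy(GaussianMixture(n_components=2, covariance_type='tied',
                                  random_state=0).fit_predict(X))
```

`covariance_type='tied'` matches the paper's homoscedastic assumption: both components share one covariance matrix.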

14.
In research on clustering methods for panel data, exploiting the fact that panel data have both a cross-sectional and a time dimension, the Euclidean distance function is improved to take indicator weights and time weights into account during clustering, yielding a "weighted distance function" suited to panel-data clustering together with a corresponding Ward.D clustering method. First, a Euclidean distance function is defined that considers the absolute values of the indicators, the growth rates between adjacent time points, and the degree of fluctuation; then the indicator weights and time weights are aggregated through a linear model into a comprehensive weighted distance, realizing the weighted clustering of the panel data. Empirical results show that the panel-data weighted clustering method that accounts for indicator weights and time weights discriminates better and improves the accuracy of sample clustering.
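The core distance can be sketched as follows. This is a simplified version of the paper's weighted distance function, keeping only the level term; the original also incorporates growth-rate and variability terms before aggregating.

```python
import numpy as np

def weighted_panel_distance(Xi, Xj, w_ind, w_time):
    """Weighted Euclidean distance between two panel-data samples.

    Xi, Xj : arrays of shape (T, K) -- T time points, K indicators.
    w_ind  : indicator weights (length K, summing to 1).
    w_time : time weights (length T, summing to 1)."""
    sq = (Xi - Xj) ** 2                     # (T, K) squared deviations
    per_time = sq @ np.asarray(w_ind)       # indicator-weighted, per time point
    return float(np.sqrt(np.asarray(w_time) @ per_time))
```

A full pairwise matrix of these distances can then be fed, in condensed form, to a hierarchical clustering routine such as SciPy's Ward linkage.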

15.
In this study, an attempt has been made to classify textile fabrics based on their physical properties using multivariate statistical techniques, namely discriminant analysis and cluster analysis. Initially, discriminant functions were constructed for the classification of three known categories of fabrics made of polyester, lyocell/viscose, and treated polyester. The classification yielded 100% accuracy. Each of the three categories of fabrics was further subjected to the K-means clustering algorithm, which yielded three clusters. These clusters were subjected to discriminant analysis, which again yielded a 100% correct classification, indicating that the clusters are well separated. The properties of the clusters were also investigated with respect to the measurements.

16.
Summary.  Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme, to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size required for data collection. The presence of intercluster correlation can dramatically impact the classification error associated with LQAS analysis.

17.
Summary.  We present an approach to the construction of clusters of life course trajectories and use it to obtain ideal types of trajectories that can be interpreted and analysed meaningfully. We represent life courses as sequences on a monthly timescale and apply optimal matching analysis to compute dissimilarities between individuals. We introduce a new divisive clustering algorithm which has features that are in common with both Ward's agglomerative algorithm and classification and regression trees. We analyse British Household Panel Survey data on the employment and family trajectories of women. Our method produces clusters of sequences for which it is straightforward to determine who belongs to each cluster, making it easier to interpret the relative importance of life course factors in distinguishing subgroups of the population. Moreover our method gives guidance on selecting the number of clusters.
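Optimal matching computes the dissimilarity between two state sequences as the minimal total cost of edits turning one into the other. A minimal sketch with constant costs (real analyses typically use state-dependent substitution costs, e.g. derived from transition rates):

```python
def optimal_matching(seq1, seq2, sub_cost=2.0, indel_cost=1.0):
    """Optimal matching (edit) distance between two state sequences:
    minimal total cost of insertions, deletions, and substitutions
    turning seq1 into seq2, via dynamic programming."""
    n, m = len(seq1), len(seq2)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel_cost            # delete everything
    for j in range(1, m + 1):
        D[0][j] = j * indel_cost            # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if seq1[i - 1] == seq2[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j - 1] + sub,       # substitute / match
                          D[i - 1][j] + indel_cost,    # delete
                          D[i][j - 1] + indel_cost)    # insert
    return D[n][m]
```

The pairwise matrix of such distances is what the divisive clustering algorithm then operates on.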

18.
Summary.  Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models and particularly the case of dichotomous responses. A pseudolikelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudolikelihood methodology is applied to complex survey data on reading proficiency from the American sample of the 'Program for international student assessment' 2000 study, using the Stata program gllamm which can estimate a wide range of multilevel and latent variable models. Performance of pseudo-maximum-likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used.

19.
ABSTRACT

Identifying homogeneous subsets of predictors in classification can be challenging in the presence of high-dimensional data with highly correlated variables. We propose a new method called cluster correlation-network support vector machine (CCNSVM) that simultaneously estimates clusters of predictors relevant to classification and the coefficients of a penalized SVM. The new CCN penalty is a function of the well-known Topological Overlap Matrix, whose entries measure the strength of connectivity between predictors. CCNSVM implements an efficient algorithm that alternates between searching for predictors' clusters and optimizing a penalized SVM loss function using Majorization–Minimization tricks and a coordinate descent algorithm. Combining clustering and sparsity in a single procedure provides additional insight into the power of exploiting dimension-reduction structure in high-dimensional binary classification. Simulation studies compare the performance of our procedure with its competitors. A practical application of CCNSVM to DNA methylation data illustrates its good behaviour.

20.
Cohen’s kappa, a special case of the weighted kappa, is a chance‐corrected index used extensively to quantify inter‐rater agreement in validation and reliability studies. In this paper, it is shown that in inter‐rater agreement for 2 × 2 tables, for two raters having the same number of opposite ratings, the weighted kappa, Cohen’s kappa, Peirce, Yule, Maxwell and Pilliner and Fleiss indices are identical. This implies that the weights in the weighted kappa are less important under such assumptions. Equivalently, it is shown that for two partitions of the same data set, resulting from two clustering algorithms having the same number of clusters with equal cluster sizes, these similarity indices are identical. Hence, an important characterisation is formulated relating equal numbers of clusters with the same cluster sizes to the presence/absence of a trait in a reliability study. Two numerical examples that exemplify the implication of this relationship are presented.
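For the 2×2 case, Cohen's kappa reduces to a one-line computation from observed and chance agreement; a minimal sketch (the table values below are illustrative, not from the paper's examples):

```python
def cohens_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table
        [[a, b],
         [c, d]]
    where a and d count agreements and b and c count disagreements."""
    n = a + b + c + d
    po = (a + d) / n                                        # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    return (po - pe) / (1 - pe)

k = cohens_kappa_2x2(40, 10, 10, 40)   # equal opposite ratings: b == c
```

The `b == c` table is exactly the "same number of opposite ratings" condition under which the paper shows the various agreement indices coincide.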


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd.  京ICP备09084417号