Similar Documents
1.
Cohen’s kappa, a special case of the weighted kappa, is a chance-corrected index used extensively to quantify inter-rater agreement in validation and reliability studies. This paper shows that, for inter-rater agreement in 2 × 2 tables where the two raters have the same number of opposite ratings, the weighted kappa, Cohen’s kappa, and the Peirce, Yule, Maxwell and Pilliner, and Fleiss indices are identical. This implies that the weights in the weighted kappa matter less under such assumptions. Equivalently, for two partitions of the same data set produced by two clustering algorithms with the same number of clusters and equal cluster sizes, these similarity indices are identical. Hence, an important characterisation is formulated that relates equal numbers of clusters with the same cluster sizes to the presence/absence of a trait in a reliability study. Two numerical examples illustrating the implications of this relationship are presented.
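The 2 × 2 equivalence can be checked numerically. The sketch below uses common textbook definitions of Cohen's kappa, Scott's pi (which coincides with Fleiss' kappa for two raters) and the Peirce skill score. The paper's own definitions, and its Yule and Maxwell and Pilliner indices, are not reproduced, and the function name `agreement_indices` is hypothetical. Exact rational arithmetic makes the equality exact rather than approximate.

```python
from fractions import Fraction

def agreement_indices(a, b, c, d):
    """Chance-corrected agreement indices for a 2x2 table (hypothetical helper).

    Rows = rater 1, columns = rater 2:
      a = both 'yes', b = rater 1 'yes' / rater 2 'no',
      c = rater 1 'no' / rater 2 'yes', d = both 'no'.
    """
    n = a + b + c + d
    p1, p2 = Fraction(a + b, n), Fraction(a + c, n)  # 'yes' rates of each rater
    po = Fraction(a + d, n)                          # observed agreement
    # Cohen's kappa: chance agreement from each rater's own marginals.
    pe_cohen = p1 * p2 + (1 - p1) * (1 - p2)
    cohen = (po - pe_cohen) / (1 - pe_cohen)
    # Scott's pi (Fleiss' kappa for two raters): chance agreement from pooled marginals.
    pbar = (p1 + p2) / 2
    pe_scott = pbar**2 + (1 - pbar)**2
    scott = (po - pe_scott) / (1 - pe_scott)
    # Peirce skill score, treating rater 2 as the reference.
    peirce = Fraction(a * d - b * c, (a + c) * (b + d))
    return {"cohen": cohen, "scott_fleiss": scott, "peirce": peirce}
```

With equal off-diagonal counts (b = c), the marginals of both raters coincide, so all three indices reduce to the same value; with b ≠ c they generally differ.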

2.
Clustering algorithms are used in the analysis of gene expression data to identify groups of genes with similar expression patterns. These algorithms group genes according to a predefined dissimilarity measure without using any prior classification of the data. Most clustering algorithms require the number of clusters as input, and all objects in the dataset are usually assigned to one of the clusters. We propose a clustering algorithm that finds clusters sequentially and allows for sporadic objects, that is, objects that are not assigned to any cluster. The proposed sequential clustering algorithm has two steps. First, it finds candidates for cluster centers; multiple candidates are used to make the search for clusters more efficient. Second, it conducts a local search around the candidate centers to find the set of objects that defines a cluster. The candidate clusters are compared using a predefined score, the best cluster is removed from the data, and the procedure is repeated. We investigate the performance of this algorithm on simulated data and apply the method to gene expression profiles from a study on the plasticity of dendritic cells.
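The two-step loop described above can be sketched as follows. This is a hypothetical simplification, not the authors' algorithm: the candidate centers are drawn at random, the "local search" simply collects unassigned objects within a fixed `radius`, and the score is cluster size. The names `sequential_clustering`, `radius`, `min_size` and `n_candidates` are all assumptions.

```python
import numpy as np

def sequential_clustering(X, radius, min_size=2, n_candidates=5, seed=0):
    """Hypothetical sketch of sequential clustering with sporadic objects.

    Returns (clusters, sporadic): a list of index arrays, and the indices of
    objects that were never assigned to any cluster.
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(X))
    clusters = []
    while len(remaining) >= min_size:
        # Step 1: several candidate centers drawn from the unassigned objects.
        k = min(n_candidates, len(remaining))
        candidates = rng.choice(remaining, size=k, replace=False)
        best = None
        for c in candidates:
            # Step 2 (assumed form): all unassigned objects within `radius` of the candidate.
            dist = np.linalg.norm(X[remaining] - X[c], axis=1)
            members = remaining[dist <= radius]
            # Assumed score: cluster size (larger is better).
            if best is None or len(members) > len(best):
                best = members
        if len(best) < min_size:
            break  # no candidate yields a large enough cluster
        clusters.append(best)
        # Remove the best cluster from the data and repeat.
        remaining = np.setdiff1d(remaining, best)
    return clusters, remaining
```

Objects still in `remaining` when the loop stops play the role of sporadic objects; note that, unlike most partitioning methods, the number of clusters is never passed in.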

3.
Spectral clustering uses eigenvectors of the Laplacian of the similarity matrix and is well suited to binary clustering problems. For multi-way clustering, either binary spectral clustering is applied recursively, or the points are embedded into spectral space and then clustered with another method such as K-means. Here we propose and study a K-way clustering algorithm, spectral modular transformation, based on the fact that the graph Laplacian has an equivalent representation with a diagonal modular structure. The method first transforms the original similarity matrix into a new one that is nearly disconnected and reveals the cluster structure clearly; a linearized cluster assignment algorithm is then applied to split the clusters. In this way, some samples for each cluster can be found recursively using a divide-and-conquer approach. To obtain the overall clustering, the cluster assignment from the previous step is used to initialize the multiplicative update method for spectral clustering. Examples show that our method outperforms spectral clustering with other initializations.
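For reference, the binary spectral clustering that the abstract starts from (not the proposed spectral modular transformation) can be sketched as a sign split of the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the unnormalised Laplacian. The Gaussian similarity and its bandwidth `sigma` are assumptions for illustration.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split points into two groups by the sign of the Fiedler vector."""
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2 * sigma**2))  # Gaussian similarity matrix (assumed kernel)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W         # unnormalised graph Laplacian D - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]                # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)
```

For a nearly disconnected similarity matrix the Fiedler vector is close to a block indicator, which is why the sign split recovers the two groups; the multi-way methods in the abstract build on this embedding.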

4.
Functional data analysis (FDA), the analysis of data that can be regarded as a set of observed continuous functions, is an increasingly common class of statistical analysis. Cluster analysis of functional data is one of the most widely used FDA methods; however, little work has compared the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, nonperiodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, the clustering methods were also compared when the number of clusters was misspecified. To illustrate the results, a real functional data set whose true clustering structure is believed to be known was clustered; the comparison on this data set confirmed the findings of the simulation. The study yields concrete suggestions to help future researchers choose the best method for clustering their functional data.
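The Rand index used as the performance measure is the fraction of object pairs on which two partitions agree (both place the pair together, or both separate it). A minimal sketch, with a hypothetical function name:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        # A pair agrees if it is 'together' in both partitions or 'apart' in both.
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Label values themselves do not matter, only co-membership, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0. The pairwise loop is O(n²), which is fine for illustration but not for large data sets.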

5.
In this study, textile fabrics are classified on the basis of their physical properties using multivariate statistical techniques, namely discriminant analysis and cluster analysis. First, discriminant functions were constructed to classify three known categories of fabrics made of polyester, lyocell/viscose, and treated polyester; the classification was 100% accurate. Each of the three fabric categories was then subjected to the K-means clustering algorithm, which yielded three clusters. These clusters were in turn subjected to discriminant analysis, which again gave 100% correct classification, indicating that the clusters are well separated. The properties of the clusters are also investigated with respect to the measurements.
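The K-means step can be sketched with Lloyd's algorithm. To keep the example deterministic, this sketch starts from chosen rows of the data (`init_idx`, an assumption; practical implementations usually use random or k-means++ starts). The discriminant analysis step is not shown.

```python
import numpy as np

def kmeans(X, init_idx, n_iter=100):
    """Lloyd's algorithm starting from the rows of X listed in init_idx."""
    X = np.asarray(X, dtype=float)
    centres = X[list(init_idx)].copy()
    for _ in range(n_iter):
        # Assignment step: nearest centre in squared Euclidean distance.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
            for k in range(len(centres))
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```

Each iteration cannot increase the total within-group sum of squares, so the loop settles at a local optimum; well-separated clusters, as reported for the fabric data, make that optimum easy to reach.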

6.
This paper presents ClusterKDE, a clustering algorithm based on univariate kernel density estimation. It is an iterative procedure in which, at each step, a new cluster is obtained by minimizing a smooth kernel function. Although the univariate Gaussian kernel is used in our applications, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, ClusterKDE is simple, easy to implement, and well-defined, and it stops in a finite number of steps; that is, it always converges regardless of the initial point. We illustrate our findings with numerical experiments in which the algorithm is implemented in Matlab and applied to practical problems. The results indicate that ClusterKDE is competitive and fast compared with the well-known Clusterdata and K-means algorithms that Matlab uses to cluster data.
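The sketch below is *not* ClusterKDE. It is a related, plainly swapped-in density-based approach that shares the key property of not needing the number of clusters. It evaluates a univariate Gaussian KDE on a grid and cuts the data at interior local minima ("valleys") of the estimated density. The bandwidth `h`, the grid size and the function names are assumptions.

```python
import numpy as np

def gaussian_kde_1d(x, data, h):
    """Univariate Gaussian kernel density estimate evaluated at points x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def kde_valley_clusters(data, h, grid_size=512):
    """Cluster 1-D data by cutting the KDE at its interior local minima."""
    data = np.asarray(data, dtype=float)
    grid = np.linspace(data.min() - 3 * h, data.max() + 3 * h, grid_size)
    f = gaussian_kde_1d(grid, data, h)
    # Interior local minima; '<=' on the left keeps exactly one point of a two-point tie.
    is_min = (f[1:-1] <= f[:-2]) & (f[1:-1] < f[2:])
    cuts = grid[1:-1][is_min]
    labels = np.searchsorted(cuts, data)  # number of cuts lying left of each point
    return labels, cuts
```

The number of clusters is one more than the number of valleys found, so it is controlled by the bandwidth rather than fixed in advance.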

7.
8.
This paper addresses the design of finite-sample corrections to information matrix tests. We review a Cornish-Fisher correction proposed elsewhere and propose an alternative Bartlett-type correction. Simulation results are given for skewness, excess kurtosis, normality, and heteroskedasticity tests.

9.
The multivariate split normal distribution extends the usual multivariate normal distribution with a set of parameters that allow for skewness in the form of contraction/dilation along a subset of the principal axes. This article derives some properties of this distribution, including its moment generating function, multivariate skewness, and kurtosis, and discusses its role as a population model for asymmetric principal components analysis. Maximum likelihood estimators and a complete Bayesian analysis, including inference on the number of skewed dimensions and their directions, are presented.
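As a hedged illustration of "contraction/dilation along the principal axes", the sketch samples under an *assumed* parametrisation that may differ from the article's. Each principal coordinate follows a univariate split normal with scale `sigma` on the negative side and `sigma * tau` on the positive side, and the coordinates are rotated by an orthogonal matrix `U`. The negative side is chosen with probability 1 / (1 + tau) so that the density is continuous at the mode; `tau = 1` recovers the normal along that axis. All names are hypothetical.

```python
import numpy as np

def sample_split_normal(mu, U, sigma, tau, n, seed=0):
    """Draw n samples under an assumed multivariate split-normal parametrisation.

    mu: (d,) location; U: (d, d) orthogonal matrix of principal axes;
    sigma: (d,) negative-side scales; tau: (d,) positive/negative scale ratios.
    """
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    tau = np.asarray(tau, dtype=float)
    d = len(sigma)
    half = np.abs(rng.standard_normal((n, d)))  # half-normal magnitudes
    # Negative side with probability 1 / (1 + tau): density continuous at the mode.
    negative = rng.random((n, d)) < 1.0 / (1.0 + tau)
    y = np.where(negative, -sigma * half, sigma * tau * half)  # principal coordinates
    return np.asarray(mu, dtype=float) + y @ np.asarray(U, dtype=float).T
```

With `tau` > 1 along an axis the samples are right-skewed along that axis only, while axes with `tau = 1` stay symmetric, which mirrors the idea of a subset of skewed dimensions.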

10.
We derive closed-form matrix formulae for the unconditional third and fourth moments of a broad class of vector autoregressive time series with regime switching; the first and second moments are well known. New measures of multivariate skewness and kurtosis are introduced and their basic properties are investigated. Knowledge of the series' level, variation, co-movements, skewness, and kurtosis helps support model interpretation in real-data applications. Numerical examples complete the paper.

11.

Cluster analysis is the distribution of objects into different groups, or more precisely the partitioning of a data set into subsets (clusters) so that the data in each subset share some common trait according to some distance measure. Unlike classification, clustering requires first deciding the optimum number of clusters and then assigning the objects to the clusters. Solving such problems for a large number of high-dimensional data points is complicated, and most existing algorithms do not perform well. In the present work, a new clustering technique applicable to large data sets is used to cluster the spectra of 702,248 galaxies and quasars, each having 1,540 points in the wavelength range imposed by the instrument. The proposed technique successfully discovered five clusters in this 702,248 × 1,540 data matrix.

12.
Summary. A new procedure is proposed for clustering attribute value data. When used with conventional distance-based clustering algorithms, this procedure encourages those algorithms to detect automatically subgroups of objects that preferentially cluster on subsets of the attribute variables rather than on all of them simultaneously. The relevant attribute subsets can differ from cluster to cluster and can partially (or completely) overlap with those of other clusters. Enhancements that increase sensitivity for detecting low-cardinality groups clustering on a small subset of variables are discussed. Applications in different domains, including gene expression arrays, are presented.

13.
It is well documented in the literature that the sample skewness and excess kurtosis can be severely biased in finite samples. In this paper, we derive analytical results for their finite-sample biases up to the second order. In general, the biases depend on the cumulants (up to the sixth order) as well as on the dependence structure of the data. Using an AR(1) process for illustration, we show that a feasible bias-correction procedure based on our analytical results works remarkably well for reducing the bias of the sample skewness. Bias correction also works reasonably well for the sample kurtosis under a moderate degree of dependence. For hypothesis testing, bias correction improves power when testing for normality, and bias correction under the null also improves size. However, when testing for nonzero skewness and/or excess kurtosis, there are nonnegligible size distortions in finite samples, and bias correction may not help.
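For reference, the moment-based sample skewness g₁ = m₃ / m₂^{3/2} and excess kurtosis g₂ = m₄ / m₂² − 3, where m_r is the r-th central sample moment, are the estimators whose finite-sample bias is at issue. A minimal sketch (the function name is hypothetical; the paper's bias corrections are not reproduced):

```python
import numpy as np

def sample_skew_kurt(x):
    """Moment-based sample skewness g1 and excess kurtosis g2."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    # Central sample moments with divisor n (not n - 1).
    m2, m3, m4 = (dev**2).mean(), (dev**3).mean(), (dev**4).mean()
    return m3 / m2**1.5, m4 / m2**2 - 3.0
```

Because g₁ and g₂ are ratios of sample moments, they are biased even for independent data, and serial dependence such as an AR(1) structure changes the bias further, which is what the analytical results above quantify.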

14.
The use of single-group skewness and kurtosis critical values to assess residual normality in the ANOVA model is examined. Using single-group critical values gives a conservative test of residual normality in multiple-group designs. As the sample size per group increases, the empirical Type I error rates of the skewness and kurtosis tests of residual normality approach α. These results supplement previous work that has focused on testing residual normality in the linear regression model.
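In a one-way ANOVA, the residuals are the observations minus their group means, pooled across groups. The skewness and kurtosis statistics are then computed on the pooled residuals. A minimal sketch with a hypothetical function name; the critical values, which are the subject of the study, are not included.

```python
import numpy as np

def anova_residual_moments(groups):
    """Pooled one-way ANOVA residuals with their skewness and excess kurtosis."""
    # Residual = observation minus its own group mean.
    residuals = np.concatenate([np.asarray(g, dtype=float) - np.mean(g) for g in groups])
    dev = residuals - residuals.mean()  # mean is zero up to rounding
    m2 = (dev**2).mean()
    skew = (dev**3).mean() / m2**1.5
    kurt = (dev**4).mean() / m2**2 - 3.0
    return residuals, skew, kurt
```

Estimating one mean per group uses up degrees of freedom, so the pooled residuals are not an i.i.d. sample of size n. This is one reason single-group critical values need not give exact Type I error rates in multiple-group designs.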

15.
The K-means clustering method is a widely adopted clustering algorithm in data mining and pattern recognition, in which partitions are formed by minimizing the total within-group sum of squares over a given set of variables. Weighted K-means clustering extends K-means by assigning nonnegative weights to the variables. In this paper, we aim to obtain more meaningful and interpretable clusters by deriving optimal variable weights for weighted K-means clustering. Specifically, we improve the weighted K-means method by introducing a new algorithm, based on the Karush-Kuhn-Tucker conditions, that obtains globally optimal variable weights. We present the mathematical formulation of the clustering problem, derive the structural properties of the optimal weights, and implement a recursive algorithm to calculate them. Numerical examples on simulated and real data indicate that our method is superior in both clustering accuracy and computational efficiency.
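The objective that weighted K-means minimises over partitions is the total within-group sum of squares with each variable's squared deviations multiplied by its nonnegative weight. The sketch only evaluates this objective for a given partition and weight vector; the paper's KKT-based weight optimisation is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def weighted_within_ss(X, labels, weights):
    """Total within-group sum of squares with nonnegative variable weights."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("variable weights must be nonnegative")
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        dev2 = (members - members.mean(axis=0)) ** 2  # squared deviations per variable
        total += (dev2 * w).sum()                     # weight each variable's contribution
    return total
```

For fixed weights this equals ordinary K-means on columns rescaled by √wⱼ, so any K-means routine can perform the partition step while the weights are updated separately.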

16.
The odd Weibull distribution is a three-parameter generalization of the Weibull and inverse Weibull distributions, with rich density and hazard shapes for modeling lifetime data. This paper explores the odd Weibull parameter regions with finite moments and examines its relation to some well-known distributions based on skewness and kurtosis functions. The existence of maximum likelihood estimators is shown for complete data of any sample size. A proof of the uniqueness of these estimators is given only when the absolute value of the second shape parameter is between zero and one. Furthermore, elements of the Fisher information matrix are obtained for complete data using a single integral representation and are shown to exist for any parameter values. The performance of the odd Weibull distribution over various density and hazard shapes is compared with that of the generalized gamma distribution using two different test statistics. Finally, two data sets are analyzed for illustration.
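How one family can contain both the Weibull and the inverse Weibull can be seen from the CDF. The sketch *assumes* the commonly cited odd Weibull form F(t) = 1 − [1 + (exp((t/α)^θ) − 1)^β]^(−1), t > 0, in which the Weibull odds are raised to the power β; the paper's parameter names and constraints may differ. Under this form, β = 1 gives the Weibull CDF, and β = −1 with θ < 0 gives the inverse Weibull CDF exp(−(α/t)^|θ|).

```python
import math

def odd_weibull_cdf(t, alpha, theta, beta):
    """CDF of the odd Weibull distribution in an assumed parametrisation.

    F(t) = 1 - 1 / (1 + (exp((t/alpha)**theta) - 1)**beta) for t > 0.
    Large (t/alpha)**theta can overflow math.expm1; this sketch does not guard that.
    """
    if t <= 0:
        return 0.0
    odds = math.expm1((t / alpha) ** theta)  # Weibull odds: exp(u) - 1
    return 1.0 - 1.0 / (1.0 + odds**beta)
```

`math.expm1` is used instead of `math.exp(u) - 1` to keep precision when u is small.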

17.
This paper studies out-of-sample prediction of Value-at-Risk (VaR) with models that account for higher moments. We consider models that differ in their treatment of skewness and kurtosis, in particular the GARCHDSK model, which allows for constant and dynamic skewness and kurtosis. VaR prediction performance is first approached from a purely statistical viewpoint, by studying correct coverage rates and the independence of VaR violations. The financial implications of different VaR models are then considered in terms of market risk capital requirements as defined by the Basel Accord. Our results, based on eight international stock indexes, highlight the presence of conditional skewness and kurtosis, time-varying in some cases, and show that asymmetry plays a significant role in risk management.
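A standard way to check "correct coverage" is Kupiec's proportion-of-failures likelihood-ratio test. It compares the observed violation rate x/n with the nominal rate p and is asymptotically χ²(1) under correct coverage (5% critical value ≈ 3.841). The abstract does not state which coverage test the authors use, so this is an illustrative sketch only; independence tests of violations are not shown.

```python
import math

def kupiec_pof(violations, n, p):
    """Kupiec proportion-of-failures LR statistic for VaR coverage.

    violations: days on which the loss exceeded VaR; n: number of days;
    p: nominal violation probability (e.g. 0.01 for a 99% VaR).
    """
    x = violations
    if not 0 <= x <= n:
        raise ValueError("violations must lie between 0 and n")
    pi_hat = x / n

    def loglik(q):
        # Binomial log-likelihood without the constant; 0 * log(0) is treated as 0.
        ll = 0.0
        if x > 0:
            ll += x * math.log(q)
        if n - x > 0:
            ll += (n - x) * math.log(1 - q)
        return ll

    return -2.0 * (loglik(p) - loglik(pi_hat))
```

For example, 25 violations in 1,000 days of a 99% VaR gives a statistic well above 3.841, so correct coverage would be rejected at the 5% level.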

18.
The class of skew-symmetric distributions has received much attention in recent years. In this article, we introduce two distributions that systematically capture both skew-symmetric unimodal distributions (e.g., skew-Laplace and skew-normal) and skew-symmetric bimodal ones. As natural generalizations of the skew-Laplace and skew-normal distributions, they provide greater flexibility in modeling real data. These models also avoid the identifiability problems of using mixtures to fit bimodal data. Stochastic representations, which provide random number generation algorithms, are presented. The explicit forms of the central moments indicate that the proposed distributions cover wide ranges of skewness and kurtosis.
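The article's new distributions are not specified in the abstract. As a hedged illustration of how a stochastic representation yields a random number generator, the sketch uses the standard skew-normal instead. Its density is 2φ(x)Φ(αx), and Z = δ|U₀| + √(1 − δ²) U₁ with δ = α / √(1 + α²) and U₀, U₁ independent N(0, 1) has exactly this density.

```python
import math
import numpy as np

def skew_normal_pdf(x, alpha):
    """Skew-normal density 2 * phi(x) * Phi(alpha * x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * x / math.sqrt(2)))
    return 2.0 * phi * Phi

def skew_normal_rvs(alpha, n, seed=0):
    """Sample via Z = delta*|U0| + sqrt(1 - delta^2)*U1, with U0, U1 iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    delta = alpha / math.sqrt(1.0 + alpha**2)
    u0, u1 = rng.standard_normal(n), rng.standard_normal(n)
    return delta * np.abs(u0) + math.sqrt(1.0 - delta**2) * u1
```

Setting α = 0 recovers the standard normal, and the sample mean approaches δ√(2/π).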

19.
This paper investigates the asymptotic distribution of the Akaike information criterion (AIC) and presents its characteristics in normal linear regression models. Bias correction of the AIC has been studied, but the bias concerns only the mean, i.e., the first moment; higher moments are also important for understanding the behavior of the AIC. The variance increases as the number of explanatory variables increases. The skewness and kurtosis indicate that the normal approximation is reasonably accurate. An asymptotic expansion of the distribution function of the standardized AIC is also derived.
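For reference, the AIC of a normal linear regression fitted by least squares is −2 log L̂ + 2k, where the maximised log-likelihood uses σ̂² = RSS/n. Conventions differ on additive constants and on whether σ² is counted among the k parameters. This sketch counts it (k = p + 1), which may not match the paper's convention; the function name is hypothetical.

```python
import numpy as np

def normal_regression_aic(X, y):
    """AIC = -2 log L + 2k for a Gaussian linear model fitted by least squares.

    X: (n, p) design matrix (include a column of ones for an intercept).
    k = p + 1 counts the regression coefficients and the error variance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(((y - X @ beta) ** 2).sum())
    # Maximised Gaussian log-likelihood with the MLE sigma^2 = RSS / n.
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return -2.0 * loglik + 2 * (p + 1)
```

Repeating this over many simulated data sets gives an empirical distribution of the AIC, whose variance, skewness and kurtosis can be compared with the asymptotic results described above.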

20.

We introduce a new parsimonious bimodal distribution, the bimodal skew-symmetric Normal (BSSN) distribution, which may be effective in capturing bimodality, excess kurtosis, and skewness. Explicit expressions are derived for the moment-generating function, mean, variance, skewness, and excess kurtosis. The shape properties of the proposed distribution are investigated with respect to skewness, kurtosis, and bimodality. Maximum likelihood estimation is considered, and an expression for the observed information matrix is provided. Illustrative examples use medical and financial data as well as simulated data from a mixture of normal distributions.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417