Similar Articles
20 similar articles found.
1.
Several computational studies suggest that the fuzzy c-means (FCM) clustering scheme can in some cases be used successfully to obtain estimates of the parameters of a statistical mixture (e.g., a mixture of normal distributions). While these (limited) simulation results support this hypothesis, we provide an example showing that the FCM cluster prototypes cannot in general be statistically consistent estimators of the centers (means) of any univariate mixture with symmetric component distributions.
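To make the contrast concrete, the following sketch (not from the paper) fits plain fuzzy c-means and a two-component Gaussian mixture to the same simulated symmetric univariate mixture; the sample size, mixture parameters, and fuzzifier m = 2 are illustrative assumptions.

```python
# Comparison sketch: FCM prototypes vs. Gaussian-mixture means on a symmetric
# univariate normal mixture (all settings below are assumed, not from the paper).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(2, 1, 5000)])  # true means -2 and +2

def fcm_1d(x, c=2, m=2.0, n_iter=200):
    """Plain fuzzy c-means for univariate data; the fuzzifier m is an assumed setting."""
    v = np.quantile(x, np.linspace(0.1, 0.9, c))     # initial prototypes
    for _ in range(n_iter):
        d = np.abs(x[:, None] - v[None, :]) + 1e-12  # point-to-prototype distances
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)            # memberships sum to one per point
        v = (u ** m * x[:, None]).sum(axis=0) / (u ** m).sum(axis=0)
    return np.sort(v)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))
print("FCM prototypes:", fcm_1d(x).round(3))              # generally differ from the true means
print("GMM means:     ", np.sort(gmm.means_.ravel()).round(3))
```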

2.
This paper suggests an evolving possibilistic approach to fuzzy modelling of time-varying processes. The approach is based on an extension of the well-known possibilistic fuzzy c-means (FCM) clustering and on functional fuzzy rule-based modelling. Evolving possibilistic fuzzy modelling (ePFM) employs memberships and typicalities to recursively cluster data, and uses participatory learning to adapt the model structure as stream data are input. Because possibilistic clustering relaxes the FCM restriction that membership degrees sum to unity, it plays a key role when the data are noisy and contain outliers. To show the usefulness of ePFM, the approach is applied to system identification using the Box & Jenkins gas furnace data, as well as to time series forecasting of the chaotic Mackey–Glass series and of data produced by a synthetic time-varying process with parameter drift. The results show that ePFM is a strong candidate for modelling nonlinear time-varying systems, with performance comparable to or better than alternative approaches, especially when the available data are affected by noise and outliers.

3.
Recently, Lee and Cha proposed two general classes of discrete bivariate distributions and discussed some of their general properties and specific cases. In this paper we consider one model, the bivariate discrete Weibull distribution, which has not yet been studied in the literature. The proposed distribution is a discrete analogue of the Marshall–Olkin bivariate Weibull distribution. We study various properties of the proposed distribution and discuss its physical interpretations. The model has four parameters, which make it very flexible. The maximum likelihood estimators of the parameters cannot be obtained in closed form, and we propose an efficient nested EM algorithm that works well for discrete data. We also propose an augmented Gibbs sampling procedure to compute Bayes estimates of the unknown parameters under a flexible set of priors. Two data sets are analyzed to show how the proposed model and methods work in practice, with quite satisfactory results.

4.
This article develops robust estimation and local influence analysis for linear regression models with scale mixtures of multivariate skew-normal distributions. The main virtue of considering the linear regression model under this class of distributions is their convenient hierarchical representation, which allows an easy implementation of inference. Because the observed-data log-likelihood associated with the proposed model is rather complex, making Cook's well-known approach to local influence difficult to apply, we develop, inspired by the expectation-maximization algorithm, a local influence analysis based on the conditional expectation of the complete-data log-likelihood, a measure that is invariant under reparametrization. Some useful perturbation schemes are discussed. Simulation studies examine the robustness of this flexible class against outlying and influential observations. Finally, a real data set is analyzed, illustrating the usefulness of the proposed methodology.

5.
An important problem in network analysis is to identify significant communities. Most real-world data sets exhibit both a topological structure between nodes and attributes describing the nodes. In this paper, we propose a new community detection criterion that considers both structural similarities and attribute similarities. The clustering method integrates the cost of clustering the node attributes with the cost of clustering the structural information via the normalized modularity. We show that the joint clustering problem can be formulated as a spectral relaxation problem. The proposed algorithm is capable of learning the degree to which individual node attributes contribute. A number of numerical studies on simulated and real data sets demonstrate the effectiveness of the proposed method.
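As a rough illustration of the general idea (not the paper's algorithm, which learns the attribute weights), the sketch below runs spectral clustering on a fixed convex combination of a structural affinity (the adjacency matrix) and an attribute-similarity kernel; the example graph, the toy node attribute, and the weight alpha are assumptions.

```python
# Hedged sketch: joint structure + attribute clustering via spectral clustering on
# a convex combination of adjacency and an attribute kernel (alpha is assumed fixed).
import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

G = nx.karate_club_graph()                       # small example graph
A = nx.to_numpy_array(G)                         # structural similarity: adjacency matrix
attrs = np.array([[G.degree(i)] for i in G])     # toy one-dimensional node attribute
S_attr = rbf_kernel(attrs)                       # attribute similarity kernel

alpha = 0.5                                      # assumed structure/attribute trade-off
affinity = alpha * (A / A.max()) + (1 - alpha) * S_attr

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)
```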

6.
Over the past few decades, regression models have received considerable attention and have proved successful when applied together with other models. One of the most successful is the sample selection, or selectivity, model. However, uncertainty and ambiguity exist in such models, particularly in the relationship between the endogenous and exogenous variables, and this undermines the model's ability to produce estimates that reflect the actual phenomenon under study. These are the questions this study sets out to explore. A new framework for estimating the sample selection model using fuzzy modelling is introduced: a flexible fuzzy concept is hybridized with the parametric sample selection model, yielding the fuzzy parametric sample selection model (FPSSM). The elements of vagueness and uncertainty are represented in the model construction, as a way of increasing the available information and producing a more accurate model. This leads to a convergence theorem expressed in terms of triangular fuzzy numbers for use in the model. Consistency and efficiency of the proposed model are assessed by Monte Carlo simulation, with the error terms of the FPSSM assumed to follow either normal or chi-square distributions. Simulation results show that the FPSSM is consistent and efficient when the errors are normal, whereas the FPSSM with chi-square errors is inconsistent.

7.
An aspect of cluster analysis that has been widely studied in recent years is the weighting and selection of variables. Procedures have been proposed that can identify the cluster structure present in a data matrix when that structure is confined to a subset of variables. Other methods assess the relative importance of each variable through a suitably chosen weight. But when a cluster structure is present in more than one subset of variables and differs from one subset to another, those solutions, as well as standard clustering algorithms, can lead to misleading results. Some very recent methodologies for finding consensus classifications of the same set of units can also be useful for identifying cluster structures in a data matrix, but each seems only partly satisfactory for the purpose at hand. Therefore a new, more specific procedure is proposed and illustrated by analyzing two real data sets; its performance is evaluated by means of a simulation experiment.

8.
In this paper, we present an algorithm for clustering based on univariate kernel density estimation, named ClusterKDE. It consists of an iterative procedure in which, at each step, a new cluster is obtained by minimizing a smooth kernel function. Although our applications use the univariate Gaussian kernel, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, the ClusterKDE algorithm is very simple, easy to implement, well defined, and stops in a finite number of steps; that is, it always converges regardless of the initial point. We illustrate our findings with numerical experiments in which the algorithm is implemented in Matlab and applied to practical problems. The results indicate that ClusterKDE is competitive and fast compared with the well-known clusterdata and k-means algorithms used by Matlab for clustering data.
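The following is a simplified, hedged sketch of density-mode clustering in one dimension, in the spirit of ClusterKDE but not reproducing its exact iterative minimization: estimate a Gaussian KDE, take the local maxima of the density as cluster centres, and assign each point to the nearest mode. The simulated data and grid resolution are assumptions.

```python
# Hedged sketch of KDE-mode clustering in one dimension (not the paper's exact algorithm).
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelmax

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(4, 0.7, 300)])

kde = gaussian_kde(x)                          # univariate Gaussian kernel estimate
grid = np.linspace(x.min() - 1, x.max() + 1, 2000)
density = kde(grid)
modes = grid[argrelmax(density)[0]]            # local maxima serve as candidate cluster centres

labels = np.argmin(np.abs(x[:, None] - modes[None, :]), axis=1)
print("modes:", modes.round(2), " cluster sizes:", np.bincount(labels))
```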

9.
Model-based clustering for social networks
Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity (meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not), homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean 'social space', and the actors' locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously, and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.
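A hedged sketch of the model's generative story is given below (simulation only; actual fitting would use latentnet in R): latent positions are drawn from a mixture of Gaussian clusters, and ties form with a probability that decreases with latent distance through a logistic link. The intercept beta0, the cluster layout, and the network size are illustrative assumptions.

```python
# Hedged simulation of a latent position cluster model: positions from a Gaussian
# mixture, tie probability decreasing with latent distance via a logistic link.
import numpy as np

rng = np.random.default_rng(2)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])    # three latent clusters
z = rng.integers(0, 3, size=60)                               # cluster memberships
pos = centers[z] + rng.normal(scale=0.5, size=(60, 2))        # latent social positions

beta0 = 1.0                                                   # assumed intercept
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
p = 1.0 / (1.0 + np.exp(-(beta0 - dist)))                     # P(tie) falls with distance
Y = rng.random((60, 60)) < p
Y = np.triu(Y, 1)
Y = Y | Y.T                                                   # undirected network, no self-ties
print("edges:", int(Y.sum()) // 2, " (within-cluster ties favoured by construction)")
```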

10.
We propose two probability-like measures of individual cluster-membership certainty that can be applied to a hard partition of the sample, such as that obtained from the partitioning around medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual's tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition using these measures. We evaluate the performance of both measures for individuals with ambiguous cluster membership, using simulated binary datasets partitioned by the PAM algorithm and continuous datasets partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft-clustering algorithms such as fuzzy analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher's classic iris dataset.
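The authors' exact measures are not reproduced here; the sketch below shows one simple, assumed probability-like certainty built directly from pairwise dissimilarities after a hard k-means partition: the certainty of an individual for a cluster is taken proportional to the inverse of its average dissimilarity to that cluster's members.

```python
# Illustrative (assumed) probability-like membership certainty from pairwise dissimilarities.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

D = pairwise_distances(X)                                      # pairwise dissimilarities
avg = np.column_stack([D[:, labels == k].mean(axis=1) for k in range(2)])
cert = (1.0 / avg) / (1.0 / avg).sum(axis=1, keepdims=True)    # rows sum to one
print(cert[:5].round(3))                                       # probability-like certainties
```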

11.
The generalized exponential is one of the most commonly used distributions for analyzing lifetime data. It has several desirable properties and can be used quite effectively to analyse skewed lifetime data. The main aim of this paper is to introduce an absolutely continuous bivariate generalized exponential distribution using the method of Block and Basu (1974); in effect, the Block and Basu exponential distribution is extended to the generalized exponential distribution. We call the proposed model the Block and Basu bivariate generalized exponential distribution and discuss its various properties. In this case the joint probability distribution function and the joint cumulative distribution function can be expressed in compact forms. The model has four unknown parameters, and the maximum likelihood estimators cannot be obtained in explicit form; computing them directly requires solving a four-dimensional optimization problem. The EM algorithm is therefore proposed to compute the maximum likelihood estimates of the unknown parameters. One data analysis is provided for illustrative purposes. Finally, we propose some generalizations of the proposed model and compare them with each other.

12.
This paper presents a new Bayesian clustering approach based on an infinite mixture model, specifically designed for time-course microarray data. The problem is to group together genes that have "similar" expression profiles, given a set of noisy measurements of their expression levels over a specific time interval. In order to capture the temporal variation of each curve, a non-parametric regression approach is used: each expression profile is expanded over a set of basis functions, and the coefficient vectors of the curves are then modeled through a Bayesian infinite mixture of Gaussian distributions. The task of finding clusters of genes with similar expression profiles is thus reduced to grouping together genes whose coefficients are sampled from the same mixture component. A Dirichlet process prior is naturally employed in such models, since it deals automatically with the uncertainty about the number of clusters. Posterior inference is carried out by a split-and-merge MCMC sampling scheme that integrates out the parameters of the component distributions and updates only the latent vector of cluster memberships. The final configuration is obtained via the maximum a posteriori estimator. The performance of the method is studied using synthetic and real microarray data and compared with competing techniques.
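A hedged sketch of the overall pipeline follows: each noisy profile is expanded on a simple basis (a low-order polynomial stand-in for the paper's basis functions) by least squares, and the coefficient vectors are clustered with a truncated Dirichlet-process Gaussian mixture fitted variationally in scikit-learn, a simplified stand-in for the split-and-merge MCMC sampler.

```python
# Hedged sketch: basis-coefficient clustering with a (variational) Dirichlet-process mixture.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 20)                                      # common time grid
curves = np.vstack(
    [np.sin(2 * np.pi * t) + rng.normal(0, 0.2, t.size) for _ in range(30)]
    + [np.cos(2 * np.pi * t) + rng.normal(0, 0.2, t.size) for _ in range(30)])

basis = np.vander(t, N=6, increasing=True)                     # simple polynomial basis
coef, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)        # one coefficient vector per curve

dp = BayesianGaussianMixture(n_components=10, random_state=0,
                             weight_concentration_prior_type="dirichlet_process").fit(coef.T)
print("cluster labels:", dp.predict(coef.T))
```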

13.
The probabilistic uncertainty in record linkage affects statistical analyses, such as regression analysis, of linked data. This paper considers Bayesian regression analysis with linked data and shows that, even under the usual normal regression model, least-squares-type estimators of the regression coefficients are not always adequate. A method is proposed that uses the distribution of the response variable; it is related to finite mixture analysis and leads to more accurate estimates. A simple approach is also proposed to increase tractability and reduce the number of mixture components. A Monte Carlo simulation study is performed to assess the proposed approach.

14.
We introduce the concept of snipping, complementing that of trimming, in robust cluster analysis. An observation is snipped when some of its dimensions are discarded, while the remaining ones are used for clustering and estimation. Snipped k-means is performed through a probabilistic optimization algorithm that is guaranteed to converge to the global optimum. We show global robustness properties of our snipped k-means procedure. Simulations and a real-data application to optical recognition of handwritten digits are used to illustrate and compare the approach.

15.
Application of discrete fuzzy probabilities to alternative selection
To address the uncertainty in the probabilities of the various states of nature faced by a decision maker when selecting among alternatives, the problem of alternative selection under discrete fuzzy state probabilities is proposed and discussed. The three-point estimates of each state's probability of occurrence are represented as triangular fuzzy numbers, yielding a discrete fuzzy probability distribution, and a restricted fuzzy algorithm is used to determine the numerical characteristics of this distribution. Given the decision maker's decision criterion, alternatives are then selected using the numerical characteristics of the evaluation indices, and the advantages of this treatment are analyzed.
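A minimal sketch of the three-point idea, under stated assumptions (the paper's restricted fuzzy algorithm is not reproduced): each state probability is a triangular fuzzy number built from optimistic/most-likely/pessimistic estimates, defuzzified by its centroid, renormalised, and then used to rank alternatives by expected payoff. All numbers below are illustrative.

```python
# Hedged sketch: triangular fuzzy state probabilities -> centroid defuzzification ->
# expected payoff per alternative (payoffs and estimates are assumed).
import numpy as np

# Three-point probability estimates for three states of nature: (low, mode, high)
fuzzy_p = np.array([[0.2, 0.3, 0.4],
                    [0.3, 0.4, 0.5],
                    [0.2, 0.3, 0.4]])
crisp_p = fuzzy_p.mean(axis=1)              # centroid (l + m + u) / 3 of each triangle
crisp_p /= crisp_p.sum()                    # renormalise so the probabilities sum to 1

payoffs = np.array([[50, 20, -10],          # alternative A under each state
                    [30, 30,  10],          # alternative B
                    [10, 25,  20]])         # alternative C
expected = payoffs @ crisp_p
print("expected payoffs:", expected.round(2),
      "-> choose alternative", "ABC"[int(np.argmax(expected))])
```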

16.
Functional data analysis (FDA), the analysis of data that can be considered a set of observed continuous functions, is an increasingly common class of statistical analysis. One of the most widely used FDA methods is the cluster analysis of functional data; however, little work has been done to compare the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, non-periodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, the clustering methods were also compared when the number of clusters was misspecified. To illustrate the results, a real set of functional data, for which the true clustering structure is believed to be known, was clustered; the comparison on the real data confirmed the findings of the simulation. This study yields concrete suggestions for future researchers on choosing the best method for clustering their functional data.
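A hedged sketch of this kind of comparison: simulate two groups of noisy curves, cluster them with four common hierarchical linkages, and score each partition against the truth. The article uses the Rand index; the adjusted Rand index from scikit-learn is used here as a readily available variant, and the signal shapes and noise level are assumptions.

```python
# Hedged sketch: comparing hierarchical linkages on simulated functional data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 50)
curves = np.vstack(
    [np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size) for _ in range(25)]
    + [t ** 2 + rng.normal(0, 0.3, t.size) for _ in range(25)])
truth = np.repeat([0, 1], 25)

D = pdist(curves)                                    # L2 distance between discretised curves
for method in ["single", "complete", "average", "ward"]:
    labels = fcluster(linkage(D, method=method), t=2, criterion="maxclust")
    print(f"{method:>8}: ARI = {adjusted_rand_score(truth, labels):.2f}")
```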

17.
Recently, Gupta and Kundu [R.D. Gupta and D. Kundu, A new class of weighted exponential distributions, Statistics 43 (2009), pp. 621–634] introduced a new class of weighted exponential (WE) distributions that can be used quite effectively to model lifetime data. In this paper, we introduce a new class of weighted Marshall–Olkin bivariate exponential distributions. This new singular distribution has univariate WE marginals. We study different properties of the proposed model. There are four parameters in this model, and the maximum-likelihood estimators (MLEs) of the unknown parameters cannot be obtained in explicit form; computing the MLEs requires solving a four-dimensional optimization problem. One data set is analysed for illustrative purposes, and finally we propose some generalizations of the proposed model.
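For context, the sketch below simulates the classical Marshall–Olkin bivariate exponential construction that underlies this model (the paper's weighting mechanism is not reproduced): independent exponential shocks U1, U2, U3 give X1 = min(U1, U3) and X2 = min(U2, U3), and the shared shock produces the singular part with P(X1 = X2) > 0. The rates used are illustrative.

```python
# Hedged sketch: Marshall-Olkin bivariate exponential via independent shocks (assumed rates).
import numpy as np

rng = np.random.default_rng(6)
lam1, lam2, lam3 = 1.0, 1.5, 0.8             # assumed shock rates
n = 100_000
u1 = rng.exponential(1 / lam1, n)
u2 = rng.exponential(1 / lam2, n)
u3 = rng.exponential(1 / lam3, n)
x1, x2 = np.minimum(u1, u3), np.minimum(u2, u3)

print("P(X1 = X2) estimate:", np.mean(x1 == x2))   # the singular component
print("theoretical value:  ", lam3 / (lam1 + lam2 + lam3))
```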

18.
Model-based clustering of Gaussian copulas for mixed data
Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Copulas, and Gaussian copulas in particular, are powerful tools for modeling the distribution of multivariate variables. The model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a manner similar to the Gaussian mixture: each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables, and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides visualization tools based on its parameters. Bayesian inference is achieved with a Metropolis-within-Gibbs sampler. Numerical experiments on simulated and real data illustrate the benefits of the proposed model: a flexible and meaningful parameterization combined with visualization features.
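To illustrate the building block of the mixture, the sketch below samples from a single Gaussian-copula component with continuous, count, and ordinal margins: correlated normals are mapped to uniforms through the normal CDF and then through each margin's quantile function. The correlation matrix, marginal parameters, and ordinal cut points are assumptions; the full mixture and the Metropolis-within-Gibbs inference are beyond this sketch.

```python
# Hedged sketch: sampling one Gaussian-copula component with mixed margins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
R = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.3],
              [0.4, 0.3, 1.0]])                        # assumed copula correlation matrix
z = rng.multivariate_normal(np.zeros(3), R, size=1000)
u = stats.norm.cdf(z)                                  # uniform scores, dependence preserved

x_cont = stats.norm.ppf(u[:, 0], loc=10, scale=2)      # continuous margin
x_count = stats.poisson.ppf(u[:, 1], mu=3).astype(int) # integer (count) margin
cuts = [0.3, 0.7]                                      # ordinal margin with 3 levels
x_ord = np.digitize(u[:, 2], cuts)

print(np.corrcoef(np.column_stack([x_cont, x_count, x_ord]), rowvar=False).round(2))
```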

19.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach to clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components to model outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions for mixture modeling: multivariate t distributions with the Box-Cox transformation. This class generalizes the normal distribution with the heavier-tailed t distribution and introduces skewness via the Box-Cox transformation, providing a unified framework for simultaneously handling outlier identification and data transformation, two interrelated issues. We describe an expectation-maximization (EM) algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches, including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
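A hedged sketch of the transform-then-cluster idea follows: a Box-Cox transformation is estimated per (positive) variable with SciPy, and a mixture is fitted to the transformed data. scikit-learn provides no t mixture, so a Gaussian mixture is used as a simplified stand-in; the paper's model couples the transformation and the t mixture within a single EM, which this sketch does not do.

```python
# Hedged sketch: per-variable Box-Cox transformation followed by mixture clustering
# (Gaussian mixture as a stand-in for the paper's t mixture; data are assumed).
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
X = np.vstack([rng.lognormal(0.0, 0.4, (200, 2)),       # skewed, positive data
               rng.lognormal(1.2, 0.4, (200, 2))])

Xt = np.column_stack([stats.boxcox(X[:, j])[0] for j in range(X.shape[1])])
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(Xt)
print("cluster sizes after Box-Cox + mixture:", np.bincount(labels))
```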

20.
This paper introduces a generalization of the negative binomial (NB) distribution in analogy with the COM-Poisson distribution. Many well-known distributions arise as particular or limiting cases. The proposed distribution belongs to the modified power series, generalized hypergeometric, and exponential families, and also arises as a weighted NB and a weighted COM-Poisson distribution. Probability and moment recurrence formulae, as well as probabilistic and reliability properties, are derived. With the flexibility to model under-, equi- and over-dispersion, and with its various interesting properties, this NB generalization is a useful model for count data. An application to empirical modeling is illustrated with a real data set.
