Similar Articles
20 similar articles found (search time: 62 ms)
1.
Customer demand in the knowledge-service industry is knowledge-intensive, specialized and problem-solving oriented, and customers participate and interact intensively during service delivery. This paper argues that customer market segmentation need no longer be restricted to the behavioural variables typical of a priori segmentation methods, and proposes a post hoc segmentation strategy based on attitudinal variables. It reviews the use of the traditional clustering methods K-means and SOFM for market segmentation and describes how support vector clustering (SVC) can be applied to segment the customer market of the knowledge-service industry. SVC is then applied to a specific industry and the clustering results of the three methods are compared, demonstrating the method's ability to improve the quality of the resulting classification.
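As a point of reference for the baseline methods mentioned above, the sketch below clusters hypothetical attitudinal survey scores with K-means in scikit-learn; the data, the number of segments and the variable names are illustrative assumptions, and support vector clustering itself is not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical attitude data: rows = customers, columns = 7-point Likert survey items
rng = np.random.default_rng(0)
attitudes = rng.integers(1, 8, size=(200, 6)).astype(float)

X = StandardScaler().fit_transform(attitudes)               # put all items on a common scale
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(segments))                                 # size of each customer segment
```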

2.
The traditional K-Prototypes algorithm clusters mixed data with a partitioning strategy, but as the dimensionality of the mixed data grows the dissimilarities between objects become almost equal, making the algorithm hard to apply. To address this shortcoming, the paper proposes an improved K-Prototypes algorithm that first removes the irrelevant dimensions of each class and projects the high-dimensional mixed data into a lower-dimensional space before clustering. A worked example on the Heart Disease Databases demonstrates the effectiveness of the algorithm.
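A minimal K-Prototypes run on mixed numeric/categorical data is sketched below using the third-party kmodes package (assumed to be installed); the toy data and the omission of the paper's dimension-screening step are assumptions made purely for illustration.

```python
import numpy as np
from kmodes.kprototypes import KPrototypes   # third-party package: pip install kmodes

# Toy mixed data: columns 0-1 are numeric, column 2 is categorical
X = np.array([[35, 120.0, "A"],
              [42, 180.0, "B"],
              [29,  95.0, "A"],
              [51, 210.0, "C"],
              [33, 110.0, "A"]], dtype=object)

# The paper screens out irrelevant dimensions before clustering; that step is skipped here.
kp = KPrototypes(n_clusters=2, init="Cao", n_init=5, random_state=0)
labels = kp.fit_predict(X, categorical=[2])   # indices of the categorical columns
print(labels)
```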

3.
Based on collections of categorical data, this paper introduces the concepts of mass and feature vector, defines a similarity measure for categorical data, gives definitions of the clarity of a clustering result and of its rate of change, and proposes an algorithm that uses mass and feature vectors to cluster categorical data effectively.

4.
Three-way data mainly come in three "stereoscopic" formats: panel data, longitudinal data and three-mode data. They are attracting increasing attention and use in both the social and the natural sciences. Traditional clustering theory and methods were built mainly for two-way flat data and are of little help for three-way data. After a brief review of traditional two-way clustering methods, this article surveys and summarizes the mainstream three-way data clustering methods developed in China and abroad.

5.
刘勇  全廷伟 《统计与决策》2007,(20):146-148
This paper reviews several commonly used multi-class classification methods for support vector machines and analyzes their problems and shortcomings. Building on the directed-acyclic-graph SVM (DAG-SVM) multi-class method, a new multi-class classification method is proposed that uses minimum enclosing hyperspheres of the classes as the basis for the hierarchical classification. Experiments show that the proposed method achieves higher classification accuracy than existing multi-class methods.
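For orientation, the sketch below shows the generic DAG-SVM evaluation scheme (train one binary SVM per pair of classes, then eliminate one candidate class per comparison); the hypersphere-based ordering proposed in the paper is not implemented, and the data set and kernel settings are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.datasets import load_iris

def train_dag_svm(X, y):
    """Train one binary SVM per class pair (the internal nodes of the DAG)."""
    classes = np.unique(y)
    clfs = {(a, b): SVC(kernel="rbf").fit(X[np.isin(y, [a, b])], y[np.isin(y, [a, b])])
            for a, b in combinations(classes, 2)}
    return classes, clfs

def predict_dag_svm(classes, clfs, x):
    """Walk the DAG: compare the first and last remaining classes, drop the loser, repeat."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        key = (a, b) if (a, b) in clfs else (b, a)
        winner = clfs[key].predict(x.reshape(1, -1))[0]
        remaining.remove(b if winner == a else a)
    return remaining[0]

X, y = load_iris(return_X_y=True)
classes, clfs = train_dag_svm(X, y)
preds = np.array([predict_dag_svm(classes, clfs, xi) for xi in X])
print("training accuracy:", (preds == y).mean())
```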

6.
Multi-indicator panel data provide fairly comprehensive information about the objects under study, but their complex structure makes cluster analysis difficult. To address this, the paper proposes a feature-extraction-based clustering method for multi-indicator panel data: "level", "volatility", "skewness", "kurtosis" and "trend" features that characterize the dynamics of the panel data are fed into a dynamic clustering algorithm. This avoids the limitations of clustering directly on Euclidean distances, can handle panel data with missing values, greatly improves clustering efficiency, and preserves the information in the time dimension as far as possible. The method is used to analyse the uneven distribution of road traffic accidents across Chinese provinces from 2001 to 2013, and the empirical analysis shows that it can solve the multi-indicator panel data clustering problem.
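The core idea of summarizing each subject's trajectory by a small set of features before clustering can be sketched as below; the specific feature definitions (sample mean, standard deviation, skewness, kurtosis and a linear-trend slope), the toy panel and the use of K-means are assumptions for illustration, and missing values are not handled here.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

def panel_features(series):
    """Summarize one subject's time series by level, volatility, skewness, kurtosis and trend."""
    t = np.arange(len(series))
    slope = np.polyfit(t, series, 1)[0]           # linear-trend slope as the "trend" feature
    return [series.mean(), series.std(ddof=1),
            stats.skew(series), stats.kurtosis(series), slope]

def cluster_panel(panel, k=3):
    """panel: dict {subject_id: 1-D array of observations}, possibly of unequal length."""
    ids = list(panel)
    F = np.array([panel_features(np.asarray(panel[i], dtype=float)) for i in ids])
    F = (F - F.mean(0)) / (F.std(0) + 1e-12)      # standardize the extracted features
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(F)
    return dict(zip(ids, labels))

rng = np.random.default_rng(0)
toy_panel = {f"province_{i}": rng.normal(loc=i % 3, size=13) for i in range(12)}  # 13 years each
print(cluster_panel(toy_panel))
```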

7.
For panel data whose variables are nonlinearly related, existing clustering methods based on linear algorithms cannot measure the similarity between samples accurately, and the resulting clusters are hard to interpret. Taking both the nonlinear dependence between variables and the interpretability of the clustering into account, a clustering method for nonlinear panel data is proposed: sample similarity is measured via nonlinear kernel principal component analysis, and the samples are then clustered probabilistically with a Gaussian mixture model. An empirical study shows that the method is effective and that the interpretability of the clustering results improves.
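A minimal sketch of the kernel-PCA-then-Gaussian-mixture pipeline, using scikit-learn; the placeholder data, the RBF kernel and the numbers of components are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))                    # placeholder for the (flattened) panel data

# Nonlinear similarity via an RBF kernel principal component embedding
Z = KernelPCA(n_components=3, kernel="rbf", gamma=0.1).fit_transform(X)

# Probabilistic (soft) clustering of the embedded samples with a Gaussian mixture
gmm = GaussianMixture(n_components=3, random_state=0).fit(Z)
labels = gmm.predict(Z)                          # hard assignments
probs = gmm.predict_proba(Z)                     # posterior membership probabilities
print(labels[:10], probs[:2].round(2))
```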

8.
For the problem of classifying samples described by multi-indicator panel data, this paper proposes a clustering method grounded in multivariate statistical theory. The method jointly considers the time-series characteristics of the level, increment and increment-growth-rate indicators of the panel data, together with the problem of non-synchronous time series, and constructs the clustering procedure on a newly built sum-of-squared-deviations function. An empirical analysis shows that the new method can solve the multi-indicator panel data clustering problem with good classification results.

9.
This paper studies sparse clustering, a feature-selection approach for clustering high-dimensional data. Sparse clustering assigns weights to the feature variables and shrinks them with a lasso-type penalty, producing a ranking of the variables by weight, i.e. by importance, so that redundant variables are removed automatically while the clustering is performed; this provides feature selection for high-dimensional clustering. Applying the method to environmental protection in China, the 31 provinces are divided into 3 groups according to their environmental situation, and 20 important indicators are selected from the 104 available environmental indicators.
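The weight-and-shrink idea can be sketched as a toy version of lasso-penalized (sparse) K-means in the spirit of Witten and Tibshirani (2010): alternate between clustering on weighted features and soft-thresholding the feature weights so that their L1 norm stays below a bound. Everything below (data, bound s, iteration counts) is an illustrative assumption rather than the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def sparse_kmeans(X, k, s, n_iter=8):
    """Toy sparse K-means: alternate K-means on weighted features with an
    L1-bounded, soft-thresholded update of the feature weights."""
    n, p = X.shape
    w = np.full(p, 1 / np.sqrt(p))                         # equal weights, ||w||_2 = 1
    for _ in range(n_iter):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X * np.sqrt(w))
        # Per-feature between-cluster sum of squares: the "reward" for keeping a feature
        tss = ((X - X.mean(0)) ** 2).sum(0)
        wss = np.zeros(p)
        for c in range(k):
            Xc = X[labels == c]
            wss += ((Xc - Xc.mean(0)) ** 2).sum(0)
        bcss = tss - wss
        # Soft-threshold so that, after L2 normalization, ||w||_1 <= s (a lasso-type bound)
        lo, hi = 0.0, bcss.max()
        for _ in range(40):                                # binary search for the threshold
            mid = (lo + hi) / 2
            w_try = np.maximum(bcss - mid, 0.0)
            w_try /= max(np.linalg.norm(w_try), 1e-12)
            lo, hi = (mid, hi) if w_try.sum() > s else (lo, mid)
        w = np.maximum(bcss - hi, 0.0)
        w /= max(np.linalg.norm(w), 1e-12)
    return labels, w

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))
X[:50, :5] += 3.0
X[50:100, :5] -= 3.0                                       # only the first 5 features matter
labels, w = sparse_kmeans(X, k=3, s=2.0)                   # s in [1, sqrt(p)] controls sparsity
print("top features:", np.argsort(w)[::-1][:5], "weights:", np.round(np.sort(w)[::-1][:5], 2))
```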

10.
When standard complete-data statistical methods are applied to the "completed" data set produced by imputation, the variance of the imputation estimator tends to be underestimated. The bootstrap is an important nonparametric technique: it resamples repeatedly from the original observations, makes full use of the available data, requires neither distributional assumptions about the unknown population nor additional sample information, and then uses existing statistical models to draw inferences about the population's distributional characteristics. This paper first imputes the missing data by multiple imputation and then uses the bootstrap to estimate the variance of the imputation-based statistic; the results indicate that the bootstrap variance estimate of the imputed statistic is sounder and more accurate.
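One way to realize "impute, then bootstrap the whole pipeline" is sketched below, with scikit-learn's IterativeImputer standing in for the multiple-imputation step; the data, the missingness rate, the number of imputations m and the number of bootstrap replicates B are all illustrative assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(200, 3))
X[rng.random(X.shape) < 0.15] = np.nan                     # ~15% of values missing

def mi_mean(data, m=5):
    """Multiple imputation: average the statistic over m imputed data sets."""
    ests = [IterativeImputer(sample_posterior=True, random_state=s)
            .fit_transform(data)[:, 0].mean() for s in range(m)]
    return float(np.mean(ests))

# Bootstrap the whole impute-then-estimate pipeline to get a variance / SE estimate
B = 100
boot = [mi_mean(X[rng.integers(0, len(X), len(X))]) for _ in range(B)]
print("estimate:", round(mi_mean(X), 3), "bootstrap SE:", round(np.std(boot, ddof=1), 4))
```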

11.
Phillips and Sweeting [J. R. Statist. Soc. B 58 (1996) 775–783] considered estimation of the parameter of the exponential distribution with censored failure time data when there is incomplete knowledge of the censoring times. It was shown that, under particular models for the censoring mechanism and censoring errors, it will usually be safe to ignore such errors provided they are not expected to be too large. A flexible model is introduced which includes the extreme cases of no censoring errors and no information on the censoring values. The effect of alternative assumptions about knowledge of the censoring values on the estimation of the failure rate is investigated.
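As background for this entry: in the extreme case of exactly known right-censoring times and no censoring errors, the exponential failure rate has the standard closed-form maximum likelihood estimator \(\hat{\lambda} = \sum_i \delta_i / \sum_i t_i\), where \(t_i\) is the observed (possibly censored) time for subject \(i\) and \(\delta_i = 1\) if the failure was observed and 0 if it was censored. This is a textbook result quoted here for context, not a formula taken from the paper.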

12.
A two-stage estimation procedure is developed to analyze structural equation models of polytomous variables based on incomplete data. At the first stage, the partition maximum likelihood approach is used to obtain the estimates of the elements in the correlation matrix. It is shown that the asymptotic distribution of these estimates is jointly multivariate normal. The second stage estimates the structural parameters in the correlation matrix by the generalized least squares approach with a correctly specified weight matrix. Asymptotic properties of the second-stage estimates are also provided. Extension of the theory to multisample models and some illustrative examples are also included.

13.
A popular approach to estimation based on incomplete data is the EM algorithm. For categorical data, this paper presents a simple expression of the observed data log-likelihood and its derivatives in terms of the complete data for a broad class of models and missing data patterns. We show that using the observed data likelihood directly is easy and has some advantages. One can gain considerable computational speed over the EM algorithm and a straightforward variance estimator is obtained for the parameter estimates. The general formulation treats a wide range of missing data problems in a uniform way. Two examples are worked out in full.

14.
Market segmentation is a key concept in marketing research. Identification of consumer segments helps in setting up and improving a marketing strategy; hence there is a need to improve existing segmentation methods and to develop new ones. We introduce two new consumer indicators that can be used as a segmentation basis in two-stage methods: the forces and the dfbetas. Both bases express a subject's effect on the aggregate estimates of the parameters in a conditional logit model. Further, individual-level estimates, obtained either by estimating a conditional logit model for each individual separately with maximum likelihood or by hierarchical Bayes (HB) estimation of a mixed logit choice model, and the respondents' raw choices are also used as segmentation bases. In the second stage of the methods the bases are classified into segments with cluster analysis or latent class models. All methods are applied to choice data because of the increasing popularity of choice experiments for analyzing choice behavior. To verify whether two-stage segmentation methods can compete with a one-stage approach, a latent class choice model is estimated as well. A simulation study reveals the superiority of the two-stage method that clusters the HB estimates and of the one-stage latent class choice model. Additionally, very good results are obtained for two-stage latent class cluster analysis of the choices as well as for the two-stage methods clustering the forces, the dfbetas and the choices.

15.
A tutorial on support vector regression
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
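A minimal ε-insensitive support vector regression fit with scikit-learn, illustrating the function-estimation setting the tutorial covers; the synthetic data and hyperparameter values are assumptions for demonstration only.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)     # noisy sine curve to regress

# epsilon-insensitive SVR with an RBF kernel; C trades off flatness against training error
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
model.fit(X, y)
print(model.predict(X[:5]))
```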

16.
Missing observations in both responses and covariates arise frequently in longitudinal studies. When missing data are missing not at random, inferences under the likelihood framework often require joint modelling of response and covariate processes, as well as of the missing data processes associated with incompleteness of responses and covariates. Specification of these four joint distributions is a nontrivial issue from the perspectives of both modelling and computation. To get around this problem, we employ pairwise likelihood formulations, which avoid the specification of third- or higher-order association structures. In this paper, we consider three specific missing data mechanisms which lead to further simplified pairwise likelihood (SPL) formulations. Under these missing data mechanisms, inference methods based on SPL formulations are developed. The resultant estimators are consistent, and enjoy better robustness and computational convenience. The performance is evaluated empirically through simulation studies. Longitudinal data from the National Population Health Survey and the Waterloo Smoking Prevention Project are analysed to illustrate the usage of our methods.

17.
We consider the comparison of point processes in a discrete observation situation in which each subject is observed only at discrete time points and no history information between observation times is available. A class of non-parametric test statistics for the comparison of point processes based on this kind of data is presented and their asymptotic distributions are derived. The proposed tests are generalizations of the corresponding tests for continuous observations. Some results from a simulation study for evaluating the proposed tests are presented and an illustrative example from a clinical trial is discussed.

18.
Parametric model-based regression imputation is commonly applied to missing-data problems, but is sensitive to misspecification of the imputation model. Little and An (2004, Statistica Sinica 14: 949–968) proposed a semiparametric approach called penalized spline propensity prediction (PSPP), where the variable with missing values is modeled by a penalized spline (P-spline) of the response propensity score, which is the logit of the estimated probability of being missing given the observed variables. Variables other than the response propensity are included parametrically in the imputation model. However, they only considered point estimation based on single imputation with PSPP. We consider here three approaches to standard error estimation incorporating the uncertainty due to nonresponse: (a) standard errors based on the asymptotic variance of the PSPP estimator, ignoring sampling error in estimating the response propensity; (b) standard errors based on the bootstrap method; and (c) multiple-imputation-based standard errors using draws from the joint posterior predictive distribution of missing values under the PSPP model. Simulation studies suggest that the bootstrap and multiple imputation approaches yield good inferences under a range of simulation conditions, with multiple imputation showing some evidence of closer to nominal confidence interval coverage when the sample size is small.

19.
The support vector machine (SVM) is a popular classifier in applications such as pattern recognition, text mining and image retrieval owing to its flexibility and interpretability. However, its performance deteriorates when the response classes are imbalanced. To enhance the performance of the SVM classifier in imbalanced cases, we investigate a new two-stage method that adaptively scales the kernel function. Based on the information obtained from the standard SVM in the first stage, we conformally rescale the kernel function in a data-adaptive fashion in the second stage, so that the separation between the two classes is effectively enlarged while the observation imbalance is taken into account. The proposed method uses the location of the support vectors in the feature space and is therefore especially appealing when the response classes are imbalanced. The resulting algorithm can efficiently improve the classification accuracy, which is confirmed by intensive numerical studies as well as a real prostate cancer imaging data application.
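The general two-stage conformal-rescaling idea can be sketched as follows: fit a standard RBF SVM, then retrain with the kernel K'(x, x') = D(x) D(x') K(x, x'), where D peaks near the first-stage support vectors so that resolution around the boundary is magnified. The particular scaling function D, the synthetic imbalanced data and the tuning constants below are assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
gamma, tau = 0.2, 5.0                      # kernel width and magnification width (tuning values)

# Stage 1: a standard RBF SVM locates the approximate boundary (its support vectors)
svm1 = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
sv = svm1.support_vectors_

def conformal_factor(Z):
    """D(x) peaks near the stage-1 support vectors, i.e. near the boundary region."""
    d2 = ((Z[:, None, :] - sv[None, :, :]) ** 2).sum(-1).min(axis=1)
    return np.exp(-d2 / tau)

# Stage 2: retrain with the conformally rescaled kernel K'(x, x') = D(x) D(x') K(x, x')
D_tr = conformal_factor(X_tr)
K_tr = D_tr[:, None] * D_tr[None, :] * rbf_kernel(X_tr, X_tr, gamma=gamma)
svm2 = SVC(kernel="precomputed").fit(K_tr, y_tr)

D_te = conformal_factor(X_te)
K_te = D_te[:, None] * D_tr[None, :] * rbf_kernel(X_te, X_tr, gamma=gamma)
print("stage-2 test accuracy:", (svm2.predict(K_te) == y_te).mean())
```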

20.
Hierarchical study designs occur frequently in areas such as epidemiology, psychology, sociology, public health, engineering and agriculture. They impose a correlation structure on the data that needs to be accounted for in the modelling process. In this study, a three-level mixed-effects least squares support vector regression (MLS-SVR) model is proposed that extends the standard least squares support vector regression (LS-SVR) model to cluster-correlated data. The MLS-SVR model incorporates multiple random effects, which allows it to handle unequal numbers of observations per subject at non-fixed time points (a very unbalanced situation) and correlation between subjects simultaneously. The methodology consists of a regression modelling step that is performed straightforwardly by solving a linear system. The proposed model is illustrated through numerical studies on simulated data sets and a real data example on human brucellosis frequency. The generalization performance of the proposed MLS-SVR is evaluated by comparison with ordinary LS-SVR and several other parametric models.
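For reference, the ordinary LS-SVR building block that the mixed-effects model extends amounts to solving one linear system for the dual variables; a self-contained sketch with illustrative data and hyperparameters is given below. The mixed-effects (random effects) extension itself is not implemented here.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sinc(X).ravel() + rng.normal(scale=0.05, size=100)

reg_gamma, kern_gamma = 10.0, 1.0          # regularization constant and RBF kernel width
K = rbf_kernel(X, X, gamma=kern_gamma)
n = len(y)

# LS-SVR dual problem: solve [[0, 1'], [1, K + I/gamma]] [b; alpha] = [0; y]
A = np.zeros((n + 1, n + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = K + np.eye(n) / reg_gamma
rhs = np.concatenate(([0.0], y))
sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

# Prediction: f(x) = sum_i alpha_i k(x, x_i) + b
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
print(rbf_kernel(X_new, X, gamma=kern_gamma) @ alpha + b)
```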
