Profiles identification on hierarchical tree structure data sets
Authors: Conceição Rocha; Pedro Quelhas Brito
Affiliations: 1. LIAAD/INESC TEC, Porto, Portugal; 2. FEP, Porto, Portugal
Abstract: In this work we study a way to explore and extract more information from data sets with a hierarchical tree structure. We propose that any statistical study of this type of data be carried out by group, after clustering. To that end, we argue that the most suitable approach is to use the Mahalanobis–Wasserstein distance as the measure of dissimilarity between cases when carrying out clustering or unsupervised classification. This methodology allows the cases to be clustered and their profiles to be identified, based on the distribution of all the variables that characterise each subject associated with each case. We describe an application to a set of interviews with teenagers about their communication habits. The interviewees answered several questions about the kinds of contacts they had on their phone, Facebook, email or messenger, as well as how frequently they communicated with them. The results indicate that the methodology is adequate for clustering this kind of data set, since it allows us to identify and characterise different profiles in the data. We compare the results obtained with this methodology with those obtained using the entire database, and we conclude that the two may lead to different findings.
Keywords: Hierarchical tree structure; clustering; profile; Mahalanobis–Wasserstein distance; multivariate empirical distributions; quantiles
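
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each case is a table of subjects by variables, that each variable's empirical distribution is summarised by its quantile function on a fixed grid, and that the between-variable covariance is estimated from the quantile functions centred on their mean across cases. The covariance estimator, the average-linkage clustering, and the names quantile_profile and mw_distance_matrix are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def quantile_profile(samples, n_q=50):
    """Summarise one case by per-variable empirical quantile functions.

    samples: array of shape (n_subjects, n_vars).
    Returns an array of shape (n_vars, n_q).
    """
    probs = (np.arange(n_q) + 0.5) / n_q  # midpoints of n_q equal-width bins on (0, 1)
    return np.quantile(np.asarray(samples, dtype=float), probs, axis=0).T


def mw_distance_matrix(cases, n_q=50):
    """Pairwise Mahalanobis-Wasserstein-type distances between cases (sketch).

    cases: list of arrays, each (n_subjects_i, n_vars); n_subjects_i may differ.
    """
    Q = np.stack([quantile_profile(c, n_q) for c in cases])  # (n_cases, n_vars, n_q)
    n_cases = Q.shape[0]

    # Assumed covariance estimator: cross-products of the quantile functions
    # centred on the mean quantile function, averaged over cases and levels.
    centred = Q - Q.mean(axis=0, keepdims=True)
    cov = np.einsum("ihq,ikq->hk", centred, centred) / (n_cases * n_q)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse guards against singular cov

    D = np.zeros((n_cases, n_cases))
    for i in range(n_cases):
        for j in range(i + 1, n_cases):
            diff = Q[i] - Q[j]            # (n_vars, n_q)
            cross = diff @ diff.T / n_q   # approximates the integrals over (0, 1)
            d2 = float(np.sum(inv_cov * cross))
            D[i, j] = D[j, i] = np.sqrt(max(d2, 0.0))
    return D


# Synthetic illustration (made-up data, not the interview data from the paper):
# three cases drawn from one profile and three from another.
rng = np.random.default_rng(0)
sizes = [150, 200, 250]
cases = [rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(n, 2)) for n in sizes]
cases += [rng.normal(loc=[3.0, 0.0], scale=[1.0, 3.0], size=(n, 2)) for n in sizes]

D = mw_distance_matrix(cases)
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Working with quantile functions lets cases with different numbers of subjects be compared directly, and the resulting cluster labels can then be used to summarise the variables separately within each group.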