Similar Literature
1.
A Comparison of Data Mining with Other Technologies   Total citations: 4, self-citations: 0, citations by others: 4
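I. Introduction. Data mining is a very popular topic in database applications in recent years. It generally refers to applying various analytical methods and techniques to a database in order to analyze, summarize, and integrate the large volume of complex historical data accumulated over time, extract useful information, and find meaningful patterns of interest to users, which are then given to corporate management as a basis for decision making. Data mining is not just a single technique or software package but an application that combines several specialized technologies. We should understand data mining correctly: it is not all-powerful magic. It only discovers hypotheses from data; it does not verify or confirm these hypotheses, nor does it judge their value…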

2.
An Analysis of Research and Development in Data Mining Applications Abroad   Total citations: 8, self-citations: 0, citations by others: 8
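At present the term "data mining (DM)" has no generally accepted, authoritative definition in academia, but we can simply regard data mining as the process of discovering trends or patterns in massive data. Although some are reluctant to acknowledge the intrinsic connection between data mining and statistics, it is undeniable that early data mining did grow out of statistics, so data mining can also be described as using statistical and machine learning techniques to build models that predict behavior. It should be emphasized that data mining is a process of "discovery", not "invention". In other words, the patterns data mining seeks are knowledge that already exists but is hidden in the data and has not yet been found. Worldwide, formal research on data mining began in August 1989 with the holding of…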

3.
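1 Related Theory. 1.1 Data Mining Technology. Data mining is the process of discovering trends or patterns in data. The goal of this process is to discover new information by classifying large amounts of data. The return people obtain from data mining is the conversion of this newly discovered knowledge into business results, such as higher sales revenue or lower sales costs. Data mining has only in the last few years…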

4.
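The Current State and Prospects of Data Mining Research and Applications   Total citations: 6, self-citations: 0, citations by others: 6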
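Drawing on a compilation of relevant materials, the article introduces the functions, methods (algorithms), and application areas of data mining, together with the current state of research in China, as a reference for scholars interested in data mining and for enterprises that research and apply data mining technology.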

5.
Data Mining and Its Challenges to Statistics   Total citations: 10, self-citations: 0, citations by others: 10
Han Ming, Statistical Research, 2001, 18(8): 55-57
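I. Introduction. With the development of science and technology, the idea of storing and managing data with database technology and analyzing it with machine learning methods, so as to uncover the large amount of knowledge hidden behind the data, has grown into a very popular and widely watched research field: knowledge discovery in databases (KDD, Knowledge Discovery in Databases), in which data mining is the most critical step. In 1995 the first international conference on Knowledge Discovery and Data Mining was held in Montreal, Canada, and the term "data mining" spread quickly. Data mining (DM) is, from large, incomplete, noisy…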

6.
Data Mining Tasks, Part 2: Prediction   Total citations: 6, self-citations: 0, citations by others: 6

7.
A Discussion of Data Mining Quality Issues   Total citations: 1, self-citations: 0, citations by others: 1
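I. Raising the question of data mining quality. With the development of database and data warehouse technologies and the widespread use of database management systems, people have accumulated more and more data. Behind this large volume of data lies much information of value for decision making, but how can people obtain this "knowledge"? Data mining (Data Mining) emerged and developed rapidly in exactly this environment of application demand.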

8.
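When data mining comes up, some friends in industry say "we have been doing data mining for a long time", but in fact most enterprises have only built some data warehouses, or at most some OLAP applications. Concepts related to "data mining" include data warehouses, OLAP, knowledge discovery (KDD), and statistical analysis; confusing these concepts with data mining easily leaves people with a vague understanding of data mining and thereby hampers practical data mining work. This paper therefore attempts a preliminary clarification of these related concepts to support the development of data mining research and applications. Data mining and data warehouses: when enterprises build a data warehouse they usually embed some online query tools, so that, regarding data warehouses (Data Warehouse) and data mining (Data Mining), many people often…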

9.
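I. Definition of data mining. Data mining uses artificial intelligence, statistical analysis, and other modeling methods to find relationships among data and useful information in large volumes of incomplete, random data. The importance of data mining in industries such as marketing and finance is already recognized, so enterprises generally build their own databases, namely customer relationship management (CRM) systems, which provide a foundation for data mining. It should be pointed out that data mining is not merely a combination of techniques and algorithms; it is more like a process whose purpose is to solve a specific problem or support a specific decision.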

10.
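With the rapid development of Internet technology, the WWW has become popular for its multimedia transmission and good interactivity. Although network speeds have improved greatly in recent years, the sharp growth in the number of Internet users and the inherent latency of Web services and networks make the network increasingly congested, so users' quality of service (QoS) cannot be well guaranteed. Prefetching is built on prediction algorithms. Data mining is a technique for extracting implicit, previously unknown knowledge and rules of potential decision-making value from large amounts of data. Using data mining on a user's historical and current access data, we can predict the user's likely future behavior and prefetch some Web pages for the user. The data in the user's buffer can serve as the historical data for mining.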
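The abstract does not say which prediction algorithm is used. As a minimal sketch, assuming a first-order Markov model (the next page depends only on the current page), transition counts mined from historical sessions can rank the pages worth prefetching. The function names, the `k` and `min_prob` parameters, and the sample URLs are hypothetical.

```python
from collections import Counter, defaultdict

# Sketch assuming a first-order Markov model of page access.
# All names, parameters, and URLs are hypothetical.


def build_transition_counts(sessions):
    """Count page -> next-page transitions across historical sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    return counts


def pages_to_prefetch(counts, current_page, k=2, min_prob=0.2):
    """Return up to k likely next pages whose estimated probability >= min_prob."""
    followers = counts.get(current_page)  # .get avoids inserting unseen pages
    if not followers:
        return []
    total = sum(followers.values())
    return [page for page, c in followers.most_common(k) if c / total >= min_prob]


# Hypothetical access history (e.g. taken from the user's buffer).
history = [
    ["/index", "/news", "/sports"],
    ["/index", "/news", "/weather"],
    ["/index", "/mail"],
    ["/news", "/sports"],
]
counts = build_transition_counts(history)
print(pages_to_prefetch(counts, "/index"))
print(pages_to_prefetch(counts, "/news"))
```

A higher-order model or association rules over page sets could replace the transition counts; the first-order version is shown only because it is the smallest model that turns access history into a prefetch list.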

11.
DNA microarrays allow the expression levels of a large number of genes to be measured across different experimental conditions and/or samples. Association rule mining (ARM) methods are helpful in finding associational relationships between genes. However, classical association rule mining (CARM) algorithms extract only a subset of the associations that exist among different binary states and can therefore infer only part of the relationships in gene regulation. To resolve this problem, we developed an extended association rule mining (EARM) strategy along with a new way of defining association rules. Compared with the CARM method, our new approach extracted more frequent gene sets from a public microarray data set. The EARM method discovered some biologically interesting association rules that were not detected by CARM. Therefore, EARM provides an effective tool for exploring relationships among genes.
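The abstract does not define EARM's rules, so the sketch below only illustrates the general idea under an assumed encoding: each gene's log2 expression ratio becomes an "up" item (`g+`) or, optionally, a "down" item (`g-`), and frequent gene-state pairs are counted. The threshold, item naming, and toy data are hypothetical, and `include_down=False` merely stands in for a scheme that sees only one state.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding of expression states as items; not the paper's EARM.


def to_items(sample, threshold=1.0, include_down=True):
    """Map one sample's log2 expression ratios to a set of state items."""
    items = set()
    for gene, ratio in sample.items():
        if ratio >= threshold:
            items.add(f"{gene}+")
        elif include_down and ratio <= -threshold:
            items.add(f"{gene}-")
    return items


def frequent_pairs(samples, min_support, **kwargs):
    """Return item pairs whose support (fraction of samples) >= min_support."""
    transactions = [to_items(s, **kwargs) for s in samples]
    pair_counts = Counter()
    for t in transactions:
        pair_counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


# Hypothetical log2 ratios for three genes in four samples.
samples = [
    {"g1": 2.0, "g2": -1.5, "g3": 0.1},
    {"g1": 1.8, "g2": -2.2, "g3": 1.2},
    {"g1": 1.1, "g2": -1.0, "g3": -0.3},
    {"g1": -0.2, "g2": 0.4, "g3": 1.5},
]
print(frequent_pairs(samples, 0.5, include_down=False))  # up-states only
print(frequent_pairs(samples, 0.5, include_down=True))   # up- and down-states
```

With only "up" items no pair reaches the support threshold, while encoding down-regulation as well exposes the frequent pair (g1 up, g2 down), which is the kind of association a single-state encoding misses.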

12.
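From an algorithmic perspective, the article examines in some detail how association rules were proposed, how they evolved, and the research frontier, and on that basis identifies future research areas and development trends for association rules. It first examines three classes of typical association rule algorithms in detail and then summarizes extensions of association rule algorithms to complex data attributes. This lays the groundwork for examining algorithmic extensions in other respects and for introducing association rule research from other disciplines.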

13.
Research on Missing-Data Handling Based on Clustering and Association Rules   Total citations: 2, self-citations: 1, citations by others: 2
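This paper proposes a new method for handling missing data based on clustering and association rules. A clustering method first groups similar records of a dataset with missing values into the same class; an improved association rule method then mines the associations among variables within each sub-dataset, and these associations are used to fill in the missing values. A worked example shows that the method handles missing data well, especially for massive datasets.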
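A minimal sketch of the cluster-then-rules idea, assuming each record already carries a cluster label in a `cluster` field (any clustering method could produce it) and that only single-antecedent rules `attr = value → target = t` are used. The paper's "improved association rule method" is not described in the abstract, so the scoring here (highest confidence above `min_conf`) is an assumption, as are all field names.

```python
from collections import Counter, defaultdict

# Sketch under stated assumptions: records carry a precomputed "cluster"
# label, None marks a missing value, and all field names are hypothetical.


def rule_confidences(records, target):
    """Confidence of rules (attr = value) -> (target = t) among complete records."""
    antecedent_counts = Counter()
    rule_counts = Counter()
    for rec in records:
        if rec[target] is None:
            continue  # only complete records contribute evidence
        for attr, value in rec.items():
            if attr in (target, "cluster") or value is None:
                continue
            antecedent_counts[(attr, value)] += 1
            rule_counts[(attr, value, rec[target])] += 1
    return {
        (attr, value, t): count / antecedent_counts[(attr, value)]
        for (attr, value, t), count in rule_counts.items()
    }


def impute_by_cluster(records, target, min_conf=0.6):
    """Fill missing target values using the most confident rule in the same cluster."""
    by_cluster = defaultdict(list)
    for rec in records:
        by_cluster[rec["cluster"]].append(rec)
    for members in by_cluster.values():
        confidences = rule_confidences(members, target)
        for rec in members:
            if rec[target] is not None:
                continue
            candidates = [
                (conf, t)
                for (attr, value, t), conf in confidences.items()
                if rec.get(attr) == value and conf >= min_conf
            ]
            if candidates:
                rec[target] = max(candidates)[1]  # highest confidence wins
    return records


# Hypothetical records: the third and sixth are missing "owns_car".
records = [
    {"cluster": 0, "income": "high", "region": "urban", "owns_car": "yes"},
    {"cluster": 0, "income": "high", "region": "rural", "owns_car": "yes"},
    {"cluster": 0, "income": "high", "region": "urban", "owns_car": None},
    {"cluster": 1, "income": "low", "region": "urban", "owns_car": "no"},
    {"cluster": 1, "income": "low", "region": "rural", "owns_car": "no"},
    {"cluster": 1, "income": "low", "region": "urban", "owns_car": None},
]
impute_by_cluster(records, "owns_car")
print([rec["owns_car"] for rec in records])
```

Mining rules separately per cluster keeps the imputation local: the "urban" record in cluster 0 gets "yes" and the one in cluster 1 gets "no", even though both share `region = urban`.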

14.
Real-world applications of association rule mining suffer from the well-known problem of discovering a large number of rules, many of which are not interesting or useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of discovered rules and the complexity of the task, but the implications of their use, and their important differences with respect to generalization power, precision, and recall in classification problems, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed, and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data and items, and a detailed evaluation of the rule sets is provided both as a whole and with respect to individual classes. Empirical results confirm that with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant, and contradictory rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, frequent, closed, and maximal itemsets for the classification task, and the effect of incorporating statistical/heuristic measures for optimizing such rule sets. With closed itemset mining already a preferred choice for reducing complexity and redundancy during rule generation, this study further confirms that closed itemset based association rules are also of better quality in terms of classification precision and recall, both overall and on individual classes.
On the other hand, maximal itemset based association rules, which are a subset of closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the drawback of applying the confidence measure at the start of rule generation, as is typically done within the association rule framework. Removing rules that fall below a certain confidence threshold also removes the knowledge that the data contain contradictions to the relatively higher-confidence rules; thus precision can be increased by discarding contradictory rules before the confidence constraint is applied.
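To make the three itemset families concrete, the sketch below enumerates frequent itemsets by brute force (adequate only for tiny examples) and then keeps the closed ones (no proper superset with the same support) and the maximal ones (no frequent proper superset). The toy transactions and function names are hypothetical; the result illustrates the abstract's remark that the maximal family is a subset of the closed family.

```python
from itertools import combinations

# Brute-force illustration on hypothetical toy data; real miners
# (Apriori, FP-growth, CHARM, ...) avoid enumerating every candidate.


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_count:
                result[cset] = count
    return result


def closed_and_maximal(frequent):
    """Split frequent itemsets into closed and maximal ones."""
    closed, maximal = set(), set()
    for itemset, count in frequent.items():
        supersets = [s for s in frequent if itemset < s]
        # An infrequent superset has a lower count, so checking frequent
        # supersets is enough to decide closedness.
        if not any(frequent[s] == count for s in supersets):
            closed.add(itemset)
        if not supersets:
            maximal.add(itemset)
    return closed, maximal


transactions = [set("abc"), set("ab"), set("ab"), set("ac")]
freq = frequent_itemsets(transactions, min_count=2)
closed, maximal = closed_and_maximal(freq)
print(sorted("".join(sorted(s)) for s in closed))
print(sorted("".join(sorted(s)) for s in maximal))
```

Here {b} is frequent but not closed, because {a, b} occurs in exactly the same transactions; rules derived from {b} would be redundant with those from {a, b}.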

15.
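Co-movement Analysis of Stock Sector Indices Based on Association Rule Mining   Total citations: 7, self-citations: 0, citations by others: 7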
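This paper applies the association rule algorithm from data mining to an empirical analysis of the relationships among sector indices in China's stock market. Using the Apriori algorithm, strong association rules among the sector indices can be mined from large amounts of data and then visualized and evaluated. The resulting rules can help market participants discover patterns of sector rotation and, on that basis, avoid securities market risk.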
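A compact sketch of Apriori, assuming each trading day is a transaction whose items are the sectors whose index rose that day. This encoding, the sector names, and the thresholds are hypothetical; the paper's exact preprocessing is not given in the abstract. Frequent itemsets are grown level by level, candidates with an infrequent subset are pruned, and a rule X → Y is kept when confidence = support(X ∪ Y) / support(X) meets the threshold.

```python
from itertools import combinations

# Sketch under stated assumptions: days as transactions, rising sectors as items.


def apriori(transactions, min_support):
    """Level-wise Apriori; returns {frozenset: support} for all frequent itemsets."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune candidates
        # that have an infrequent (k-1)-subset (the Apriori property).
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def strong_rules(frequent, min_confidence):
    """Rules (lhs, rhs, confidence) with support(lhs | rhs) / support(lhs) >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = support / frequent[lhs]  # subsets are frequent too
                if confidence >= min_confidence:
                    rules.append((tuple(sorted(lhs)), tuple(sorted(itemset - lhs)), round(confidence, 3)))
    return rules


# Hypothetical trading days: sectors whose index rose that day.
trading_days = [
    {"banking", "insurance", "brokerage"},
    {"banking", "insurance"},
    {"banking", "brokerage"},
    {"steel", "coal"},
    {"banking", "insurance", "steel"},
]
frequent = apriori(trading_days, min_support=0.4)
for rule in strong_rules(frequent, min_confidence=0.7):
    print(rule)
```

The triple {banking, insurance, brokerage} is never counted: its subset {insurance, brokerage} is infrequent, so the prune step discards it before the scan.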

16.
We discuss the analysis of mark-recapture data when the aim is to quantify density dependence between survival rate and abundance. We describe an analysis for a random effects model that includes a linear relationship between abundance and survival using an errors-in-variables regression estimator with analytical adjustment for approximate bias. The analysis is illustrated using data from short-tailed shearwaters banded for 48 consecutive years at Fisher Island, Tasmania, and Hutton's shearwater banded at Kaikoura, New Zealand for nine consecutive years. The Fisher Island data provided no evidence of a density dependence relationship between abundance and survival, and confidence interval widths rule out anything but small density dependent effects. The Hutton's shearwater data were equivocal, with the analysis unable to rule out anything but a very strong density dependent relationship between survival and abundance.
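The abstract does not spell out its estimator. As a sketch of the general errors-in-variables idea only, the method-of-moments correction below divides the naive least-squares slope by the reliability ratio (var(observed) − error variance) / var(observed), which undoes the attenuation caused by noisy abundance estimates. It is not the authors' random-effects estimator; the simulated data, the assumed-known error variance, and the function names are hypothetical.

```python
import random
import statistics

# Generic attenuation correction, not the paper's estimator.
# Simulated data and the known error variance are hypothetical.


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def corrected_slope(x_observed, y, error_variance):
    """Divide the naive slope by the estimated reliability ratio."""
    var_obs = statistics.variance(x_observed)
    reliability = (var_obs - error_variance) / var_obs
    if reliability <= 0:
        raise ValueError("error variance is too large relative to var(x_observed)")
    return ols_slope(x_observed, y) / reliability


random.seed(1)
true_slope = -0.05
abundance = [random.gauss(0, 1) for _ in range(5000)]           # standardized true abundance
observed = [a + random.gauss(0, 1) for a in abundance]          # estimates with error variance 1
survival = [0.8 + true_slope * a + random.gauss(0, 0.02) for a in abundance]

naive = ols_slope(observed, survival)
fixed = corrected_slope(observed, survival, error_variance=1.0)
print(round(naive, 3), round(fixed, 3))
```

With equal true and error variances the naive slope is biased toward zero by roughly half, which is exactly the kind of attenuation that can hide density dependence.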

17.
In this article, I comment on a rough square-root rule for statistical significance which turns out to be equivalent to testing whether an observation lies within one standard deviation of that expected.
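The abstract does not state the rule itself. Assuming the observation is a count with Poisson-like spread, so that its standard deviation is the square root of the expected value E, the one-standard-deviation check reduces to |O − E| > √E, equivalently (O − E)²/E > 1. The sketch encodes only that reading; the function name and example numbers are hypothetical.

```python
import math

# Assumes Poisson-like counts, so SD = sqrt(expected). Names are hypothetical.


def outside_one_sd(observed, expected):
    """True if |observed - expected| exceeds sqrt(expected).

    Equivalent to (observed - expected) ** 2 / expected > 1 for expected > 0.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return abs(observed - expected) > math.sqrt(expected)


print(outside_one_sd(112, 100))  # |112 - 100| = 12 > sqrt(100) = 10
print(outside_one_sd(95, 100))   # |95 - 100| = 5 <= 10
```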

19.
This article uses stochastic ideas to reason about association rule mining and provides a formal statistical view of the discipline. A simple stochastic model is proposed, under which support and confidence are reasonable estimates of certain probabilities of the model. Statistical properties of the corresponding estimators, such as moments and confidence intervals, are derived, and correlations among items and itemsets are examined. After a brief review of interestingness measures for association rules, focusing mainly on measures motivated by statistical principles, two new measures are described. These measures, called α-precision and σ-precision, rely on the statistical properties of the estimators discussed before. Experimental results demonstrate the effectiveness of both measures.
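As a minimal sketch of the estimator view, assuming transactions are independent draws from a fixed distribution, support is the sample proportion of transactions containing X ∪ Y and confidence is the conditional proportion among transactions containing X. The interval below is the textbook normal-approximation (Wald) interval for a proportion, not the article's α- or σ-precision measures; the data and names are hypothetical.

```python
import math

# Sketch assuming i.i.d. transactions; data and names are hypothetical.


def support_and_confidence(transactions, antecedent, consequent):
    """Estimate P(X and Y in a transaction) and P(Y | X) by sample proportions."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else float("nan")
    return support, confidence


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


transactions = [{"bread", "milk"}] * 30 + [{"bread"}] * 10 + [{"milk"}] * 40 + [set()] * 20
support, confidence = support_and_confidence(transactions, {"bread"}, {"milk"})
low, high = wald_interval(support, len(transactions))
print(support, confidence, (round(low, 3), round(high, 3)))
```

Treating support as an estimated probability makes its sampling uncertainty explicit: with 100 transactions, a support of 0.3 is compatible with true values from about 0.21 to 0.39.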

20.
The problem of classification into two univariate normal populations with a common mean is considered. Several classification rules are proposed based on efficient estimators of the common mean. Detailed numerical comparisons of the misclassification probabilities of these rules have been carried out. It is shown that the classification rule based on the Graybill-Deal estimator of the common mean performs best. Classification rules are also proposed for the case in which the variances are assumed to be ordered, and these rules are compared with the rule based on the Graybill-Deal estimator in terms of the individual misclassification probabilities.
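For reference, the Graybill-Deal estimator weights each sample mean by n_i / s_i². The sketch below computes it and then applies one plausible plug-in rule, assigning x to the population whose fitted normal density (common-mean estimate, population-specific sample variance) is larger. The abstract does not give the exact rules compared, so this rule, the samples, and the function names are assumptions.

```python
import math
import statistics

# Plug-in rule built on the Graybill-Deal estimate; the rule, samples,
# and names are hypothetical illustrations.


def graybill_deal(sample1, sample2):
    """Precision-weighted common-mean estimate with weights n_i / s_i**2."""
    w1 = len(sample1) / statistics.variance(sample1)
    w2 = len(sample2) / statistics.variance(sample2)
    return (w1 * statistics.fmean(sample1) + w2 * statistics.fmean(sample2)) / (w1 + w2)


def normal_log_density(x, mean, var):
    """Log of the N(mean, var) density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def classify(x, sample1, sample2):
    """Assign x to population 1 or 2 by comparing fitted normal densities."""
    mu = graybill_deal(sample1, sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    return 1 if normal_log_density(x, mu, v1) >= normal_log_density(x, mu, v2) else 2


sample1 = [9.8, 10.1, 10.0, 10.2, 9.9]   # low-variance population
sample2 = [6.0, 14.0, 8.0, 12.0, 10.0]   # high-variance population
print(graybill_deal(sample1, sample2))
print(classify(10.05, sample1, sample2), classify(13.0, sample1, sample2))
```

Because the means coincide, only distance from the common mean separates the populations: points close to it go to the low-variance population and points far from it to the high-variance one.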


Copyright © Beijing Qinyun Science & Technology Development Co., Ltd.  京ICP备09084417号