Research on Web Crawler Technology Based on Hadoop
Cite this article: WANG Yan-hong, ZHOU Jun. Research on Web Crawler Technology Based on Hadoop [J]. Journal of Jilin Teachers Institute of Engineering and Technology, 2014, 30(8): 87-89.
Authors: WANG Yan-hong  ZHOU Jun
Affiliation: Management Information Department, Nantong Shipping College, Nantong, Jiangsu 226010, China
Abstract: A web crawler generally starts from a seed page, reads the page's content and the links it contains, and repeats this process until all pages linked from the starting page have been found. When the volume of data to be crawled is large, traditional techniques show certain drawbacks, while the Hadoop open-source cloud computing framework offers advantages in data collection. Building on an introduction to the Hadoop cloud computing framework, this paper explains the principle of the web crawler and implements a web crawler based on Hadoop.

Keywords: Hadoop  web crawler  MapReduce  search engine

Research on Web Crawler Technology Based on Hadoop
WANG Yan-hong, ZHOU Jun. Research on Web Crawler Technology Based on Hadoop [J]. Journal of Jilin Teachers Institute of Engineering and Technology (Natural Sciences Edition), 2014, 30(8): 87-89.
Authors:WANG Yan-hong  ZHOU Jun
Institution:(Management Information Department, Nantong Shipping College, Nantong Jiangsu 226010, China)
Abstract:A web crawler generally starts from a seed page, reads the page's content and the links it contains, and repeats this process until all pages linked from the starting page have been found. When the volume of data to be crawled is large, traditional crawling techniques show certain drawbacks, while the Hadoop open-source cloud computing framework offers advantages in data collection. Building on an introduction to the Hadoop cloud computing framework, this paper explains the principle of the web crawler and implements a web crawler based on Hadoop.
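The crawling loop the abstract describes (start from a seed page, read its links, repeat until no unvisited links remain) can be sketched as a breadth-first traversal. This is an illustrative sketch, not the paper's implementation; the `fetch` callable is an assumed stand-in for an HTTP client so the logic stays self-contained.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href values from <a> tags on one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl: visit the seed, extract its links,
    and keep visiting unvisited links until the frontier is empty
    (or max_pages is reached). `fetch(url)` must return HTML text."""
    visited = set()
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in visited:
                queue.append(link)
    return visited
```

In a real deployment `fetch` would wrap an HTTP library and respect robots.txt and politeness delays; the queue/visited-set structure is the part the abstract's description corresponds to.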
Keywords:Hadoop  web crawler  MapReduce  search engine
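The abstract's point about Hadoop can be illustrated by expressing one crawl round in MapReduce terms: the map phase fetches pages and emits discovered links, and the reduce phase deduplicates them into the frontier for the next round. This is a hedged sketch of the general idea, not the authors' Hadoop code; `fetch` and `extract_links` are hypothetical stand-ins for page download and link extraction.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(urls, fetch, extract_links):
    """Map: each worker fetches its share of URLs and emits
    (link, source_url) pairs, like a Hadoop Streaming mapper."""
    for url in urls:
        for link in extract_links(fetch(url)):
            yield (link, url)

def reduce_phase(pairs, visited):
    """Reduce: group the emitted pairs by link and keep each
    not-yet-visited link once -- the next round's frontier."""
    frontier = []
    for link, _sources in groupby(sorted(pairs), key=itemgetter(0)):
        if link not in visited:
            frontier.append(link)
    return frontier
```

On a real cluster the sort/group step between the two phases is what Hadoop's shuffle provides for free, which is where the claimed advantage for large-scale data collection comes from: fetching is parallelized across mappers while deduplication stays centralized per key.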
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
