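A simple distributed crawler project. Distribution follows a simple master-slave model built on distributed processes and inter-process communication. The project also covers the modules a typical crawler should have: URL management, HTML parsing, HTML downloading, data storage, and crawler scheduling.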
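The module split listed above can be illustrated with a single-process crawl loop. This is a minimal sketch under stated assumptions, not this project's code: the names `UrlManager`, `parse_page` and `crawl` are hypothetical, the downloader is any callable that returns HTML or None (a dict lookup stands in for real HTTP so it runs offline), Python's standard `html.parser` stands in for BeautifulSoup, and the master-slave distribution across processes is left out. In a distributed setup the URL manager and storage would typically sit on the master, with download and parse running on the workers.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class UrlManager:
    """URL management module (hypothetical name): FIFO queue plus a seen-set for deduplication."""

    def __init__(self):
        self._queue = deque()  # URLs waiting to be crawled
        self._seen = set()     # every URL ever queued, whether crawled yet or not

    def add_url(self, url):
        if url and url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def add_urls(self, urls):
        for url in urls:
            self.add_url(url)

    def has_new_url(self):
        return bool(self._queue)

    def get_url(self):
        return self._queue.popleft()


class _PageParser(HTMLParser):
    """HTML parsing module: stdlib html.parser standing in for BeautifulSoup (assumption)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Return (outgoing links, extracted record) for one downloaded page."""
    parser = _PageParser(url)
    parser.feed(html)
    parser.close()
    return parser.links, {"url": url, "title": parser.title.strip()}


def crawl(root_url, download, max_pages=100):
    """Scheduler (hypothetical): take a URL, download, parse, store the record, queue new links.

    `download` plays the HTML download module: a callable url -> HTML string, or None on failure.
    """
    urls = UrlManager()
    urls.add_url(root_url)
    records = []  # data storage module: an in-memory list in this sketch
    while urls.has_new_url() and len(records) < max_pages:
        url = urls.get_url()
        html = download(url)
        if html is None:
            continue  # skip pages that failed to download
        links, record = parse_page(url, html)
        urls.add_urls(links)
        records.append(record)
    return records


if __name__ == "__main__":
    # Fake offline "site"; pages.get acts as the downloader.
    pages = {
        "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
        "http://example.com/a": '<title>A</title><a href="/">home</a>',
        "http://example.com/b": '<title>B</title><a href="/a">A</a>',
    }
    for record in crawl("http://example.com/", pages.get):
        print(record)
```

Because the seen-set records URLs when they are queued rather than when they are crawled, a link that appears on several pages is fetched only once. Replacing `pages.get` with a real HTTP fetch function is the only change needed to point the loop at a live site.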
A demo that crawls the website 'http://fund.eastmoney.com/fund.html'. From this demo you can learn how to use the selenium, beautifulsoup, sqlalchemy, process, and manager modules.
A bot for Douban comments
A crawler for the website http://www.jameshardie.co.nz/specifiers/cad-library
A crawler for an app API
An automated crawler for DingTalk (Dingding) data
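Full-site data from Toutiao (Jinri Toutiao)
Comment data from the "Ask Everyone" (大家问) section of Taobao product pages
User ratings and other data from Alibaba trial (阿里试用) reports
With minor modifications, it can crawl the required transaction records across the whole site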