A Sina Weibo crawler built for the research program of IWCT, SJTU
=======
#Requirements:
- Scrapy >= 0.14
- redis-py (tested on 2.4.9)
- redis server (tested on 2.4-2.6)
- BeautifulSoup
- pymongo

Install the Redis server and the Python dependencies:

    $ sudo apt-get install redis-server
    $ sudo pip install -r requirements.txt

#Features:
- Weibo login simulator
- Distributed/multi-threaded extraction framework
- Extraction task interface (user profiles, social network, weibo content, etc.)
- Weibo page parser
- Data storage (Redis/MongoDB)
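
To make the distributed extraction idea above more concrete, here is a minimal sketch of a shared, de-duplicating task queue of the kind a Redis-backed crawler might use. It is not taken from this repository: the `TaskQueue` class, the `weibo:tasks` key name, and the task strings are hypothetical. `InMemoryClient` is only a stand-in that mimics the `sadd`, `lpush`, and `rpop` methods of redis-py so the example runs without a server.

```python
from collections import deque


class InMemoryClient:
    """Stand-in exposing the three redis-py methods used below."""

    def __init__(self):
        self._sets = {}
        self._lists = {}

    def sadd(self, name, *values):
        members = self._sets.setdefault(name, set())
        added = len(set(values) - members)
        members.update(values)
        return added  # number of newly added members, as redis-py reports

    def lpush(self, name, *values):
        items = self._lists.setdefault(name, deque())
        for value in values:
            items.appendleft(value)
        return len(items)

    def rpop(self, name):
        items = self._lists.get(name)
        return items.pop() if items else None


class TaskQueue:
    """FIFO queue of crawl tasks that skips tasks it has already seen."""

    def __init__(self, client, name="weibo:tasks"):  # key name is hypothetical
        self.client = client
        self.pending_key = name + ":pending"
        self.seen_key = name + ":seen"

    def push(self, task):
        # sadd returns 0 when the task is already in the "seen" set.
        if self.client.sadd(self.seen_key, task) == 0:
            return False
        self.client.lpush(self.pending_key, task)
        return True

    def pop(self):
        # lpush followed by rpop yields first-in, first-out order.
        return self.client.rpop(self.pending_key)


queue = TaskQueue(InMemoryClient())
queue.push("profile:1001")
queue.push("weibos:1001")
queue.push("profile:1001")  # duplicate, ignored
```

Because the method names match redis-py, passing a `redis.Redis()` instance instead of `InMemoryClient()` should let several crawler processes share one queue. Note that a real client returns `bytes` from `rpop` unless it is created with `decode_responses=True`.
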

#Usage:
- From the project directory, run the command **$ scrapy crawl weibospider** on your console.
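
If the crawl needs to be started from a script rather than typed by hand, a small wrapper along these lines may help. This is only a sketch: `build_crawl_command` and `run_crawl` are hypothetical helpers, not part of this repository, and the `-s NAME=VALUE` option is Scrapy's command-line way of overriding a setting.

```python
import subprocess


def build_crawl_command(spider="weibospider", settings=None):
    """Return the argv list for `scrapy crawl <spider>` with optional -s overrides."""
    command = ["scrapy", "crawl", spider]
    for key, value in sorted((settings or {}).items()):
        command += ["-s", "{}={}".format(key, value)]
    return command


def run_crawl(project_dir=".", **kwargs):
    """Run the crawl from project_dir and return the process exit code."""
    return subprocess.call(build_crawl_command(**kwargs), cwd=project_dir)


print(build_crawl_command(settings={"LOG_LEVEL": "INFO"}))
# ['scrapy', 'crawl', 'weibospider', '-s', 'LOG_LEVEL=INFO']
```

Calling `run_crawl()` with no arguments is meant to be equivalent to running the command above in the current directory.
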