EthanXue301/dianping_spider
Using this framework-free spider, I crawled 123 GB of HTML pages from http://www.dianping.com/ and extracted 144 million comment items from them. The comments were then aggregated by restaurant, so in the end only 20 thousand items needed to be processed. The crawl was conducted in May 2017.
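The aggregation step (144 million comments reduced to 20 thousand restaurant-level items) can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: the record fields `shop_id` and `rating` and the function name `aggregate_by_restaurant` are hypothetical, and the summary kept per restaurant (comment count and mean rating) is only an assumed example of what "aggregate with restaurant" might mean.

```python
from collections import defaultdict


def aggregate_by_restaurant(comments):
    """Group comment records by a hypothetical 'shop_id' field.

    Accepts any iterable, so comments can be streamed from disk one at a
    time; only one small running-total entry per restaurant is kept in memory.
    Returns {shop_id: {"count": int, "mean_rating": float}}.
    """
    totals = defaultdict(lambda: {"count": 0, "rating_sum": 0.0})
    for comment in comments:
        entry = totals[comment["shop_id"]]
        entry["count"] += 1
        entry["rating_sum"] += comment["rating"]  # hypothetical 'rating' field
    # Collapse running totals into one summary item per restaurant.
    return {
        shop_id: {"count": e["count"], "mean_rating": e["rating_sum"] / e["count"]}
        for shop_id, e in totals.items()
    }


if __name__ == "__main__":
    sample = [
        {"shop_id": "A", "rating": 4},
        {"shop_id": "A", "rating": 5},
        {"shop_id": "B", "rating": 3},
    ]
    print(aggregate_by_restaurant(sample))
```

Because the function consumes an iterable rather than a list, the per-comment data never has to be held in memory at once; memory grows with the number of restaurants, not the number of comments.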
Language: Python · License: MIT