/dianping_spider

Using this framework-free spider, I crawled 123 GB of HTML pages from http://www.dianping.com/ and extracted 144 million comment items from them. The comments were then aggregated by restaurant, so in the end only 20 thousand items needed to be processed. The crawl was conducted in May 2017.
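The aggregation step can be pictured with a minimal sketch. It is not the repository's actual code: the record layout (`shop_id`, `rating`, `text`), the function name `aggregate_by_restaurant`, and the summary fields are all hypothetical, chosen only to show how many comment items collapse into one item per restaurant.

```python
from collections import defaultdict


def aggregate_by_restaurant(comments):
    """Group comment records by restaurant and summarize each group.

    `comments` is an iterable of dicts with hypothetical keys
    "shop_id" (restaurant identifier) and "rating" (number or None).
    """
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment["shop_id"]].append(comment)

    summaries = {}
    for shop_id, items in grouped.items():
        # Ignore comments that carry no rating when averaging.
        ratings = [c["rating"] for c in items if c.get("rating") is not None]
        summaries[shop_id] = {
            "comment_count": len(items),
            "mean_rating": sum(ratings) / len(ratings) if ratings else None,
        }
    return summaries


# Hypothetical sample data, not taken from the crawl.
sample_comments = [
    {"shop_id": "A", "rating": 4, "text": "good"},
    {"shop_id": "A", "rating": 5, "text": "great"},
    {"shop_id": "B", "rating": None, "text": "no score given"},
]
summaries = aggregate_by_restaurant(sample_comments)
```

Grouping in a single pass keeps memory proportional to the number of comments held at once. For a data set of the size described here, the same idea would likely need to run in batches or inside a database rather than in one in-memory dictionary.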

Primary language: Python. License: MIT.
