
bittiger webcrawler project


A Web Crawler (Scrapy + Splash + MongoDB)

Project description:

This project crawls the names of the apps listed on www.mi.com. The detailed description can be found here.

The finished crawler will crawl http://app.mi.com/, extract the name of every app, and store the names in a MongoDB database.
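The core of the "grab all the apps' names" step is pulling the app titles out of the list page's HTML. Inside a Scrapy spider this would typically be a selector call such as `response.css("ul.applist li h5 a::text").getall()`. The sketch below does the same extraction with the standard library's `html.parser` so it runs without Scrapy installed. The markup it looks for (`<ul class="applist">` containing `<h5><a>Name</a></h5>`) is an assumption for illustration; inspect the live page and adjust the tags and class names to match.

```python
from html.parser import HTMLParser


class AppNameParser(HTMLParser):
    """Collect the text of <a> tags nested in <h5> inside <ul class="applist">.

    The markup pattern is an assumption, not the confirmed structure of app.mi.com.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_list = False   # inside <ul class="applist">
        self._in_h5 = False     # inside an <h5> within that list
        self._in_link = False   # inside the <a> that holds the app name
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "applist" in classes:
            self._in_list = True
        elif self._in_list and tag == "h5":
            self._in_h5 = True
        elif self._in_h5 and tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            name = "".join(self._buffer).strip()
            if name:
                self.names.append(name)
            self._in_link = False
        elif tag == "h5":
            self._in_h5 = False
        elif tag == "ul":
            self._in_list = False

    def handle_data(self, data):
        # Only text inside the name link is kept; everything else is ignored.
        if self._in_link:
            self._buffer.append(data)


def extract_app_names(html):
    """Return the app names found in one list page, in document order."""
    parser = AppNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


SAMPLE = """
<ul class="applist">
  <li><a href="/details?id=com.example.a"><img alt=""></a>
      <h5><a href="/details?id=com.example.a">Example A</a></h5></li>
  <li><h5><a href="/details?id=com.example.b"> Example B </a></h5></li>
</ul>
<h5><a href="/other">Not an app</a></h5>
"""

print(extract_app_names(SAMPLE))  # ['Example A', 'Example B']
```

The link outside the `applist` block is skipped, which is the point of scoping the match to the list container rather than to every `<h5><a>` on the page.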

I developed this code based on two useful tutorials (here and here); however, both of them contain some obsolete code, so I made some modifications.

The code on the master branch completes the first two steps of this project: crawling the main page and storing the apps' names in a MongoDB database.
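In Scrapy, storing scraped items in MongoDB is usually done with an item pipeline: a plain class whose `process_item(item, spider)` method Scrapy calls for every item, with `open_spider`/`close_spider` hooks for managing the connection. The sketch below follows that pattern with `pymongo`. The setting names `MONGO_URI` and `MONGO_DATABASE`, the database name `xiaomi`, the collection name `apps`, and the `name` field are all hypothetical choices, not taken from this repo. `pymongo` is imported inside `open_spider` only so the class can be defined and tested without it installed.

```python
class MongoAppPipeline:
    """Scrapy-style item pipeline that upserts app names into MongoDB (a sketch)."""

    def __init__(self, mongo_uri="mongodb://localhost:27017", mongo_db="xiaomi",
                 collection_name="apps"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy uses this hook to build the pipeline from project settings.
        # MONGO_URI and MONGO_DATABASE are hypothetical setting names.
        settings = crawler.settings
        return cls(
            mongo_uri=settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=settings.get("MONGO_DATABASE", "xiaomi"),
        )

    def open_spider(self, spider):
        # Lazy import: pymongo is only needed once a crawl actually starts.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)  # works for plain dicts and scrapy.Item objects
        # Upsert keyed on the app name so re-running the crawl does not add duplicates.
        self.collection.update_one({"name": doc["name"]}, {"$set": doc}, upsert=True)
        return item
```

To enable it, the pipeline would be registered in `settings.py`, e.g. `ITEM_PIPELINES = {"xiaomi.pipelines.MongoAppPipeline": 300}` (the module path here is hypothetical). Upserting on the name, rather than calling `insert_one`, is a design choice: it keeps the collection free of duplicates across repeated crawls, at the cost of one indexed lookup per item.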

The final step is to use Splash to crawl the linked pages as well, not only the main page. I am actively working on this step now.
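Splash is a headless rendering service with an HTTP API; its `render.html` endpoint takes a `url` argument (plus options such as `wait`, in seconds) and returns the page's HTML after JavaScript has run. The `scrapy-splash` plugin wraps this in a `SplashRequest`; the sketch below instead builds the `render.html` URLs directly with the standard library, to show what crawling linked pages through Splash amounts to. It assumes Splash is running locally on its default port 8050, and the `/details?id=...` link shape is a hypothetical example of the detail-page links found on the main page.

```python
from urllib.parse import urlencode, urljoin

# Assumes a local Splash instance on its default port; adjust for your setup.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(page_url, wait=0.5, endpoint=SPLASH_ENDPOINT):
    """Build a Splash HTTP API URL that returns the JavaScript-rendered HTML of page_url."""
    return endpoint + "?" + urlencode({"url": page_url, "wait": wait})


def linked_page_requests(base_url, hrefs):
    """Resolve (possibly relative) links and wrap each unique one in a Splash render URL."""
    seen = set()
    urls = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if absolute not in seen:  # skip links that are already scheduled
            seen.add(absolute)
            urls.append(splash_render_url(absolute))
    return urls


links = ["/details?id=com.example.a", "/details?id=com.example.a", "details?id=com.example.b"]
for url in linked_page_requests("http://app.mi.com/", links):
    print(url)
```

Each printed URL can be fetched like any ordinary page (or scheduled as a normal Scrapy request), and the returned HTML can then be parsed for the details the static main page does not contain.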

Watch this repo if you are interested in the progress!