first-crawler

My first crawler, written as a side project, for crawling fashion items from "http://www.mogujie.com/" and "http://www.meilishuo.com/".

Features

  • Language: Python
  • Scrapy + Splash for crawling and rendering JavaScript-heavy pages
  • Docker (to run Splash)
  • MongoDB (to store the crawled products)

TODO

Fields to extract (a possible item schema is sketched after this list):

  • URL
  • Product title
  • Brand (if applicable)
  • Categories (if applicable)
  • Images (maximum quality) for each color of the product (half done: currently only one is captured)
  • Availability for each color and size combination (half done: currently only one is checked)

Operational requirements:

  • New products can be updated every 6 hours.
  • For existing products, availability needs to be refreshed much more often; the goal is to minimize the delay between a product going out of stock and our system becoming aware of it.
  • Websites change all the time, so we need to be notified when the crawler fails on a site or on specific pages, so that engineers can investigate and fix the issue.
  • If the crawler is restarted for any reason, it should recover from where it left off and continue crawling and updating products.
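A minimal sketch of an item schema matching the fields above, assuming Scrapy Items are used; the class and field names are illustrative, not necessarily the project's actual code:

    import scrapy

    class ProductItem(scrapy.Item):
        # Field names mirror the TODO list above.
        url = scrapy.Field()           # product page URL
        title = scrapy.Field()         # product title
        brand = scrapy.Field()         # brand, if applicable
        categories = scrapy.Field()    # categories, if applicable
        images = scrapy.Field()        # maximum-quality image URLs per color
        availability = scrapy.Field()  # stock status per color/size combination
        crawled_at = scrapy.Field()    # last crawl time, useful for the 6-hour update cycle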

USAGE

To run the crawler:

  • scrapy crawl first_crawler

To run the quantity checker:

  • scrapy crawl quantity_checker
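The two commands above refer to the spiders named first_crawler and quantity_checker. As a rough sketch of how the first_crawler spider could be wired to Splash (assuming the scrapy-splash plugin; the selectors below are placeholders, not the project's actual parsing logic):

    import scrapy
    from scrapy_splash import SplashRequest

    class FirstCrawlerSpider(scrapy.Spider):
        name = 'first_crawler'
        start_urls = [
            'http://www.mogujie.com/',
            'http://www.meilishuo.com/',
        ]

        def start_requests(self):
            # Render each page through Splash so JavaScript-generated content is available.
            for url in self.start_urls:
                yield SplashRequest(url, self.parse, args={'wait': 2})

        def parse(self, response):
            # Placeholder: follow product links, also rendered through Splash.
            for href in response.css('a::attr(href)').extract():
                yield SplashRequest(response.urljoin(href), self.parse_product,
                                    args={'wait': 2})

        def parse_product(self, response):
            # Placeholder selectors; the real ones depend on each site's markup.
            yield {
                'url': response.url,
                'title': response.css('title::text').extract_first(),
            }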

Connect your shell to the default Docker machine:

  • eval "$(docker-machine env default)"

Before running the crawler, we need to set up Splash in Docker:

  • docker run -p 8050:8050 scrapinghub/splash

You will then need to set the SPLASH_URL setting in your project’s settings.py:

  • SPLASH_URL = 'http://localhost:8050/'

Don’t forget, if you are using boot2docker on OS X, you will need to set this to the IP address of the boot2docker virtual machine, e.g.:

  • docker-machine ip

If it shows:

  • 192.168.99.100

Then in settings.py:

  • SPLASH_URL = 'http://192.168.99.100:8050/'
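If the project uses the scrapy-splash plugin, settings.py typically needs a few more entries in addition to SPLASH_URL; the values below follow the plugin's recommended wiring and are shown here as a sketch (with the boot2docker IP from above):

    SPLASH_URL = 'http://192.168.99.100:8050/'  # or http://localhost:8050/ on Linux

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }
    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'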

Install dependencies

Install the Python MongoDB driver (pymongo):

  • python -m pip install pymongo

Set the MongoDB data path on macOS:

  • mongod --dbpath ~/Developer/MongoDB/data/db
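A minimal sketch of how crawled items could be written to MongoDB through a Scrapy item pipeline using pymongo; the database, collection, and setting names here are assumptions for illustration, and the class would be enabled via ITEM_PIPELINES in settings.py:

    import pymongo

    class MongoPipeline(object):
        """Store items in MongoDB, upserting by URL so re-crawls refresh availability."""

        def __init__(self, mongo_uri, mongo_db):
            self.mongo_uri = mongo_uri
            self.mongo_db = mongo_db

        @classmethod
        def from_crawler(cls, crawler):
            return cls(
                mongo_uri=crawler.settings.get('MONGO_URI', 'mongodb://localhost:27017'),
                mongo_db=crawler.settings.get('MONGO_DATABASE', 'first_crawler'),
            )

        def open_spider(self, spider):
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.mongo_db]

        def close_spider(self, spider):
            self.client.close()

        def process_item(self, item, spider):
            # Upsert on URL so repeated crawls update a product instead of duplicating it.
            self.db['products'].update_one(
                {'url': item['url']}, {'$set': dict(item)}, upsert=True)
            return item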