sgtalk-scrapper

Scraper for http://sgtalk.org/. Just run it and sit back. Look at the following file for the data being scraped.

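Create the virtual environment (a one-time step). The environment is in .gitignore, so you have to create one on your local machine. On virtualenv 20 and later the --no-site-packages flag has been removed (isolation from site packages is the default), so omit it there: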

virtualenv --no-site-packages env

Activate it:

source env/bin/activate

Install the requirements:

pip install -r requirements.txt

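Create a unique index on the post URL from the mongo shell, so that the same post cannot be stored twice:

use sgtalk
db.posts.createIndex({"post.post_url": 1}, {"unique": true})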
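
The unique index above rejects a second document with the same post.post_url. A minimal sketch of how a crawl could write posts so that re-runs overwrite instead of failing is shown here. It is not the project's actual pipeline: the function names build_upsert and store_post are hypothetical, the document shape {"post": {...}} is only inferred from the index key, and collection is assumed to be a pymongo Collection.

```python
from typing import Any, Dict, Tuple


def build_upsert(post: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (filter, document) suitable for replace_one(..., upsert=True).

    Assumption: stored documents look like {"post": {"post_url": ..., ...}},
    which matches the indexed field "post.post_url".
    """
    url = post.get("post_url")
    if not url:
        raise ValueError("post has no post_url; cannot deduplicate it")
    # The dotted path addresses the same field the unique index covers.
    return {"post.post_url": url}, {"post": post}


def store_post(collection: Any, post: Dict[str, Any]) -> None:
    """Insert the post, or replace the stored copy that has the same URL."""
    query, document = build_upsert(post)
    # pymongo: Collection.replace_one(filter, replacement, upsert=False)
    collection.replace_one(query, document, upsert=True)
```

With pymongo installed and MongoDB running locally, the collection could be obtained with MongoClient()["sgtalk"]["posts"] and passed to store_post.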

Run the crawler. The script stores the scraped data in MongoDB:

scrapy crawl sgtalk
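
After a crawl, a quick way to confirm that data arrived is to count the stored documents and look at one URL. This is a rough sketch under assumptions: summarize_posts is a hypothetical helper, collection is assumed to be the pymongo Collection for sgtalk.posts, and the {"post": {"post_url": ...}} shape is inferred from the index definition.

```python
from typing import Any, Dict


def summarize_posts(collection: Any) -> Dict[str, Any]:
    """Return the number of stored posts and the URL of one of them."""
    # pymongo: count_documents requires a filter; {} matches everything.
    total = collection.count_documents({})
    # Project only the URL field to keep the returned document small.
    sample = collection.find_one({}, {"_id": 0, "post.post_url": 1})
    sample_url = (sample or {}).get("post", {}).get("post_url")
    return {"total": total, "sample_url": sample_url}
```

An empty collection yields a total of 0 and a sample_url of None, which usually means the crawl has not stored anything yet.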

License

This project is licensed under the MIT License.