Scraper for http://sgtalk.org/. Just run it and sit back. See the following file for the data being scraped.
- MongoDB
- Python
- Set up the Python virtual env and install the requirements.
Create the virtual env (a one-time step). The env directory is in .gitignore, so you have to create it on your local machine:
virtualenv --no-site-packages env
Note: virtualenv 20 and later no longer accept --no-site-packages; isolation from site packages is the default, so omit the flag there.
Activate it:
source env/bin/activate
Install the requirements:
pip install -r requirements.txt
- Create the database and a unique index in the mongo shell:
use sgtalk
db.posts.createIndex( {"post.post_url" : 1 }, {"unique": true })
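The same index can also be created from Python, which may be convenient if you want the scraper to set it up on startup. The following is a minimal sketch, not the project's actual code: it assumes the pymongo driver, and the helper names `ensure_unique_index` and `save_post` as well as the connection details in the comments are hypothetical. Only the `sgtalk` database, the `posts` collection, and the `post.post_url` key come from the shell commands above.

```python
# Sketch only: assumes the pymongo driver; helper names are hypothetical.
ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_unique_index(collection):
    """Create the unique index on post.post_url; safe to call repeatedly."""
    return collection.create_index([("post.post_url", ASCENDING)], unique=True)


def save_post(collection, document):
    """Insert one scraped post; return False if its URL is already stored."""
    # Imported lazily so this module loads even where pymongo is not installed.
    from pymongo.errors import DuplicateKeyError

    try:
        collection.insert_one(document)
    except DuplicateKeyError:
        # The unique index rejected a post whose post_url already exists.
        return False
    return True


# Example wiring (assumed local server and an illustrative URL):
#
#   from pymongo import MongoClient
#   posts = MongoClient("mongodb://localhost:27017")["sgtalk"]["posts"]
#   ensure_unique_index(posts)
#   save_post(posts, {"post": {"post_url": "http://sgtalk.org/"}})
```

With the unique index in place, re-running the crawl should not create duplicate documents for a post that was already stored; a second insert with the same `post.post_url` is rejected by MongoDB rather than silently saved.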
The script stores the scraped data in MongoDB. To run the spider:
scrapy crawl sgtalk
This project is licensed under the MIT License.