NOTE: This is a scraping project built on Python's open-source Scrapy framework.
Currently supports:
- Woolworths Australia https://www.woolworths.com.au/
- Python 3.8+
With pip + venv:
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt
If your relative imports aren't working, create a `.pth` file and add the parent folder(s) to it:
$ echo $(pwd) >> .venv/lib/python3.8/site-packages/my_p_ext.pth
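The command above hardcodes `python3.8` in the path, which will not match other interpreter versions. As a rough sketch, the same step can be done from Python by asking the active interpreter where its site-packages folder is. The helper name `write_pth` and its parameters are hypothetical and not part of this project:

```python
import sysconfig
from pathlib import Path


def write_pth(project_root, name="my_p_ext.pth", site_packages=None):
    """Add project_root to a .pth file in site-packages (hypothetical helper)."""
    # Default to the active interpreter's site-packages directory.
    target_dir = Path(site_packages or sysconfig.get_paths()["purelib"])
    pth_file = target_dir / name
    root = str(Path(project_root).resolve())

    # Keep existing entries and avoid writing the same path twice.
    entries = pth_file.read_text().splitlines() if pth_file.exists() else []
    if root not in entries:
        entries.append(root)
    pth_file.write_text("\n".join(entries) + "\n")
    return pth_file


if __name__ == "__main__":
    # Run from the folder that should become importable, inside the venv.
    print(write_pth(Path.cwd()))
```

Run it with the virtual environment activated so that the file lands in `.venv` rather than in the system interpreter.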
Check that your Scrapy project is active by running the scrapy command from inside the project's Grocer folder:
$ scrapy
Scrapy 2.5.0 - project: grocer ...
The Scrapy docs are comprehensive if you're interested in learning more.
To connect the output data to a database, install PostgreSQL, then add a file called grocer.ini
containing the database details:
[DATABASE]
drivername = postgresql
host = localhost
port = 5432
username = <user>
password = postgres
database = <database>
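For illustration, a file in this shape could be read with the standard library's `configparser` and turned into a database URL string. This is a minimal sketch, not the project's actual loading code; the function name `load_database_url` is an assumption, and it assumes every key shown above is present:

```python
import configparser
from urllib.parse import quote_plus


def load_database_url(path="grocer.ini"):
    """Build a database URL string from the [DATABASE] section (hypothetical helper)."""
    # Disable interpolation so a '%' in the password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"could not read config file: {path}")
    db = parser["DATABASE"]

    # Quote the credentials so characters like '@' or ':' do not break the URL.
    user = quote_plus(db["username"])
    password = quote_plus(db["password"])
    return (
        f"{db['drivername']}://{user}:{password}"
        f"@{db['host']}:{db['port']}/{db['database']}"
    )


if __name__ == "__main__":
    print(load_database_url())
```

The resulting string is in the form that SQLAlchemy's `create_engine` accepts, so it could be passed there directly once a PostgreSQL driver is installed.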
Scrapy spiders are located in
This project uses Scrapy for web scraping and SQLAlchemy to store the data.
Alembic may be used for database migrations in the future.
Overview of scrapy architecture: