Data mining tools for Safeway app.
Make sure you have Python 3.9+ and Poetry installed:

```shell
pip install poetry
```
Then, clone and initialize the project:

```shell
git clone git@github.com:littlepea/safeway-data.git
cd safeway-data
poetry install
```
To access Google Sheets, you'll need to prepare some secrets, such as `DEVELOPER_KEY`. You can place them in the `config/.env` file manually or ask @littlepea to provide you a file with secrets.

If you want to fill in the secrets manually, you can start from this template:

```shell
cp config/.env.example config/.env
```
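For reference, the file follows the usual dotenv `KEY=value` format. The snippet below is only a sketch: `DEVELOPER_KEY` is the variable mentioned above, the value is a placeholder, and the template may define additional variables:

```shell
# config/.env — fill in real values (ask @littlepea if unsure)
DEVELOPER_KEY=<your-google-sheets-developer-key>
```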
Check the `--help` for the CLI:

```shell
❯ poetry run python main.py --help
Usage: main.py [OPTIONS]

Options:
  --dry-run / --no-dry-run        [default: no-dry-run]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.
```
Run the actual conversion script:

```shell
❯ poetry run python main.py
Loading records from spreadsheet 1Y1QLbJ6gvPvz8UI-TTIUUWv5bDpSNeUVY3h-7OV6tj0
Saved 272 results into data/output.example.csv
```
If you have any questions, contact @littlepea.

Run the tests:

```shell
❯ poetry run pytest
Test session starts (platform: darwin, Python 3.9.12, pytest 7.1.2, pytest-sugar 0.9.4)
collecting ...
 tests/test_spreadsheet_adapter.py ✓       50% █████
 tests/test_convert_data.py ✓             100% ██████████

Results (0.34s):
       2 passed
```
All the Scrapy spiders are in the `scraping` directory.

You can run a specific spider by supplying the name and output file:

```shell
poetry run scrapy crawl dopomoga -o data/dopomoga.csv
```
You can place your new spiders into the `scraping/spiders` directory and implement them according to the Scrapy tutorial.
It's highly recommended to add unit tests for your spider's `parse` method.
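One way to keep `parse` easy to test is to factor the per-row logic into a pure helper that the spider calls, so the test can exercise it without a live crawl. Everything below (names, schema) is an illustrative sketch, not code from this repo:

```python
# Hypothetical sketch: a pure row-parsing helper plus a pytest-style unit test.

def parse_row(cells):
    """Turn a list of raw cell strings into an item dict (illustrative schema)."""
    name, address = (c.strip() for c in cells[:2])
    return {"name": name, "address": address}


def test_parse_row():
    item = parse_row(["  Center A ", " 1 Main St  "])
    assert item == {"name": "Center A", "address": "1 Main St"}
```

Alternatively, you can build a fake `scrapy.http.HtmlResponse` from saved HTML and call the spider's `parse` on it directly.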