This scraper has been merged into https://github.com/okfde/dokukratie/ and is maintained there.
A simple memorious extension to download documents from the Wissenschaftliche Dienste des Deutschen Bundestags.
Contrary to what the name suggests, it is not technically based on https://sehrgutachten.de but scrapes the Bundestag website directly.
It downloads the files and metadata into a local folder.
The startdate and enddate parameters need to be set via environment variables:
STARTDATE=2021-05-01 ENDDATE=`date '+%Y-%m-%d'` memorious run sehrgutachten
If running locally, make sure the memorious config path is set in the environment as well:
MEMORIOUS_CONFIG_PATH=src
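The same invocation can be scripted, for example to crawl a rolling date window from a scheduled job. The following is a minimal sketch, not part of this repository: the helper names crawl_env and run_crawl are hypothetical, and it assumes the memorious CLI is on the PATH and that the crawler reads the dates in YYYY-MM-DD form, as in the command above.

```python
import os
import subprocess
from datetime import date


def crawl_env(start: date, end: date, config_path: str = "src") -> dict:
    """Return a copy of the current environment with the crawl settings added.

    crawl_env is a hypothetical helper; only the variable names
    STARTDATE, ENDDATE and MEMORIOUS_CONFIG_PATH come from this README.
    """
    if start > end:
        raise ValueError("STARTDATE must not be after ENDDATE")
    env = dict(os.environ)
    env["STARTDATE"] = start.isoformat()  # date.isoformat() yields YYYY-MM-DD
    env["ENDDATE"] = end.isoformat()
    env["MEMORIOUS_CONFIG_PATH"] = config_path
    return env


def run_crawl(start: date, end: date) -> None:
    """Call the memorious CLI the same way as the shell command above."""
    subprocess.run(
        ["memorious", "run", "sehrgutachten"],
        env=crawl_env(start, end),
        check=True,  # raise CalledProcessError if memorious exits non-zero
    )


# Example call (not executed here):
# run_crawl(date(2021, 5, 1), date.today())
```

Building the environment in a separate function keeps the date validation testable without starting a crawl.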
git clone https://github.com/simonwoerpel/memorious-sehrgutachten.git
cd memorious-sehrgutachten
pip install -e .
All the magic happens in src/sehrgutachten.py and src/sehrgutachten.yml
To run the scraper in production, use a proper Redis and PostgreSQL setup instead of the local defaults.
Please refer to the official memorious documentation for details.
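As a rough illustration of the production note above, the sketch below adds connection settings to the environment before memorious is started. The variable names REDIS_URL and MEMORIOUS_DATASTORE_URI are given as I understand them from the memorious documentation and should be checked there; the helper name production_env and both connection URLs are hypothetical placeholders.

```python
import os


def production_env(redis_url: str, datastore_uri: str) -> dict:
    """Return a copy of the environment with Redis and PostgreSQL settings.

    Assumption: memorious reads REDIS_URL and MEMORIOUS_DATASTORE_URI;
    verify both names against the official memorious documentation.
    """
    env = dict(os.environ)
    env["REDIS_URL"] = redis_url
    env["MEMORIOUS_DATASTORE_URI"] = datastore_uri
    return env


# Placeholder URLs; replace them with the real services.
env = production_env(
    "redis://localhost:6379/0",
    "postgresql://user:password@localhost/memorious",
)
```

The resulting dictionary can be passed as the env argument of subprocess.run when invoking memorious.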