# scrape-klimaschutz-planer-reports

Scrape reports from https://www.klimaschutz-planer.de/. Inspired by https://github.com/derhuerst/scrape-klimaschutz-planer-reports

Very early draft, but already capable of extracting data.

Uses the Klimaschutz-Planer API to retrieve the files. No API documentation is publicly known.

## Approach

Read the city descriptions from the website. The important data is loaded together with the map data. Experience shows the JSON data can be fetched like so:

```sh
curl 'https://www.klimaschutz-planer.de/ajax.php?action=thgMap&onlyShowCached=1' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0' \
  -H 'Accept: */*' \
  -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'X-Requested-With: XMLHttpRequest' \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Referer: https://www.klimaschutz-planer.de/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache' \
  -H 'TE: trailers' > cities.json
```
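
The same request can be scripted with the `requests` library. This is a minimal sketch equivalent to the curl call above; it assumes the reduced header set (User-Agent, X-Requested-With, Referer) is sufficient, which is untested:

```python
import json
import requests

# Endpoint and query parameters taken from the curl command above.
URL = "https://www.klimaschutz-planer.de/ajax.php"

resp = requests.get(
    URL,
    params={"action": "thgMap", "onlyShowCached": "1"},
    headers={
        # Reduced header set (assumption: the remaining browser
        # headers from the curl command are not strictly required).
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) "
                      "Gecko/20100101 Firefox/105.0",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://www.klimaschutz-planer.de/",
    },
    timeout=30,
)
resp.raise_for_status()

# Store the JSON in cities.json for further processing.
with open("cities.json", "w", encoding="utf-8") as f:
    json.dump(resp.json(), f, ensure_ascii=False, indent=2)
```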

This might change in the future ...

We store the data in `cities.json` for further processing.

For all entries in `cities.json` (sketched in the two code blocks after this list):

- check that the entry is valid, i.e. has a year, a report and an `einwohner` (population) value
- extract basic info
- build the API URL for the city key (`communeKey`) => HTML
- call headless Chromium to render the page and save the HTML
- search for tables with Beautiful Soup
- extract the tables => file
- write a summary file `results.json`
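
A sketch of the first half of the loop, under stated assumptions: the JSON decodes to a list of entries, the field names `year`, `report` and `einwohner` match the API response, and `REPORT_URL` is a placeholder pattern (the real URL must be read off the browser's API traffic):

```python
import json
import subprocess

def is_valid(entry: dict) -> bool:
    # Keep only entries that carry a year, a report and an
    # einwohner (population) value. Field names are assumptions
    # based on the data returned by the thgMap action.
    return all(entry.get(k) for k in ("year", "report", "einwohner"))

# Placeholder assumption: the real report URL pattern is unknown.
REPORT_URL = "https://www.klimaschutz-planer.de/report?communeKey={key}"

with open("cities.json", encoding="utf-8") as f:
    cities = json.load(f)  # assumed to be a list of city entries

for entry in filter(is_valid, cities):
    key = entry["communeKey"]
    # Let headless Chromium render the page (the tables are built
    # client-side) and dump the resulting DOM to a file.
    html = subprocess.run(
        ["chromium", "--headless", "--disable-gpu", "--dump-dom",
         REPORT_URL.format(key=key)],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(f"{key}.html", "w", encoding="utf-8") as f:
        f.write(html)
```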
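
And the second half: extracting the tables from the rendered HTML with Beautiful Soup and writing the summary. The per-city output file names and the shape of `results.json` are illustrative assumptions, not the repository's actual format:

```python
import glob
import json
import os

from bs4 import BeautifulSoup

def extract_tables(path: str) -> int:
    # Parse the rendered HTML and dump every table as rows of cell text.
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    tables = [
        [
            [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")
        ]
        for table in soup.find_all("table")
    ]
    with open(path.replace(".html", ".tables.json"), "w", encoding="utf-8") as f:
        json.dump(tables, f, ensure_ascii=False, indent=2)
    return len(tables)

# Summarise how many tables each city yielded (format is an assumption).
results = []
for path in glob.glob("*.html"):
    key = os.path.splitext(os.path.basename(path))[0]
    results.append({"communeKey": key, "tables": extract_tables(path)})

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```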