This repo contains machinery to OCR scanned documents related to the Riksdagen corpus:
- Person lists (personregister), which are part of the Riksdagen protokoll corpus. Example data source: https://weburn.kb.se/riks/metadata/05/21798905.html
- Statskalender
There is also some code for processing this data, namely:
- MPs are scraped from the person lists
- MPs are scraped from Statskalender
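
As a rough illustration of what scraping MPs from an OCR'd person list could look like, here is a minimal sketch. It is not the code in this repo: the line layout (`Surname, Given names ..... page, page`), the `Entry` type and the `parse_line` helper are all assumptions made for the example, and real register pages will likely need looser matching to cope with OCR noise.

```python
import re
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One person-list entry (hypothetical structure for this sketch)."""
    surname: str
    given_names: str
    pages: List[int]


# Assumed layout of one OCR'd line: "Surname, Given names ..... 12, 45"
LINE_RE = re.compile(
    r"^(?P<surname>[^,\d]+),\s*"        # surname up to the first comma
    r"(?P<given>[^\d]+?)"               # given names (no digits)
    r"[\s.,]*"                          # dot leaders / spacing before pages
    r"(?P<pages>\d+(?:\s*,\s*\d+)*)\s*$"  # comma-separated page numbers
)


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry if the line matches the assumed layout, else None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    pages = [int(p) for p in re.findall(r"\d+", match.group("pages"))]
    return Entry(
        surname=match.group("surname").strip(),
        given_names=match.group("given").strip(),
        pages=pages,
    )


if __name__ == "__main__":
    # Made-up sample lines, not taken from the actual corpus.
    for raw in ["Andersson, Karl Johan ..... 12, 45", "Riksdagens protokoll"]:
        print(parse_line(raw))
```

Lines that do not fit the assumed pattern return `None`, so headers and other non-entry text can be skipped or logged for manual review.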