Riksdagen OCR

This repo contains machinery for running OCR on scanned documents related to the Riksdagen corpus.

There is also some code for processing the resulting data, namely:

  • MPs are scraped from person lists
  • MPs are scraped from statskalender (the Swedish state calendar)
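
As a rough illustration of what scraping MPs from an OCR'd person list can involve, here is a minimal sketch. It is not the code in this repo: the line layout ("Surname, Given Name, f. 1871"), the `MP` type, and the `parse_person_list` function are all assumptions made for the example, and the real person lists may be laid out quite differently.

```python
import re
from typing import NamedTuple, Optional


class MP(NamedTuple):
    """Hypothetical record type for one scraped MP."""
    name: str
    birth_year: Optional[int]


# Assumed line layout: "Surname, Given Name, f. 1871" ("f." = född, born).
# The birth year part is optional; lines containing other digits are rejected.
ENTRY = re.compile(
    r"^(?P<name>[^,\d]+,[^,\d]+?)(?:,\s*f\.\s*(?P<year>\d{4}))?\s*$"
)


def parse_person_list(ocr_text: str) -> list[MP]:
    """Extract MPs from OCR text, skipping lines that do not look like entries."""
    mps = []
    for line in ocr_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # e.g. page headers or OCR noise
        surname, given = (p.strip() for p in match.group("name").split(",", 1))
        year = match.group("year")
        mps.append(MP(f"{given} {surname}", int(year) if year else None))
    return mps


if __name__ == "__main__":
    sample = "Andersson, Karl, f. 1871\nPage 12\nLindqvist, Anna\n"
    for mp in parse_person_list(sample):
        print(mp)
```

Skipping unparseable lines rather than raising keeps the sketch tolerant of OCR noise; a real pipeline would probably want to log the skipped lines for manual review.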