Westac Project, 2020-2021
The full data set consists of multiple parts:
- Riksdagens protokoll from 1920 until today in the Parla-CLARIN format
- Comprehensive list of MPs and cabinet members during this period
- Traceable logs of all curation and segmentation as a git history
- Documentation of the corpus and the curation process
- A Google Colab notebook that demonstrates how the dataset can be used with Python
The full dataset is available under this download link. It has the following structure:
- Annual protocol files in the corpus/ folder
- List of MPs: corpus/members_of_parliament.csv
- List of ministers: corpus/ministers.csv
- List of speakers of the house: corpus/talman.csv
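As a first sanity check after downloading, the three metadata files can be inspected with the Python standard library. The following is a minimal sketch, not the project's own tooling: the function name describe_metadata is made up for illustration, it assumes the archive was unpacked so that a corpus/ folder sits in the working directory, and it assumes the files are UTF-8 encoded, comma-separated, and start with a header row. It makes no assumption about which columns the files contain; it only reports what it finds.

```python
import csv
from pathlib import Path

# File names as listed in the dataset structure above.
METADATA_FILES = ["members_of_parliament.csv", "ministers.csv", "talman.csv"]

def describe_metadata(corpus_dir="corpus"):
    """Return {file name: (row count, column names)} for each metadata file found.

    Hypothetical helper: assumes UTF-8, comma-separated files with a header row.
    Files that are missing from corpus_dir are skipped.
    """
    summary = {}
    for name in METADATA_FILES:
        path = Path(corpus_dir) / name
        if not path.exists():
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            rows = list(reader)
            summary[name] = (len(rows), list(reader.fieldnames or []))
    return summary

if __name__ == "__main__":
    for name, (n_rows, columns) in describe_metadata().items():
        print(f"{name}: {n_rows} rows, columns: {columns}")
```

Printing the column names first is a cheap way to see how MPs, ministers, and speakers are identified before joining them against the protocol files.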
The workflow to use the data is demonstrated in this Google Colab notebook.
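For readers who prefer to work outside the notebook, the protocol files can also be read with a plain XML parser. The sketch below is an illustration under stated assumptions, not the notebook's code: it assumes the files follow the general Parla-CLARIN convention of TEI-namespaced u (utterance) elements carrying a who attribute that references the speaker, and the helper names iter_utterances and utterances_from_file are hypothetical. Check an actual protocol file to confirm the element names before relying on it.

```python
import xml.etree.ElementTree as ET

# Parla-CLARIN is a TEI customization, so elements live in the TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def iter_utterances(root):
    """Yield (speaker reference, text) for every <u> element below root.

    Assumption: speeches are encoded as <u who="..."> elements. The speaker
    reference is None when the attribute is absent.
    """
    for utterance in root.iterfind(".//tei:u", TEI_NS):
        # Collect all nested text and collapse runs of whitespace.
        text = " ".join("".join(utterance.itertext()).split())
        yield utterance.get("who"), text

def utterances_from_file(path):
    """Parse one protocol file and return its utterances as a list."""
    return list(iter_utterances(ET.parse(path).getroot()))
```

Calling utterances_from_file on the path of one annual protocol file would return a list of speaker and text pairs; the speaker references can then be matched against the MP and minister lists.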
The corpora are large and have been curated and segmented automatically, so they may contain errors. If you find any, you can submit corrections; the procedure is documented in the project wiki.