Three data sets of Swedish text annotated for the presence of causality. The sets are annotated with two different tasks in mind: causality recognition, and causality ranking with respect to a query prompt containing a cause, an effect, or both.
Both the binary trial and curated binary sets treat causality as a binary recognition problem at the sentence level. The curated ranking data set compares two target sentences on a 6-point scale as to whether they express a causal relation that matches the given prompt.
All text is taken from Swedish Government Official Reports; specifically, the sentences were extracted from the SOU-corpus. More detailed descriptions of how the data sets were created can be found in:
Luise Dürlich, Sebastian Reimann, Gustav Finnveden, Joakim Nivre and Sara Stymne. Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish. In Proceedings of the First Workshop on Natural Language Processing for Political Sciences. June 24, 2022. Marseilles, France.
A description of the format and annotation scheme of each set is given in the respective directories.
This work is licensed under a Creative Commons Attribution 4.0 International License.