The DRASTIC corpus

CC BY 4.0

Dataset and scripts for the DRASTIC corpus of DRS-annotated texts.

The repository has the following structure:

├── data
│   ├── drs-annotation
│   │   ├── anaphora-resolution
│   │   │   ├── dvorak
│   │   │   ├── marbles
│   │   │   ├── nida
│   │   │   └── short-texts
│   │   └── no-anaphora-resolution
│   │       ├── dvorak
│   │       ├── marbles
│   │       ├── nida
│   │       └── short-texts
│   └── ud-sources
│       ├── dvorak
│       ├── marbles
│       ├── nida
│       └── short-texts
└── scripts

The data directory contains the DRS annotations (drs-annotation), in a clausal format, and the corresponding UD sources (ud-sources) from the GUM corpus. The semantic annotations come in two versions: one with sentence-internal anaphora resolved (anaphora-resolution) and one without (no-anaphora-resolution). Within each directory, the texts are divided by sub-corpus and named after their corresponding UD sent_ids (for details, see Haug et al. 2023, referenced below).
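As a quick sanity check after cloning, the layout above can be walked programmatically. The following is a minimal sketch, not part of the repository: the function name `count_annotation_files` and the `repo_root` argument are hypothetical. It makes no assumption about file extensions and only counts the files found in each version/sub-corpus directory.

```python
from pathlib import Path

# Directory names taken from the repository tree shown above.
VERSIONS = ("anaphora-resolution", "no-anaphora-resolution")
SUBCORPORA = ("dvorak", "marbles", "nida", "short-texts")


def count_annotation_files(repo_root):
    """Count the files in each data/drs-annotation/<version>/<sub-corpus> directory.

    `repo_root` is assumed to be the path to a local clone of the repository.
    Missing directories are reported with a count of 0.
    """
    base = Path(repo_root) / "data" / "drs-annotation"
    counts = {}
    for version in VERSIONS:
        for subcorpus in SUBCORPORA:
            directory = base / version / subcorpus
            if directory.is_dir():
                n = sum(1 for p in directory.iterdir() if p.is_file())
            else:
                n = 0
            counts[(version, subcorpus)] = n
    return counts


if __name__ == "__main__":
    # Assumes the script is run from the root of the clone.
    for (version, subcorpus), n in count_annotation_files(".").items():
        print(f"{version}/{subcorpus}: {n} file(s)")
```

Since both annotation versions cover the same sub-corpora, comparing the counts across the two versions is a simple way to spot an incomplete checkout.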

The scripts directory contains a Python script (flatten_clause_notation.py) that 'flattens' PMB-style DRSs (that is, DRSs in the clause notation of the Parallel Meaning Bank) into our simplified format. We also provide a shell script (flatten_clause_notation_in_batch.sh) that runs it on multiple files at once.

If you use this data, please cite the following paper:

Haug, Dag T. T., Jamie Y. Findlay and Ahmet Yıldırım. 2023. The long and the short of it: DRASTIC, a semantically annotated dataset containing sentences of more natural length. In Proceedings of the 4th International Workshop on Designing Meaning Representations (DMR 2023), 89–98. Association for Computational Linguistics.

  @inproceedings{haug_etal:drastic,
    title           = {The long and the short of it: \textsc{drastic}, a semantically annotated dataset containing sentences of more natural length},
    year            = {2023},
    author          = {Dag T. T. Haug and Jamie Y. Findlay and Ahmet Y\i{}ld\i{}r\i{}m},
    booktitle       = {{Proceedings of the 4th International Workshop on Designing Meaning Representations (DMR 2023)}},
    pages           = {89--98},
    publisher       = {Association for Computational Linguistics},
    url             = {https://aclanthology.org/2023.dmr-1.9}
  }

This data is licensed under a Creative Commons Attribution 4.0 International License.