
Event StoryLine Corpus - annotated data, baselines and evaluation scripts, evaluation data.


EventStoryLine


This repository contains the following materials to extract storylines using the EventStoryLine Corpus (ESC):

  • annotated data in (CAT-)XML format (folder: annotated_data). To visualise the data, use CAT (Content Annotation Tool: http://dh.fbk.eu/resources/cat-content-annotation-tool); ask for an account, it is free.
  • annotated data in evaluation format, extending PLOT_LINK relations to include coreference relations (folder: evaluation_format)
  • test data (folder: evaluation_format/test)
  • Python 3 scripts for creating the evaluation format of the data, extracting baseline systems, and evaluating the baselines' output

Version 1.0 is available.

UPDATES

We have made available the following extensions to ESC:

  • ESC v1.2: all plot link relations annotated using crowdsourcing and evaluated using CrowdTruth. Plot relations have been annotated only for events in the same sentence.
  • ESC v1.5: plot link relations annotated by experts (i.e., all data in ESC v1.0) and by the crowd.

ESC v1.2 is available in evaluation format only (folder: evaluation_format). Plot data in the evaluation format have been extended using coreference relations from ECB+. Complete documentation about the crowdsourcing experiments to obtain ESC v1.2 is available here.

ESC v1.5 is available in evaluation format (folder: evaluation_format) and (CAT-)XML (folder: annotated_data). Plot data in the evaluation format have been extended using coreference relations from ECB+. PLOT_LINK relations in (CAT-)XML have been enriched with two new attributes: origin (values: crowd|experts) and validated (values: TRUE|FALSE). The validated attribute signals whether the PLOT_LINK has been validated or not.
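
Below is a minimal Python sketch of how these files could be inspected, assuming the PLOT_LINK relations appear as `PLOT_LINK` elements carrying the `origin` and `validated` attributes described above (the exact nesting inside the CAT-XML documents may differ):

```python
# Minimal sketch: count PLOT_LINK relations by origin and validation status
# in an ESC v1.5 CAT-XML file. Assumes each PLOT_LINK element carries the
# 'origin' (crowd|experts) and 'validated' (TRUE|FALSE) attributes described
# above; adjust if the element name or nesting differs in your files.
import sys
from collections import Counter
import xml.etree.ElementTree as ET

def count_plot_links(path):
    root = ET.parse(path).getroot()
    counts = Counter()
    # iter() walks the whole tree, so we do not rely on the exact nesting
    for link in root.iter("PLOT_LINK"):
        origin = link.get("origin", "unknown")
        validated = link.get("validated", "unknown")
        counts[(origin, validated)] += 1
    return counts

if __name__ == "__main__":
    for (origin, validated), n in sorted(count_plot_links(sys.argv[1]).items()):
        print(f"origin={origin}\tvalidated={validated}\t{n}")
```

Run it on a single annotated file, e.g. `python count_plot_links.py annotated_data/<topic>/<file>.xml` (the file name is only an illustration), to get a quick breakdown of crowd vs. expert links and how many have been validated.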

For experiments, the evaluation-format test files are stored in /evaluation_format/test:

  1. if you want to evaluate only against the experts: use version 1.0
  2. if you want to evaluate only against the crowd: use version 1.2
  3. if you want to evaluate against experts and crowd: use version 1.5

References

@inproceedings{caselli-vossen-2016-storyline,
    title = "The Storyline Annotation and Representation Scheme ({S}ta{R}): A Proposal",
    author = "Caselli, Tommaso  and
      Vossen, Piek",
    booktitle = "Proceedings of the 2nd Workshop on Computing News Storylines ({CNS} 2016)",
    month = nov,
    year = "2016",
    address = "Austin, Texas",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W16-5708",
    doi = "10.18653/v1/W16-5708",
    pages = "67--72",
}
@inproceedings{caselli-vossen-2017-event,
    title = "The Event {S}tory{L}ine Corpus: A New Benchmark for Causal and Temporal Relation Extraction",
    author = "Caselli, Tommaso  and
      Vossen, Piek",
    booktitle = "Proceedings of the Events and Stories in the News Workshop",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W17-2711",
    doi = "10.18653/v1/W17-2711",
    pages = "77--86",
}
@inproceedings{caselli2018crowdsourcing,
    title = "Crowdsourcing {S}tory{L}ines: Harnessing the Crowd for Causal Relation Annotation",
    author = "Caselli, Tommaso  and
      Inel, Oana",
    booktitle = "Proceedings of the Workshop Events and Stories in the News 2018",
    year = "2018",
    pages = "44--54",
}