Easy to run example of the QASRL state machine from Large-Scale QA-SRL Parsing.
Based on the script from Controlled Crowdsourcing for High-Quality QA-SRL Annotation.
-
docker run -it hirscheran/qasrl_state_machine_example "predict" "I really wanted to eat my ice cream before it melted , but I was busy presenting my poster ." 4 "what is someone eating?"
The number 4 indicates the predicate index in the sentence (counting from zero).
Alternatively, You can state the verb explicitly , e.g. for nominalizations:
docker run -it hirscheran/qasrl_state_machine_example "predict" "I really wanted to eat my ice cream before it melted , but I was busy with the presentation of my poster ." "present" "what is someone presenting?"
Example input_file.csv
and input_sentences_file.csv
can be found in the data
directory.
docker run -it -v "$(pwd)/data/:/data" --rm --name qasrl hirscheran/qasrl_state_machine_example "file" "/data/input_file.csv" "/data/input_sentences_file.csv" "/data/output_file.csv"
If you decide to use files from a different directory, you also need to change the volume command.
Alternatively, if input_file.csv
includes a sentence
column with the sentence string (as in QANom data files), you can omit input_sentences_file.csv
:
docker run -it -v "$(pwd)/data/:/data" --rm --name qasrl hirscheran/qasrl_state_machine_example "file" "/data/input_qanom_file.csv" "/data/output_qanom_file.csv"
Run the state machine on all subdirectories (e.g all checkpoints), each one with a pred_file.csv
file. The script will save a file called pred_result.csv
for each pred_file
.
bash run_all.sh [path_to_dir]
See original project or the Dockerfile for development instructions.
Build docker image: docker build -t hirscheran/qasrl_state_machine_example .
Push docker image: docker push hirscheran/qasrl_state_machine_example
@article{
DBLP:journals/corr/abs-1805-05377,
author = {Nicholas FitzGerald and
Julian Michael and
Luheng He and
Luke Zettlemoyer},
title = {Large-Scale {QA-SRL} Parsing},
journal = {CoRR},
volume = {abs/1805.05377},
year = {2018},
url = {http://arxiv.org/abs/1805.05377},
archivePrefix = {arXiv},
eprint = {1805.05377},
timestamp = {Mon, 13 Aug 2018 16:47:46 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1805-05377.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{roit-etal-2020-controlled,
title = "Controlled Crowdsourcing for High-Quality {QA}-{SRL} Annotation",
author = "Roit, Paul and
Klein, Ayal and
Stepanov, Daniela and
Mamou, Jonathan and
Michael, Julian and
Stanovsky, Gabriel and
Zettlemoyer, Luke and
Dagan, Ido",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.acl-main.626",
doi = "10.18653/v1/2020.acl-main.626",
pages = "7008--7013",
abstract = "Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen. Recently, a large-scale crowdsourced QA-SRL corpus and a trained parser were released. Trying to replicate the QA-SRL annotation for new texts, we found that the resulting annotations were lacking in quality, particularly in coverage, making them insufficient for further research and evaluation. In this paper, we present an improved crowdsourcing protocol for complex semantic annotation, involving worker selection and training, and a data consolidation phase. Applying this protocol to QA-SRL yielded high-quality annotation with drastically higher coverage, producing a new gold evaluation dataset. We believe that our annotation protocol and gold standard will facilitate future replicable research of natural semantic annotations.",
}