downloading datasets
Hello,
Thanks for making your code available; your work is very clear and clean.
I was able to run the Colab demo on my end.
I'm now trying to reproduce the datasets through Colab, but I get the following error (and similar ones) when trying to download the data. Any advice on this?
```
Connecting to acl-arc.comp.nus.edu.sg (acl-arc.comp.nus.edu.sg)|137.132.84.180|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-11-21 19:55:07 ERROR 403: Forbidden.
```
That's the ACL Anthology Reference Corpus (ACL ARC), and apparently the original source is offline. See
https://catalog.ldc.upenn.edu/docs/LDC2009T29/lrec_08/
You could try contacting the authors of the ACL ARC. In the meantime, I'll try to upload the dataset somewhere else.
If you do not want to recreate our dataset but just want to reproduce our experiments, running this should be sufficient:
```python
# `nlp` is the former name of the Hugging Face `datasets` package
from nlp import load_dataset

# Training data for first CV split
train_dataset = load_dataset(
    './datasets/cord19_docrel/cord19_docrel.py',
    name='relations',
    split='fold_1_train'
)
```
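To load more than the first split, the split names can be generated rather than typed by hand. The following is only a sketch: the helper `cv_split_names` is hypothetical, and both the `test` suffix and the number of folds are assumptions, since the thread only confirms the name `fold_1_train`. Check the dataset script for the names it actually defines.

```python
def cv_split_names(n_folds, parts=("train", "test")):
    """Build split names such as 'fold_1_train' for each fold and part.

    Assumption: every split follows the 'fold_<k>_<part>' pattern seen in
    'fold_1_train'. The 'test' part is a guess, not taken from the thread.
    """
    return [
        f"fold_{k}_{part}"
        for k in range(1, n_folds + 1)
        for part in parts
    ]


# n_folds=2 is a placeholder; use the fold count the dataset script defines.
split_names = cv_split_names(2)
```

Each generated name could then be passed as the `split` argument of `load_dataset`, in the same way as `'fold_1_train'` above.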
I see, thanks!
I found this: `DATA_URL = "http://datasets.fiq.de/acl_docrel.tar.gz"`
This should include the main ACL ARC corpus you are using, right?
It's not the full ACL ARC, but it contains all the paper data needed to train the models.
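For anyone who wants to fetch that archive manually, here is a minimal sketch using only the Python standard library. The helper `download_and_extract` is hypothetical and not part of the repository. It also assumes the URL quoted above is still reachable, which this thread does not guarantee.

```python
import shutil
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path

# URL quoted in this thread; it may no longer be available.
DATA_URL = "http://datasets.fiq.de/acl_docrel.tar.gz"


def download_and_extract(url, target_dir):
    """Download a .tar.gz archive and unpack it into target_dir.

    Returns the list of member names contained in the archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last path component of the URL.
    archive_name = Path(urllib.parse.urlparse(url).path).name
    archive_path = target / archive_name

    # Stream the response to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
        shutil.copyfileobj(response, out)

    # Only extract archives from sources you trust.
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(target)
    return names
```

Calling `download_and_extract(DATA_URL, "./data")` would place the archive and its contents under `./data`. A failing request, such as the 403 reported above, raises `urllib.error.HTTPError`.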