Discrepancies with the original CoLA dataset
davidefiocco opened this issue · 2 comments
davidefiocco commented
Hi, I noticed that there may be a minor problem with the CoLA dataset.
After downloading the data with the command
python download_glue_data.py --data_dir glue_data --tasks CoLA
I see that line 19 of dev.tsv reads
"bc01 1 He could not] have been working."
and line 6998 of train.tsv reads
"sgww85 1 I consider that a rude remark and in very [NP and PP] bad taste."
These square brackets do not appear in the original CoLA dataset at https://nyu-mll.github.io/CoLA/
I am not sure what the source of the discrepancy is.
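For anyone who wants to look for similar lines, here is a minimal sketch of a scan for stray square brackets. It assumes the TSV files have no header row and keep the sentence in the fourth tab-separated column (index 3); that layout is an assumption, so adjust `sentence_col` if your copy differs. The function name `find_bracket_lines` is made up for this example and is not part of download_glue_data.py.

```python
import csv


def find_bracket_lines(lines, sentence_col=3):
    """Yield (line_number, sentence) for rows whose sentence has '[' or ']'.

    `lines` is any iterable of text lines, e.g. an open file object.
    `sentence_col` is the assumed 0-based index of the sentence column.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if len(row) <= sentence_col:
            continue  # skip rows that are too short to hold a sentence
        sentence = row[sentence_col]
        if "[" in sentence or "]" in sentence:
            yield line_number, sentence


# Usage on a downloaded file (the path is assumed from the command above):
# with open("glue_data/CoLA/dev.tsv", encoding="utf-8", newline="") as f:
#     for line_number, sentence in find_bracket_lines(f):
#         print(line_number, sentence)
```

Running the same scan on the files from https://nyu-mll.github.io/CoLA/ would show whether the brackets are already present upstream.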
davidefiocco commented
Aw, it seems that these are in the original too somehow, apologies!
alexwarstadt commented
Hi Davide, thanks for pointing out the error. I will be releasing an updated version of CoLA with some corrections.