nyu-mll/GLUE-baselines

Discrepancies with the original CoLA dataset

davidefiocco opened this issue · 2 comments

Hi, I noticed that there may be a minor problem with the CoLA dataset.

After downloading the data with the command
python download_glue_data.py --data_dir glue_data --tasks CoLA

I see that line 19 of dev.tsv reads

"bc01 1 He could not] have been working."

and line 6998 of train.tsv reads

"sgww85 1 I consider that a rude remark and in very [NP and PP] bad taste."

These square brackets do not appear in the original CoLA dataset (https://nyu-mll.github.io/CoLA/).

I am not sure what the source of the discrepancy may be.
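
For anyone who wants to look for similar lines, here is a minimal sketch of a scan for stray square brackets. It is only an illustration: the helper names `find_bracket_lines` and `scan_file` are hypothetical, and the paths assume the files end up under glue_data/CoLA/ after the download command above. It checks the whole line rather than a specific column, so it makes no assumption about the TSV layout.

```python
from pathlib import Path


def find_bracket_lines(lines):
    """Return (line_number, text) pairs for lines containing '[' or ']'.

    Line numbers are 1-based, matching what a text editor would show.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if "[" in line or "]" in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Hypothetical helper: scan one file on disk for bracketed lines."""
    with open(path, encoding="utf-8") as handle:
        return find_bracket_lines(handle)


if __name__ == "__main__":
    # Assumed location of the downloaded files; adjust if --data_dir differs.
    for name in ("train.tsv", "dev.tsv"):
        path = Path("glue_data") / "CoLA" / name
        if path.exists():
            for number, text in scan_file(path):
                print(f"{name}:{number}: {text}")
```

Running the same scan over a copy of the original CoLA release would show whether the brackets are present there as well.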

Aw, it seems that these are in the original too somehow, apologies!

Hi Davide, Thanks for pointing out the error. I will be releasing an updated version of CoLA with some corrections.