nyu-mll/GLUE-baselines

Adding GLUE to PyTorch

PattynR opened this issue · 1 comments

Hi, I am currently adding some files to the PyTorch project that would enable it to import the GLUE datasets directly. However, I am facing a problem with the QQP and SNLI datasets. Some lines have too many tabs relative to the number of columns given in the first line of those files. For example, in the train.tsv file of QQP, line 97,931 is:

"\tWas Muhammad a real historical figure? What is the evidence for his existence?\t0

So that line appears to have 3 columns, while every line in the file should have 6 columns.
How should I handle those lines?

Thank you.

Hi P,

We have some notes on this issue here: https://groups.google.com/forum/#!topic/glue-benchmark-discuss/J5p3oTpqogY

Also, for a reference implementation of GLUE data loading/prediction writing, I'd look at jiant rather than this codebase: https://github.com/jsalt18-sentence-repl/jiant
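
For anyone who wants a quick way to find and set aside rows like the one quoted in the question, here is a minimal sketch in Python. It is not taken from the linked notes or from jiant; the function name `split_tsv_rows` and the sample data are made up for illustration, and it assumes that quote characters in the files should be treated as literal text rather than as field delimiters.

```python
import csv
import io


def split_tsv_rows(handle):
    """Read a TSV stream and separate well-formed rows from malformed ones.

    Returns (header, good_rows, bad_rows); bad_rows holds (line_number, fields).
    """
    # QUOTE_NONE treats '"' as an ordinary character, so a stray quote
    # cannot swallow tabs or newlines into a single field.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    good_rows, bad_rows = [], []
    for fields in reader:
        if len(fields) == len(header):
            good_rows.append(fields)
        else:
            # line_num is the physical line just read (the header is line 1).
            bad_rows.append((reader.line_num, fields))
    return header, good_rows, bad_rows


# Illustrative sample: a six-column header, one complete row,
# and the short row quoted in the question.
sample = (
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tFirst question?\tSecond question?\t0\n"
    "\"\tWas Muhammad a real historical figure? "
    "What is the evidence for his existence?\t0\n"
)
header, good_rows, bad_rows = split_tsv_rows(io.StringIO(sample))
print(len(good_rows), "good rows; malformed lines:", [n for n, _ in bad_rows])
```

On a real file you would pass something like `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample. Whether to drop, repair, or merge the malformed rows is a separate decision; the sketch only reports them so they can be inspected.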