HKUST-KnowComp/WinoWhy

Question about WSC and WNLI

Excuse me, I am confused: the code in this repository seems to be written entirely for the WNLI task. Do I need to modify it to reproduce the results in the ACL 2020 WinoWhy paper, or can I use it as is? Or did you forget to upload the code for the WSC setting?

I have the same question.

Thank you very much for the comments, and my apologies for the late reply. As described in Section 5.1 of our paper, the results for the unsupervised setting require extracting prediction scores from the models involved. The models can be found in the WinoGrande repo and the BERT-WSCR repo. You can reproduce the results by running these models on the provided dataset.

Also, please allow me to share an easier way to do this: the BIG-bench version of WinoWhy. You can reproduce the results with the BIG-bench-style evaluation pipeline, and their experiments give details on evaluating large-scale models on WinoWhy. Models such as PaLM still struggle on our dataset, with less than 60% accuracy, while the best rater achieves 85% accuracy.
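
For anyone who wants to see the shape of such an evaluation, here is a minimal sketch of turning per-choice model scores into an accuracy number. It is not the code from this repository or from BIG-bench. The example layout (an `input` string plus a `target_scores` dict) is an assumption modeled on BIG-bench-style JSON tasks, and `score_fn` is a hypothetical placeholder for whatever model you extract prediction scores from.

```python
from typing import Any, Callable, Mapping, Sequence

# Assumed example layout (modeled on BIG-bench-style JSON tasks, not verified
# against the actual WinoWhy files):
#   {"input": "<text>", "target_scores": {"<choice>": <gold score>, ...}}
Example = Mapping[str, Any]


def multiple_choice_accuracy(
    examples: Sequence[Example],
    score_fn: Callable[[str, str], float],
) -> float:
    """Return the fraction of examples where the top-scored choice is the gold one.

    score_fn(input_text, choice) is a hypothetical hook: it should return the
    model's score (for instance a log-probability) for that choice.
    """
    if not examples:
        raise ValueError("examples must not be empty")

    correct = 0
    for example in examples:
        target_scores = example["target_scores"]
        choices = list(target_scores)
        # Score every choice with the model and keep the best one.
        predicted = max(choices, key=lambda c: score_fn(example["input"], c))
        # The gold answer is the choice with the highest target score.
        gold = max(choices, key=lambda c: target_scores[c])
        correct += int(predicted == gold)
    return correct / len(examples)


if __name__ == "__main__":
    # Toy data for illustration only; these are not WinoWhy examples.
    toy_examples = [
        {"input": "example one", "target_scores": {"correct": 1, "incorrect": 0}},
        {"input": "example two", "target_scores": {"correct": 0, "incorrect": 1}},
    ]

    # Dummy scorer that always prefers the "correct" label.
    def always_correct(text: str, choice: str) -> float:
        return 1.0 if choice == "correct" else 0.0

    print(multiple_choice_accuracy(toy_examples, always_correct))
```

To use this with a real model, replace the dummy scorer with a function that returns the model's score for each choice. The argmax-over-choices rule here is a generic multiple-choice metric and may differ from the exact procedure in Section 5.1 of the paper.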