yxuansu/TaCL

Why is the reproduction result on English benchmark lower than that in the paper?

wpwpwpyo opened this issue · 2 comments

Why is the reproduction result on the English benchmarks lower than that in the paper? Especially for CoLA, STS-B, and QQP. Could you please share the parameter configuration of the .sh files for fine-tuning on GLUE and SQuAD?

Hi,

Thank you for your interest in our paper. For GLUE and SQuAD, we use the default scripts provided by the huggingface team here (https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification) and here (https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering). It should be noted that different machines or hardware configurations could lead to slightly different results. For your reference, the configuration of our machine for the English benchmarks is:
(1) Ubuntu 16.04.4 LTS; (2) NVIDIA-SMI 430.26; (3) Driver Version: 430.26; (4) CUDA Version: 10.2; (5) GeForce GTX 1080 (12GB);
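As a rough sketch, fine-tuning with the stock HuggingFace text-classification script could be launched as below. The checkpoint name (cambridgeltl/tacl-bert-base-uncased) and the hyperparameter values shown are illustrative assumptions based on the script's documented defaults, not our exact configuration:

```shell
# Hypothetical invocation of the unmodified HuggingFace run_glue.py example
# script on a single GLUE task (CoLA). The model checkpoint name and all
# hyperparameter values here are assumptions for illustration only.
python run_glue.py \
  --model_name_or_path cambridgeltl/tacl-bert-base-uncased \
  --task_name cola \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./tacl-cola-output/
```

The same pattern applies to the question-answering example script for SQuAD; only the script name, task-specific flags, and output directory change.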

Feel free to ask if you have further questions :)

SQuAD 1.1:
[screenshot: SQuAD 1.1 fine-tuning results]

SQuAD 2.0:
[screenshot: SQuAD 2.0 fine-tuning results]

We have rerun the experiments on SQuAD; please see the results above for reference.