Why is the reproduction result on English benchmark lower than that in the paper?
wpwpwpyo opened this issue · 2 comments
Why is the reproduction result on the English benchmark lower than that in the paper? Especially CoLA, STS-B, and QQP. Could you please show the parameter configuration of the sh files for fine-tuning on GLUE and SQuAD?
Hi,
Thank you for your interest in our paper. For GLUE and SQuAD, we use the default scripts provided by the huggingface team here (https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification) and here (https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering). It should be noted that different machines or hardware configurations could lead to slightly different results. For your reference, the configuration of our machine for the English benchmarks is:
(1) Ubuntu 16.04.4 LTS; (2) NVIDIA-SMI 430.26; (3) Driver Version: 430.26; (4) CUDA Version: 10.2; (5) GeForce GTX 1080 (12GB);
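For reference, a minimal invocation of those HuggingFace example scripts might look like the sketch below. The hyperparameters (batch size, learning rate, epochs, sequence length) are the typical defaults suggested in the transformers examples README, not values confirmed by the authors for this paper:

```shell
# Hedged sketch: fine-tune on a GLUE task (e.g. CoLA) with the official
# run_glue.py example script. Hyperparameters are the examples-README
# defaults, not the authors' confirmed settings.
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --task_name cola \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output/cola

# Fine-tune on SQuAD with run_qa.py from the question-answering examples;
# again, these hyperparameters are the README defaults, not confirmed values.
python run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --max_seq_length 384 \
  --doc_stride 128 \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --output_dir ./output/squad
```

Swapping `--task_name` (e.g. `stsb`, `qqp`) selects the other GLUE tasks; results can still vary a few points on small tasks like CoLA depending on the random seed and hardware.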
Feel free to ask if you have further questions :)
We have rerun the experiments on SQuAD; please see the results below as a reference.