
Hyperparameter values


"For MBERT and XLM-R we searched the following hyper-parameter grid in both SIQA and COPA training:
learning rate ∈ {5 · 10^-6, 10^-5, 3 · 10^-5},
dropout rate (applied to the output layer of the transformer and the hidden layer of the feed-forward scoring net) ∈ {0, 0.1},
and batch size ∈ {4, 8}"
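For reference, the quoted grid spans 3 × 2 × 2 = 12 configurations. Here is a minimal Python sketch that enumerates them, assuming the three axes are searched exhaustively as a full Cartesian product. The function name `hyperparameter_grid` and the dict keys are our own illustrative choices, not taken from the authors' code, and training and evaluation are left out.

```python
from itertools import product

# Grid as quoted from the paper (MBERT and XLM-R, SIQA and COPA training).
LEARNING_RATES = [5e-6, 1e-5, 3e-5]
# Dropout applied to the transformer output and the scoring net's hidden layer.
DROPOUT_RATES = [0.0, 0.1]
BATCH_SIZES = [4, 8]


def hyperparameter_grid():
    """Return every (learning_rate, dropout, batch_size) combination.

    Hypothetical helper: assumes an exhaustive grid search over all axes.
    """
    return [
        {"learning_rate": lr, "dropout": dp, "batch_size": bs}
        for lr, dp, bs in product(LEARNING_RATES, DROPOUT_RATES, BATCH_SIZES)
    ]


if __name__ == "__main__":
    grid = hyperparameter_grid()
    print(len(grid))  # 12 configurations
    for config in grid:
        print(config)
```

Knowing which of these 12 points was selected for XLM-R would let us rerun only that configuration.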

We are trying to reproduce the zero-shot results for XLM-R.
Could you please share the exact learning rate, dropout rate, and batch size that produced the results reported in your paper?