83.1 F1 on DROP

Question

Closed this issue 3 years ago · 2 comments

Hi Elad, thanks for providing a well-organized repository. I am trying to reproduce your TASE+IO+SSE result on DROP with RoBERTa-large, and I'm getting 83.1 F1 on the dev set. Besides running the training command directly, are there any other steps I need to take to reproduce your 87.8 result (e.g., data pre-processing)?

Answer 1 · 2021-06-03T09:13:33.000Z

Hi Sanjay, thank you for the compliment and your interest!

The result you got is good. To clarify:

The model was trained on the entire DROP training set, but the results in Table 1 cover only the span questions in the dev set. Specifically, the 87.8 result is for "all spans" (single-span and multi-span questions), which excludes number and date questions, as they are mostly unaffected by the span extraction architecture used.
Over the full dev set (including number and date questions) we got 83.59 F1, so your result isn't far off and is within the range of variation caused by the choice of random seed.
You can change the random seeds by adding the following fields to the top-level object of the configuration file you are using:
```
{
"numpy_seed": 42,
"pytorch_seed": 42,
"random_seed": 42,
}
```
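If you want to check the seed variance over several runs, here is a minimal sketch that generates the same three fields for a list of seeds. The helper name `seed_fields` and the seed values other than 42 are only illustrative assumptions, and merging the output into the configuration file is left to you.

```python
import json

# The three seed fields named above; all are set to the same value per run.
SEED_FIELDS = ("numpy_seed", "pytorch_seed", "random_seed")


def seed_fields(seed: int) -> dict:
    """Return a dict that sets every seed field to `seed` (hypothetical helper)."""
    return {field: seed for field in SEED_FIELDS}


if __name__ == "__main__":
    # Example seeds; only 42 comes from the snippet above, the rest are arbitrary.
    for seed in (42, 1337, 2021):
        print(json.dumps(seed_fields(seed), indent=2))
```

Each printed object can be pasted into the top-level object of a copy of the configuration file, one copy per run.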

I've noticed that I didn't include an explanation of the metric names in the metrics output, so I'll provide one now:
validation_f1 (without a suffix) is the result of evaluating on all of the questions in the dev set. Names with suffixes represent evaluation per question type, per model head, or both (in the form of {question_type}_{head}).
The same goes for the em metrics.

Question types are:

span - single-span questions
spans - multi-span questions (doesn't contain single-span questions)
all spans - span + spans
number
date

Heads are:

arithmetic
count
passage_span - SSE for passage
question_span - SSE for question
multi_span - TASE

* Output for all spans per head is not included in the metrics output, but evaluation for any combination of question types and heads can be obtained in DROP Explorer.
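To make the naming scheme concrete, here is a rough sketch that splits a metric suffix into its question type and head. It assumes the suffix has already been separated from the leading metric name (such as validation_f1), and that "all spans" is spelled `all_spans` in the metrics output. Both are guesses rather than something confirmed above, and `split_suffix` is a hypothetical helper, not part of the repository.

```python
from typing import Optional, Tuple

# Question types and heads as listed above ("all_spans" spelling is assumed).
QUESTION_TYPES = ("all_spans", "spans", "span", "number", "date")
HEADS = ("arithmetic", "count", "passage_span", "question_span", "multi_span")


def split_suffix(suffix: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a metric suffix into (question_type, head); either may be None."""
    if suffix in QUESTION_TYPES:
        return suffix, None
    if suffix in HEADS:
        return None, suffix
    # Combined form: {question_type}_{head}
    for question_type in QUESTION_TYPES:
        for head in HEADS:
            if suffix == f"{question_type}_{head}":
                return question_type, head
    raise ValueError(f"unrecognized metric suffix: {suffix!r}")


if __name__ == "__main__":
    for name in ("number", "count", "spans_multi_span", "span_passage_span"):
        print(name, "->", split_suffix(name))
```

Under these assumptions, `spans_multi_span` would be read as multi-span questions answered by the TASE head, and `span_passage_span` as single-span questions answered by the passage SSE head.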

Answer 2 · 2021-06-08T18:59:21.000Z

Hi Elad, (sorry for the delayed response) thanks for your thorough response! It's very helpful.