eladsegal/tag-based-multi-span-extraction

83.1 F1 on DROP

Hi Elad, thanks for providing a well-organized repository. I am trying to reproduce your TASE+IO+SSE result on DROP with RoBERTa-large, and I'm getting 83.1 F1 on the dev set. Besides running the training command directly, are there any other steps needed to reproduce your 87.8 result (e.g., data preprocessing)?

Hi Sanjay, thank you for the compliment and your interest!

The result you got is good. To clarify:

  • The model was trained on the entire DROP training set, but the results in Table 1 cover only the span questions in the dev set. Specifically, the 87.8 result is for "all spans" (single-span and multi-span questions), which excludes number and date questions (as they are mostly unaffected by the span extraction architecture used).
  • Over the full dev set (including number and date questions) we got 83.59 F1, so your result isn't far off and is within the range of variation caused by the choice of random seed.
    You can change the random seeds by adding the following fields to the top-level object of the configuration file you use.
    {
        "numpy_seed": 42,
        "pytorch_seed": 42,
        "random_seed": 42
    }
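
To compare several seeds, the same three fields can be set programmatically. The following is only a minimal sketch, assuming the configuration has already been loaded as a plain Python dict (a Jsonnet configuration file may need a different loader); `with_seed`, `SEED_FIELDS` and `base_config` are hypothetical names and not part of the repository.

```python
import copy

# The three top-level seed fields named above.
SEED_FIELDS = ("random_seed", "numpy_seed", "pytorch_seed")


def with_seed(config: dict, seed: int) -> dict:
    """Return a copy of `config` with all three seed fields set at the top level."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for field in SEED_FIELDS:
        updated[field] = seed
    return updated


# Hypothetical stand-in for a loaded configuration; a real one has many more fields.
base_config = {"dataset_reader": {}, "model": {}}

# One config per seed, e.g. to gauge how much F1 varies between runs.
seeded_configs = [with_seed(base_config, seed) for seed in (13, 42, 1337)]
```

Each resulting dict would still need to be written out and passed to the training command. How that is done depends on the setup and is not shown here.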
    

I've noticed that I didn't include an explanation of the metric names in the metrics output, so I'll provide one now:
validation_f1 (without a suffix) is the result for evaluation on all of the questions in the dev set. Names with suffixes represent evaluation per question type, model head, or both (in the form of {question_type}_{head}).
The same goes for the em metrics.

Question types are:

  • span - single-span questions
  • spans - multi-span questions (doesn't contain single-span questions)
  • all spans - span + spans
  • number
  • date

Heads are:

  • arithmetic
  • count
  • passage_span - SSE for passage
  • question_span - SSE for question
  • multi_span - TASE

* Output for all spans per head is not included in the metrics output, but evaluation for any combination of question types and heads can be obtained in DROP Explorer.
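
As an illustration of the {question_type}_{head} naming, a suffix can be split back into its two parts as sketched below. This is an assumption-laden sketch rather than code from the repository: `parse_suffix` is a hypothetical helper, the `all_spans` spelling (with an underscore) is a guess, and how the suffix is attached to names such as validation_f1 is not handled here.

```python
# Question types and heads as listed above. Longer names come first so that
# "spans" is tried before "span". The "all_spans" spelling is an assumption.
QUESTION_TYPES = ("all_spans", "spans", "span", "number", "date")
HEADS = ("arithmetic", "count", "passage_span", "question_span", "multi_span")


def parse_suffix(suffix):
    """Split a metric suffix into (question_type, head); either part may be None."""
    if suffix in HEADS:
        return None, suffix  # per-head metric only
    if suffix in QUESTION_TYPES:
        return suffix, None  # per-question-type metric only
    for question_type in QUESTION_TYPES:
        prefix = question_type + "_"
        head = suffix[len(prefix):]
        if suffix.startswith(prefix) and head in HEADS:
            return question_type, head  # combined {question_type}_{head}
    raise ValueError(f"unrecognized metric suffix: {suffix!r}")


print(parse_suffix("span_passage_span"))  # ('span', 'passage_span')
print(parse_suffix("spans_multi_span"))   # ('spans', 'multi_span')
```

Checking for an exact head match first keeps suffixes like multi_span from being misread as a question type followed by a head.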

Hi Elad, (sorry for the delayed response) thanks for your thorough response! It's very helpful.