prajjwal1/generalize_lm_nli

getting low results on MNLI

Closed this issue · 1 comment

Hi

I have tried the bert-small-mnli model hosted on HF, and it seems I have a couple of problems:

  1. The tokenizer doesn't seem to have a max_length provided, so no padding happens AFAICT. Is that correct?
  2. The results I get from the HF API differ from the ones I get when I run the inference code locally. Which of input_ids, token_type_ids and attention_mask should go in the batch? If I pass only input_ids, the results roughly match the HF API (there's still a small difference, but I am tokenizing slightly differently); if I pass all of them, the results diverge sharply.
  3. In any case, the HF API (and my local inference code) results are surprisingly low: on a sample of 100 validation_matched instances I get 21% accuracy. Do you have any insight into this? I am downloading the MNLI data with HF datasets, so I don't know whether that plays a role. The README mentions something to that effect, but I'm not quite sure how I should change the labels in the dataset, if needed. (A sketch of my setup follows below.)
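
For reference, here is a minimal sketch of the kind of inference loop I'm running locally (not my exact code; the hub path and the label handling are assumptions, and labels are taken straight from HF datasets):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "prajjwal1/bert-small-mnli"  # assumed hub path for the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# HF datasets' MNLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
data = load_dataset("glue", "mnli", split="validation_matched[:100]")

correct = 0
for ex in data:
    # Pad/truncate explicitly to a fixed length of 128.
    enc = tokenizer(
        ex["premise"],
        ex["hypothesis"],
        padding="max_length",
        truncation=True,
        max_length=128,
        return_tensors="pt",
    )
    with torch.no_grad():
        # Passes input_ids, token_type_ids and attention_mask together;
        # passing only input_ids changes the predictions (point 2 above).
        logits = model(**enc).logits
    pred = logits.argmax(dim=-1).item()
    correct += int(pred == ex["label"])

print(f"accuracy: {correct / len(data):.2%}")
```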
  1. Padding happens as per the tokenizer's default max_length, i.e. 128.

For 2. and 3., I'll need more information about what exactly you're doing (such as a snippet).
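
One thing that might be worth checking for 3., since 21% is below the ~33% random baseline for a 3-way task: whether the label order expected by the classification head matches HF datasets' MNLI label order. A minimal sketch, assuming the same checkpoint name as above; the remapping shown is purely hypothetical:

```python
from transformers import AutoConfig

# Inspect the label order baked into the classification head. Many
# checkpoints only store generic names (LABEL_0, LABEL_1, LABEL_2),
# in which case the true order has to come from the training code/README.
config = AutoConfig.from_pretrained("prajjwal1/bert-small-mnli")
print(config.id2label)

# If the head's order differs from HF datasets' MNLI order
# (0 = entailment, 1 = neutral, 2 = contradiction), remap predictions
# before scoring. The mapping below is purely hypothetical:
# remap = {0: 2, 1: 0, 2: 1}
# pred = remap[pred]
```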