allenai/unifiedqa

Not exactly reproducing the example with huggingface

wenlongzhao094 opened this issue · 7 comments

I followed the example in the README to load UnifiedQA with huggingface, but there seem to be some version or architecture problems (code attached at the end).
(1) Some weights of the checkpoint were not used when initializing T5ForConditionalGeneration. It seems the architecture of T5ForConditionalGeneration is not exactly the same as the original TensorFlow T5; what are the differences? Is there a huggingface alternative that is exactly the same as the original seq2seq TensorFlow T5?
(2) The output contains special tokens. I can work around this by setting skip_special_tokens=True during decode (see the snippet after the output below), but is this normal? Is it a version problem?

Thank you!

>>> from transformers import AutoTokenizer, T5ForConditionalGeneration
>>> model_name = "allenai/unifiedqa-t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> model = T5ForConditionalGeneration.from_pretrained(model_name)

Some weights of the model checkpoint at allenai/unifiedqa-t5-small were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>>
>>> def run_model(input_string, **generator_args):
...     input_ids = tokenizer.encode(input_string, return_tensors="pt")
...     res = model.generate(input_ids, **generator_args)
...     return [tokenizer.decode(x) for x in res]
...
>>> run_model("which is best conductor? \\n (a) iron (b) feather")

['<pad> iron</s>']
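As a side note on (2), passing skip_special_tokens=True during decoding strips the <pad> and </s> tokens. A minimal sketch continuing the session above (the exact output string is my expectation, not a verified run):

>>> res = model.generate(tokenizer.encode("which is best conductor? \\n (a) iron (b) feather", return_tensors="pt"))
>>> [tokenizer.decode(x, skip_special_tokens=True) for x in res]  # drop <pad>/</s>
['iron']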

What are your HF and torch versions?

huggingface transformers 4.1.1
torch 1.7.1

The issue is quite odd. Could you try transformers 4.0 or 3.9?

Same issue with transformers 4.0.0. There does not seem to be a 3.9.x version...

It seems that the weights not being used during initialization is actually expected:
huggingface/transformers#8518
That weight was removed from huggingface's T5 after 3.5.0. I assume the README example was initially run with <=3.5.0?
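One can sanity-check that newer transformers versions no longer define that parameter at all. A quick sketch (run against transformers>=4.0; the False output is my expectation under that assumption):

>>> from transformers import T5ForConditionalGeneration
>>> model = T5ForConditionalGeneration.from_pretrained("allenai/unifiedqa-t5-small")
>>> # the cross-attention relative bias was dropped in huggingface/transformers#8518
>>> any("EncDecAttention.relative_attention_bias" in n for n, _ in model.named_parameters())
False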

Sorry for the delay!

I tried different HF versions:

  • For transformers==4.2.1, I am getting ['<pad> iron</s>'], which is not good.
  • However, transformers==3.5.1 and transformers==3.1.0 give me ['iron'], which is a more reasonable response.

So there must be recent changes on HF's side that are messing up the predictions. I will report the issue.

According to the discussion on HF, the newer versions expect a few arguments that were missing from the example.
I have updated the README example accordingly.
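For anyone landing here, a sketch of what the updated example boils down to (assuming the missing pieces are skip_special_tokens=True and batch decoding; the README is the authoritative version):

from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    # batch_decode with skip_special_tokens=True drops <pad> and </s>
    return tokenizer.batch_decode(res, skip_special_tokens=True)

run_model("which is best conductor? \\n (a) iron (b) feather")  # expected: ['iron']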