Exported ONNX flan-t5-small behaves incorrectly in transformers.js but works in Python
Closed this issue · 6 comments
System Info
Model exported using the transformers.js v3 branch, commit ac391d2 (Mon Sep 23).
Libs:
onnx==1.16.2
onnx-graphsurgeon==0.3.27
onnxconverter-common==1.14.0
onnxruntime==1.19.2
onnxslim==0.1.31
transformers==4.43.4
Inference runs on the v3 branch with onnxruntime-web@1.20.0-dev.20240827-1d059b8702 (the Firefox ML inference engine, in development).
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
Possible issue with the convert.py script or the inference engine for T5 models.
When I run an ONNX flan-t5-small model in Python it works fine, but in transformers.js it misbehaves: typically after the first token, it repeats the same word over and over.
The existing Xenova model at https://huggingface.co/Xenova/flan-t5-small works fine in the same transformers.js inference engine.
Reproduction
Convert the google/flan-t5-small model to ONNX using the transformers.js conversion script (no quantization):
python -m convert --model_id flan-t5-small --task text2text-generation
Rename decoder_model.onnx to decoder_model_merged.onnx
My exported model is uploaded here:
https://huggingface.co/rolf-mozilla/test-flan-t5-small
{
  "inputArgs": [
    "Generate a topic from these web titles: The Future of Science and Tech. Top 10 Gadgets of the Month. Wearable Tech Trends to Watch. Guide to Building Your Own PC"
  ],
  "runOptions": {
    "max_new_tokens": 22
  }
}
Output in transformers.js for text2text-generation:
Results: [
  {
    "generated_text": " Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science"
  }
]
Hi there 👋 I believe the issue is here:
Rename decoder_model.onnx to decoder_model_merged.onnx
If you look at the graph, I think you'll see that it has no inputs for past key values. For this reason, it will keep generating the first token repeatedly.
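A quick way to confirm this is to list the decoder's graph inputs with the onnx package. A minimal sketch (the file path is an assumption; point it at the renamed decoder):

import onnx

# Sketch: list the decoder's graph inputs (path is an assumption)
decoder = onnx.load("decoder_model_merged.onnx")
input_names = [inp.name for inp in decoder.graph.input]
print(input_names)

# A no-past export only exposes inputs such as "input_ids",
# "encoder_attention_mask", and "encoder_hidden_states"; a with-past
# export adds "past_key_values.*" entries (and a merged decoder also
# has a "use_cache_branch" flag).
print(any("past_key_values" in name for name in input_names))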
Can you share the original PyTorch model so I can check the conversion? Otherwise, you can try using the v3 conversion script: https://github.com/xenova/transformers.js/tree/v3/scripts
For the original model, I just downloaded https://huggingface.co/google/flan-t5-small and used the convert script on the v3 branch. No modifications were made to the model or the convert script.
I'm a little confused because when I run the converted ONNX model in Python it works fine, but not in transformers.js.
To run it in Python, I move all the files into the same folder and use this code:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the exported encoder and decoder with the KV cache disabled
tokenizer = AutoTokenizer.from_pretrained("./models/flan-t5-small2")
model = ORTModelForSeq2SeqLM.from_pretrained(
    "./models/flan-t5-small2",
    decoder_file_name="decoder_model.onnx",
    encoder_file_name="encoder_model.onnx",
    use_cache=False,
)
onnx_translation = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
text = "Generate a topic from these web titles: The Future of Science and Tech. Top 10 Gadgets of the Month. Wearable Tech Trends to Watch. Guide to Building Your Own PC"
pred = onnx_translation(text)
Oh I see... you set the task to text2text-generation, which exports the model without past key values as inputs. (That also explains the Python result: with use_cache=False, optimum re-runs the decoder over the full sequence at every step, so the cache inputs are never needed.) The correct task/command is:
python -m convert --model_id flan-t5-small --task text2text-generation-with-past
or simply,
python -m convert --model_id flan-t5-small
since the task is inferred.
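For completeness, the with-past export can also be exercised from Python with the cache enabled. A sketch, assuming optimum's default seq2seq file names (decoder_model.onnx and decoder_with_past_model.onnx) are present in the export folder:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Sketch: load the with-past export with the KV cache enabled.
# File names are assumptions based on optimum's default seq2seq layout.
tokenizer = AutoTokenizer.from_pretrained("./models/flan-t5-small2")
model = ORTModelForSeq2SeqLM.from_pretrained(
    "./models/flan-t5-small2",
    encoder_file_name="encoder_model.onnx",
    decoder_file_name="decoder_model.onnx",
    decoder_with_past_file_name="decoder_with_past_model.onnx",
    use_cache=True,
)
onnx_translation = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
print(onnx_translation("Generate a topic from these web titles: Wearable Tech Trends to Watch. Guide to Building Your Own PC"))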
That worked, thanks!
I should note that 'python -m convert --model_id flan-t5-small' did not infer the task though. Maybe that is an unrelated bug.
Great! You may need to use the full model id: google/flan-t5-small. If running locally, however, you do indeed need to specify the task :)
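For example, a local conversion would look something like this (the path is hypothetical):

python -m convert --model_id ./models/flan-t5-small --task text2text-generation-with-past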