NetEase-FuXi/EETQ

EETQ-quantized TrOCR gives nonsense output

donjuanpond opened this issue · 2 comments

Hello! I'm using EETQ through HuggingFace Transformers to quantize my TrOCR (vision encoder decoder) model. It is meant to generate text output from an image input, transcribing whatever text is shown in the image. I tried to quantize the model through EETQ to speed up inference using the following code:

from transformers import EetqConfig
# ... some other code here ...
eetq_config = EetqConfig("int8")
recognizer = VisionEncoderDecoderModel.from_pretrained(recognizer_path, quantization_config=eetq_config).to('cuda')
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

When I run this quantized model, I get very strange results: the model labels all the text in images as just the word "to". For example, an image that should have been transcribed as "3042846 JG-002" ends up as "to to to to to to to to to", and so on. What is causing this problem, and how can I fix it?

I can quantize the model via eet_quantize, but not via transformers, and my results seem correct:

from PIL import Image
import torch
from eetq import eet_quantize
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# load an image (e.g. from the IAM database)
image = Image.open("/path/to/").convert("RGB")
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-handwritten').to(torch.float16).cuda()

eet_quantize(model, exclude=['output_projection'])
pixel_values = processor(images=image, return_tensors="pt").pixel_values.cuda()

generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)

Ok, it looks like quantizing with the eetq package (with the exclude=['output_projection'] part of your code) instead of at load time with the config works. Thank you for your help!
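For anyone landing here later: a plausible reason excluding the output projection helps is that the final hidden-to-vocab matmul is very sensitive to weight error; even small logit perturbations can flip the argmax during greedy decoding and lock generation onto a degenerate token like "to". Below is a toy NumPy sketch (my own illustration, not EETQ's actual kernel) of symmetric per-tensor int8 round-to-nearest quantization applied to a stand-in "output projection", showing that the dequantized weights carry a bounded but nonzero error that perturbs the logits:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(w):
    # Toy symmetric per-tensor int8 round-to-nearest (stand-in for EETQ's scheme).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale  # dequantized weights carry up to scale/2 error per element

# Hypothetical "output projection": hidden size 64, vocab size 1000.
W = rng.normal(0.0, 0.02, size=(1000, 64))
h = rng.normal(0.0, 1.0, size=64)

logits_fp = W @ h
logits_q = quantize_int8(W) @ h

# Quantization error perturbs every logit; when two candidate tokens have
# nearly equal logits, the perturbation can reorder them and flip the argmax.
print("max logit perturbation:", np.abs(logits_fp - logits_q).max())
```

The per-element weight error is bounded by half the quantization step (scale / 2), but across a large hidden dimension these errors accumulate in each logit, which is why keeping the vocabulary projection in higher precision is a common exclusion for generative models.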