Regression in CLIPProcessor from 4.24.0 -> 4.25.0.dev0
patrickvonplaten opened this issue · 0 comments
System Info
- transformers version: 4.24.0 / 4.25.0.dev0
- Platform: Linux-5.18.10-76051810-generic-x86_64-with-glibc2.34
- Python version: 3.9.7
- Huggingface_hub version: 0.11.0.dev0
- PyTorch version (GPU?): 1.11.0+cpu (False)
- Tensorflow version (GPU?): 2.9.1 (False)
- Flax version (CPU?/GPU?/TPU?): 0.6.0 (cpu)
- Jax version: 0.3.16
- JaxLib version: 0.3.15
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
There seems to be a regression in CLIPProcessor between current main (4.25.0.dev0) and 4.24.0. You can easily reproduce it by running the following script under both versions and comparing the outputs:
#!/usr/bin/env python3
import transformers
from transformers import CLIPProcessor
from PIL import Image
import numpy as np
import torchvision.transforms as tvtrans
import requests
from io import BytesIO

print(transformers.__version__)

# Download a sample image and resize it to 512x512
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")
image = image.resize([512, 512], resample=Image.Resampling.BICUBIC)

# Convert to a float CHW array with values in [0, 1] before passing it to the processor
image = tvtrans.ToTensor()(image)
np_image = np.asarray(image)

processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
pixel_values = processor(images=2 * [np_image], return_tensors="pt").pixel_values
print(pixel_values.abs().sum())
print(pixel_values.abs().mean())
The outputs for the different versions are as follows:
4.24.0
tensor(287002.5000)
tensor(0.9533)
4.25.0.dev0
tensor(503418.8125)
tensor(1.6722)
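For what it's worth, the jump in the mean is roughly what you would get if a float image already in [0, 1] were additionally rescaled by 1/255 before CLIP normalization. This is only a hypothesis; the sketch below is plain NumPy, not the actual transformers code, and just illustrates the effect:

```python
import numpy as np

# CLIP's published normalization constants (per RGB channel)
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711])

def preprocess(img, rescale):
    # Optionally rescale by 1/255 (as one would for uint8 input), then normalize
    x = img * (1 / 255) if rescale else img
    return (x - CLIP_MEAN) / CLIP_STD

# A float HWC image already in [0, 1]
float_img = np.random.default_rng(0).random((8, 8, 3))

ok = preprocess(float_img, rescale=False)     # correct path for [0, 1] floats
double = preprocess(float_img, rescale=True)  # rescale applied even though not needed

print(np.abs(ok).mean())      # comparable to the 4.24.0 statistics
print(np.abs(double).mean())  # much larger, comparable to 4.25.0.dev0
```

If an unconditional rescale of float inputs landed in the image-processor refactor on main, it would produce exactly this kind of shift.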
The code snippet above distills a problem that appeared when updating transformers to main for https://github.com/SHI-Labs/Versatile-Diffusion. That pipeline only works correctly with transformers==4.24.0; with transformers==4.25.0.dev0 it produces random results.
Expected behavior
It seems a bug was introduced after the 4.24.0 release. The code snippet above might look like an edge case, but I believe people have already started building all kinds of image-processing pipelines on top of CLIP.
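If this does turn out to be a rescaling issue, a possible interim workaround (an assumption on my part, not a confirmed fix) is to hand the processor a uint8 HWC array in [0, 255], so the value range is unambiguous regardless of the transformers version:

```python
import numpy as np

# Hypothetical workaround sketch: convert the [0, 1] float CHW array back to
# a uint8 HWC array in [0, 255] before calling the processor
float_img = np.random.default_rng(0).random((3, 512, 512)).astype(np.float32)
uint8_img = (float_img * 255).round().astype(np.uint8).transpose(1, 2, 0)

print(uint8_img.dtype, uint8_img.shape)
# pixel_values = processor(images=2 * [uint8_img], return_tensors="pt").pixel_values
```

This sidesteps the ambiguity entirely, since uint8 input should be rescaled by every version.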