SHI-Labs/Versatile-Diffusion

VersatileDiffusionPipeline - supported image variation inputs

LukasStruppek opened this issue · 2 comments

Hey,

I am currently working with the VersatileDiffusionPipeline implementation to perform image variation. While this works as promised for a single PIL Image input, the pipeline does not seem to work with a list of PIL Images or with a PyTorch tensor. However, the documentation (https://huggingface.co/docs/diffusers/api/pipelines/versatile_diffusion#diffusers.VersatileDiffusionPipeline.image_variation) states that both input types are supported. For example, when running the following code:

import torch
from PIL import Image
from diffusers import VersatileDiffusionPipeline

# load the pipeline as described in the docs
pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion")

image = Image.open('data/img.jpg').convert("RGB")
image = [image, image]
out = pipe.image_variation(image=image,
                           num_inference_steps=5,
                           guidance_scale=5,
                           num_images_per_prompt=1,
                           negative_prompt=None,
                           output_type='np.array').images

The following error is raised:

Traceback (most recent call last):
  File "/workspace/test_augmentation.py", line 31, in <module>
    out = pipe.image_variation(image=image,
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion.py", line 209, in image_variation
    return VersatileDiffusionImageVariationPipeline(**components)(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion_image_variation.py", line 396, in __call__
    self.check_inputs(image, height, width, callback_steps)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion_image_variation.py", line 265, in check_inputs
    raise ValueError(f"image has to be of type PIL.Image.Image or torch.Tensor but is {type(image)}")
ValueError: image has to be of type PIL.Image.Image or torch.Tensor but is <class 'list'>
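
As a stopgap until list inputs are supported, a per-image loop can emulate the batched call. This is just a minimal sketch; it assumes `pipe` is the VersatileDiffusionPipeline loaded as above and reuses the same arguments as the failing snippet:

from PIL import Image

# Stopgap: run the pipeline once per image instead of passing a list.
# Assumes `pipe` is the VersatileDiffusionPipeline loaded as above.
images = [Image.open('data/img.jpg').convert('RGB')] * 2
out = []
for img in images:
    out.extend(pipe.image_variation(image=img,
                                    num_inference_steps=5,
                                    guidance_scale=5,
                                    num_images_per_prompt=1,
                                    negative_prompt=None,
                                    output_type='np.array').images)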

PyTorch tensors also lead to errors regarding the input shape. When feeding in a tensor with the standard PyTorch channel-first layout (B, C, H, W), the following error is raised:

image = torch.randn((2, 3, 224, 224))
out = pipe.image_variation(image=image,
                           num_inference_steps=5,
                           guidance_scale=5,
                           num_images_per_prompt=1,
                           negative_prompt=None).images

Traceback (most recent call last):
  File "/workspace/test_augmentation.py", line 31, in <module>
    out = pipe.image_variation(image=image,
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion.py", line 209, in image_variation
    return VersatileDiffusionImageVariationPipeline(**components)(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion_image_variation.py", line 407, in __call__
    image_embeddings = self._encode_prompt(
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion_image_variation.py", line 188, in _encode_prompt
    image_input = self.image_feature_extractor(images=prompt, return_tensors="pt")
  File "/opt/conda/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 437, in __call__
    return self.preprocess(images, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 351, in preprocess
    images = [
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 352, in <listcomp>
    self.resize(image=image, size=size, resample=resample)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 165, in resize
    return resize(image,
  File "/opt/conda/lib/python3.10/site-packages/transformers/image_transforms.py", line 263, in resize
    image = to_channel_dimension_format(image, ChannelDimension.LAST)
  File "/opt/conda/lib/python3.10/site-packages/transformers/image_transforms.py", line 75, in to_channel_dimension_format
    image = image.transpose((1, 2, 0))
ValueError: axes don't match array
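
Until tensor inputs are handled, one workaround is to convert each channel-first tensor to a PIL image before calling the pipeline. The sketch below assumes `pipe` is loaded as above and that the tensor values lie in [0, 1] (hence torch.rand rather than torch.randn); real image tensors would need to be scaled to that range first:

import torch
from torchvision.transforms.functional import to_pil_image

# Stopgap: convert the (B, C, H, W) tensor to PIL images and run the
# pipeline once per image. Assumes values in [0, 1].
batch = torch.rand((2, 3, 224, 224))
out = []
for t in batch:
    pil_img = to_pil_image(t)  # (C, H, W) float tensor -> PIL.Image
    out.extend(pipe.image_variation(image=pil_img,
                                    num_inference_steps=5,
                                    guidance_scale=5,
                                    num_images_per_prompt=1,
                                    negative_prompt=None).images)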

OK, I think I found the bugs and made a pull request in the diffusers repo: huggingface/diffusers#1568

Hi, thanks for pointing out the bug. We will contact the HF team.