DataArcTech/ChartMoE

CLIP: Input image size (490x490) doesn't match model (336x336).

Closed this issue · 6 comments

Hi,
I get an error when I run quickstart.py, as follows:
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 122, in encode_img
img_embeds, atts_img, img_target = self.img2emb(image)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 126, in img2emb
img_embeds = self.vision_proj(self.vit(image.to(self.device)))
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/build_mlp.py", line 133, in forward
image_forward_outs = self.vision_tower(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1171, in forward
return self.vision_model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1094, in forward
hidden_states = self.embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 244, in forward
raise ValueError(
ValueError: Input image size (490*490) doesn't match model (336*336).

It seems that the img_size is 490 (from config.json) but the input size of CLIP is 336 (clip-vit-large-patch14-336). Where did I go wrong? Thank you so much!
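For context, the check that raises this error looks roughly like the sketch below. This is an illustrative simplification, not the actual transformers source: recent transformers versions validate the input resolution against the model's configured image size and only skip the check when positional-embedding interpolation is enabled.

```python
# Illustrative sketch of a CLIP-style input-size check (not the real transformers code).
def check_input_size(height, width, model_size=336, interpolate_pos_encoding=False):
    """Raise if the input resolution differs from the model's expected size
    and positional-embedding interpolation is disabled."""
    if not interpolate_pos_encoding and (height != model_size or width != model_size):
        raise ValueError(
            f"Input image size ({height}*{width}) doesn't match model "
            f"({model_size}*{model_size})."
        )

check_input_size(336, 336)                                  # matches the model: OK
check_input_size(490, 490, interpolate_pos_encoding=True)   # interpolation enabled: OK
try:
    check_input_size(490, 490)                              # 490 input vs 336 model
except ValueError as e:
    print(e)
```

With a 490-pixel input and a 336-pixel model, the check fails exactly as in the traceback above, which is why a transformers version that handles the interpolation path differently can surface this error.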

I read issue #4 and noticed the same error.

Both the model and the config were downloaded from HF, but this error still appears.

Hi, @Lv996331209. 490 is indeed the correct resolution used in ChartMoE. You could check the versions of core Python packages like transformers against requirements.txt. If you can give me more details to reproduce your bug, that would be great~ I've never encountered this problem myself, but #4 has been solved, so you can also discuss it with its author.

Thanks for your attention! If any question remains (like the version of some Python package?), I'm very willing to discuss it with you~!

@Lv996331209 Hi, can you provide the versions of the packages listed in requirements.txt? I will try to reproduce this bug~ Thanks!

@Lv996331209 Hi, you can try using the same version of transformers as in requirements.txt. I think that will help~
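A quick way to spot this kind of mismatch is to compare the installed transformers version against the repo's pin. The sketch below assumes a `requirements.txt` in the current directory and a hypothetical `pinned_version` helper; adjust the path to where you cloned ChartMoE.

```python
# Compare the installed transformers version against the version pinned
# in requirements.txt. The file path and helper are illustrative assumptions.
import importlib.metadata


def pinned_version(requirements_path, package):
    """Return the '==' pinned version of `package` from a requirements file, if any."""
    try:
        with open(requirements_path) as f:
            for line in f:
                line = line.split("#")[0].strip()  # drop comments and whitespace
                if line.startswith(package + "=="):
                    return line.split("==", 1)[1].strip()
    except FileNotFoundError:
        pass
    return None


try:
    installed = importlib.metadata.version("transformers")
except importlib.metadata.PackageNotFoundError:
    installed = None

pinned = pinned_version("requirements.txt", "transformers")
print(f"installed={installed}, pinned={pinned}")
if pinned and installed != pinned:
    print(f"Version mismatch: try `pip install transformers=={pinned}`")
```

Running this in the repo root makes an accidental upgrade (as happened here) immediately visible.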

@Coobiw Hi, thanks for your help! It was indeed the wrong version of the transformers package, due to an unintentional upgrade. After I reverted to the version in requirements.txt, the error was resolved. Great!

Thank you so much for your suggestion! BTW, looking forward to your dataset. :)

Thanks for your kind words. I will add this problem to the FAQs. The full training pipeline code and the dataset are coming! I am organizing these contents.