Is CLIP image encoder really helpful?

Question

kaleidoscopical opened this issue 10 months ago · 5 comments

Nice work! I wonder whether including the CLIP image encoder really speeds up the entire training process, as claimed in the paper. Do you have any insight into this? Many thanks.

By the way, I see that clip-vit-base-patch32 is used instead of clip-vit-large-patch14 in the config file. Is there a reason for this choice?

Answer 1 · 2024-01-15T13:29:01.000Z

I think the CLIP embedding is probably just there to maintain CFG (classifier-free guidance)...
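For readers unfamiliar with the term, here is a minimal sketch of what "maintaining CFG" with an image embedding could look like. It is not taken from the repository: the function names, the all-zeros "null" condition, and the drop probability are assumptions made purely for illustration.

```python
import random


def maybe_drop_condition(image_embedding, drop_prob, rng):
    """Training-time condition dropout (hypothetical helper).

    With probability drop_prob the embedding is replaced by zeros, so the
    model also learns an unconditional prediction. Using zeros as the
    "null" condition is an assumption, not something the thread states.
    """
    if rng.random() < drop_prob:
        return [0.0] * len(image_embedding)
    return list(image_embedding)


def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance combination at inference time:
    eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    emb = [0.5, -1.0, 2.0]
    print(maybe_drop_condition(emb, drop_prob=0.1, rng=rng))
    print(guided_noise([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0))
```

With a guidance scale of 1 the result equals the conditional prediction, and with a scale of 0 it equals the unconditional one, which is why the model needs some conditioning signal that can be dropped.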

Answer 2 · 2024-01-15T13:30:32.000Z

The feature dimensions of SD and clip-vit-base-patch32 are aligned.
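A quick way to see what "aligned" might mean here is to compare the encoder widths with the UNet cross-attention width. The sketch below hard-codes the numbers from the published Hugging Face configs and assumes "sd" refers to Stable Diffusion 1.5; which encoder output the repository actually feeds to the UNet is not stated in this thread, so both candidates are checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderDims:
    hidden_size: int     # width of the per-token features (last hidden state)
    projection_dim: int  # width of the pooled, projected image embedding


# Values copied by hand from the public model configs (assumption: unchanged).
CLIP_VISION_DIMS = {
    "clip-vit-base-patch32": EncoderDims(hidden_size=768, projection_dim=512),
    "clip-vit-large-patch14": EncoderDims(hidden_size=1024, projection_dim=768),
}

# Cross-attention width of the Stable Diffusion 1.5 UNet (assumed target).
SD15_CROSS_ATTENTION_DIM = 768


def matches_cross_attention(name, cross_dim=SD15_CROSS_ATTENTION_DIM):
    """Report which encoder outputs fit the UNet without an extra projection."""
    dims = CLIP_VISION_DIMS[name]
    return {
        "token_features": dims.hidden_size == cross_dim,
        "projected_embedding": dims.projection_dim == cross_dim,
    }


if __name__ == "__main__":
    for name in CLIP_VISION_DIMS:
        print(name, matches_cross_attention(name))
```

Under these assumptions, the base model's token features are 768-wide and fit directly, while the large model only matches through its projected embedding.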

Answer 3 · 2024-01-15T13:32:41.000Z

Thanks! Really helpful advice!

Answer 4 · 2024-01-16T10:09:45.000Z

@guoqincode Hi, I got normal results when removing the CLIP feature, but weird results when including it. Any idea why?