Is CLIP image encoder really helpful?
kaleidoscopical opened this issue · 5 comments
kaleidoscopical commented
Nice work! I wonder whether including the CLIP image encoder really speeds up training, as claimed in the paper. Do you have any insight into this? Many thanks.
By the way, I see that clip-vit-base-patch32 is used instead of clip-vit-large-patch14 in the config file. Is there a particular reason for that?
guoqincode commented
I think the CLIP embedding is probably there just to keep CFG (classifier-free guidance) working...
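
For context, classifier-free guidance combines an unconditional and a conditional noise prediction at sampling time, so the model needs some conditioning input that can be dropped. A minimal sketch of the combination step, assuming plain Python lists in place of tensors; `cfg_combine` is a hypothetical helper name and the numbers are made up, not taken from this repo:

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move toward the conditional one by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions (illustrative values only, not model output).
noise_uncond = [0.0, 1.0, 2.0]
noise_cond = [1.0, 1.0, 0.0]

guided = cfg_combine(noise_uncond, noise_cond, guidance_scale=2.0)
print(guided)
```

With a scale of 1.0 this reduces to the conditional prediction, and with 0.0 to the unconditional one.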
guoqincode commented
The embedding dimensions of SD and clip-vit-base-patch32 match.
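
A rough way to check this kind of alignment is to compare the width of each CLIP vision output against the UNet's cross-attention width. The sketch below assumes Stable Diffusion 1.x and uses widths quoted from memory of the published configs, so verify them against each model's config.json. Which CLIP output the repo actually feeds to the UNet is not stated in this thread, and `matching_outputs` is a hypothetical helper:

```python
# Assumed widths; check each model's config.json before relying on them.
SD15_CROSS_ATTENTION_DIM = 768  # assumption: Stable Diffusion 1.x UNet

CLIP_VISION_DIMS = {
    "clip-vit-base-patch32": {"hidden_size": 768, "projection_dim": 512},
    "clip-vit-large-patch14": {"hidden_size": 1024, "projection_dim": 768},
}


def matching_outputs(model_name, target_dim=SD15_CROSS_ATTENTION_DIM):
    """Return the names of the CLIP outputs whose width equals target_dim."""
    dims = CLIP_VISION_DIMS[model_name]
    return [name for name, width in dims.items() if width == target_dim]


for name in CLIP_VISION_DIMS:
    print(name, matching_outputs(name))
```

Under these assumptions, an output whose width differs from the cross-attention width would need an extra projection layer before it could be used as conditioning.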
kaleidoscopical commented
Thanks! Really helpful advice!
kaleidoscopical commented
...
garychan22 commented
@guoqincode Hi, I got normal results when I removed the CLIP feature, but weird results when I included it. Any idea why?