guoqincode/Open-AnimateAnyone

Is CLIP image encoder really helpful?

kaleidoscopical opened this issue · 5 comments

Nice work! I wonder whether including the CLIP image encoder really speeds up the entire training process, as claimed in the paper. Do you have any insight into this? Many thanks.

btw, I see clip-vit-base-patch32 is used instead of clip-vit-large-patch14 in the config file. Is there a reason for that choice?

I think the CLIP embedding is probably just there to keep CFG (classifier-free guidance) working...
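
For context, here is a minimal sketch of what classifier-free guidance (CFG) does at sampling time, assuming the CLIP image embedding is the condition that gets dropped for the unconditional pass. The function name `cfg_combine` and the toy inputs are hypothetical and not taken from this repo.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend two noise predictions the way classifier-free guidance does.

    eps_uncond: noise predicted with the condition dropped
                (assumption: e.g. a zeroed CLIP image embedding).
    eps_cond:   noise predicted with the real condition.
    guidance_scale: 0 returns eps_uncond, 1 returns eps_cond,
                    larger values extrapolate past eps_cond.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values standing in for two flattened noise predictions.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=2.0)
```

The point of the sketch is that CFG needs some condition that can be switched off to obtain `eps_uncond`; whether the CLIP embedding serves only that purpose in this model is the commenter's guess, not something the thread confirms.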

The feature dimensions of SD and clip-vit-base-patch32 are aligned.
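
A rough way to check that claim without downloading anything, assuming the base model is Stable Diffusion 1.x (cross-attention context width 768) and using the sizes from the published CLIP checkpoint configs. The helper `matching_outputs` is hypothetical, and which CLIP output the repo actually feeds to the UNet is not stated in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipVisionDims:
    hidden_size: int     # width of the vision transformer's token features
    projection_dim: int  # size of the pooled, projected image embedding


# Sizes from the published configs of the two checkpoints named in this thread.
CLIP_DIMS = {
    "clip-vit-base-patch32": ClipVisionDims(hidden_size=768, projection_dim=512),
    "clip-vit-large-patch14": ClipVisionDims(hidden_size=1024, projection_dim=768),
}

# Assumption: Stable Diffusion 1.x, whose cross-attention expects 768-wide context.
SD_CROSS_ATTENTION_DIM = 768


def matching_outputs(name, cross_attention_dim=SD_CROSS_ATTENTION_DIM):
    """Return which CLIP outputs already have the width cross-attention expects."""
    dims = CLIP_DIMS[name]
    candidates = (
        ("hidden_size", dims.hidden_size),
        ("projection_dim", dims.projection_dim),
    )
    return [field for field, size in candidates if size == cross_attention_dim]


for name in CLIP_DIMS:
    print(name, matching_outputs(name))
```

Under these assumptions, the base checkpoint lines up through its unprojected vision features and the large checkpoint through its projected embedding, so the "aligned" argument depends on which output is used.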

Thanks! Really helpful advice!

@guoqincode Hi, I get normal results when I remove the CLIP feature, but weird results when I include it. Any idea why?