wl-zhao/VPD

Class embedding

SnowdenLee opened this issue · 1 comment

Hi,
Thanks for the interesting work! Could you please also share the code for creating the class embeddings, so that I can try it on my own dataset?
More specifically, the text embedding for one prompt has a shape of [77, 768]. However, in class_embedding.pt, each class only has an embedding of shape [1, 768]. Did you average over the 77 tokens, or reduce them in some other way?
Thanks a lot!

Hi,

Please refer to the FrozenCLIPEmbedder, where we use pool=True to obtain the embedding for each class.
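
For intuition, here is a minimal NumPy sketch of what pooling does to the shapes. It assumes that pool=True returns the pooled output of the CLIP text encoder (`pooler_output` in Hugging Face `CLIPTextModel`), which, as far as I know, selects the hidden state at the end-of-text token rather than averaging over the 77 tokens. The hidden states are random stand-ins, the prompt token ids are made up, and `pool_eos` is a hypothetical helper, not a function from this repo.

```python
import numpy as np

SOS_ID, EOS_ID = 49406, 49407  # CLIP start/end-of-text ids; EOS is the largest id
SEQ_LEN, DIM = 77, 768

def pool_eos(hidden, input_ids):
    """Select, per sequence, the hidden state at the first end-of-text token."""
    # argmax returns the first position holding the largest id, i.e. the first EOS
    eos_pos = input_ids.argmax(axis=-1)
    return hidden[np.arange(hidden.shape[0]), eos_pos]

rng = np.random.default_rng(0)
# Stand-in for the per-token output of the text encoder: [batch, 77, 768]
hidden = rng.standard_normal((1, SEQ_LEN, DIM))

# Fake tokenized prompt: SOS, three made-up word ids, then EOS used as padding
input_ids = np.full((1, SEQ_LEN), EOS_ID)
input_ids[0, 0] = SOS_ID
input_ids[0, 1:4] = [320, 1125, 539]  # hypothetical token ids

pooled = pool_eos(hidden, input_ids)
print(hidden[0].shape)  # (77, 768)
print(pooled.shape)     # (1, 768)
```

Under this assumption the [1, 768] vector is a single selected row of the [77, 768] matrix (row 4 in the toy input above), not the mean over rows.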

Feel free to ask if you have any questions!