amazon-science/prompt-pretraining

Convert the context vectors/embeddings to nearest words and use them instead

100rab-S opened this issue · 1 comment

Hi @RenShuhuai-Andy, I have another question.

Can we convert the context vectors/embeddings to their nearest words and use those words in the text encoding? Intuitively this makes sense, since the embeddings learn prompts (in the case of POMP, prefixes) that best suit the training dataset.
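A minimal sketch of what this nearest-word decoding could look like, assuming the learned context vectors have been exported to a file (the path, the ViT-B/16 backbone, and the variable names are illustrative, not the repo's actual code):

```python
# Map each learned context vector to its nearest token in CLIP's token embedding table.
import torch
import clip
from clip.simple_tokenizer import SimpleTokenizer

tokenizer = SimpleTokenizer()
model, _ = clip.load("ViT-B/16", device="cpu")

# Learned context vectors from a POMP/CoOp-style checkpoint; path is hypothetical.
# Expected shape: (n_ctx, ctx_dim), with ctx_dim = 512 for ViT-B/16.
ctx = torch.load("prompt_learner_ctx.pt").float()

token_embedding = model.token_embedding.weight.detach().float()  # (vocab_size, ctx_dim)

# Cosine similarity between each context vector and every token embedding.
ctx_norm = ctx / ctx.norm(dim=-1, keepdim=True)
emb_norm = token_embedding / token_embedding.norm(dim=-1, keepdim=True)
sim = ctx_norm @ emb_norm.t()  # (n_ctx, vocab_size)

# Nearest token id per context vector, decoded back to BPE token strings.
nearest_ids = sim.argmax(dim=-1)
nearest_words = [tokenizer.decoder[i.item()] for i in nearest_ids]
print(nearest_words)  # e.g. ['photo</w>', ...] -- often not human-interpretable
```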

If this works, we could use these nearest words with other CLIP variant models, so training those other variants wouldn't be required (see the sketch below).
Training-related issue: #13 (comment)
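A rough sketch of that reuse, assuming the decoded words are joined into a plain-text prefix and fed to a different backbone; open_clip is used purely as an example of another CLIP variant, and the model name, pretrained tag, and prompt string are placeholders:

```python
# Reuse the decoded words as a hard text prompt on another CLIP variant, with no training.
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Hypothetical decoded context words from the previous step (</w> markers stripped).
context_words = "a photo depicts of"  # placeholder, not actual POMP output
classnames = ["dog", "cat", "car"]

texts = tokenizer([f"{context_words} {c}." for c in classnames])
with torch.no_grad():
    text_features = model.encode_text(texts)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
# `text_features` would stand in for the learned-prompt text features on the new backbone.
```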

That's an interesting idea. I have actually tried this approach, but I found that the resulting words did not convey meaningful information (also verified by CoOp, see Table 4 in its paper).
As for adapting the converted prompts to different CLIP variants, I'm uncertain about their compatibility. However, feel free to give it a try, as it may yield some interesting insights :)