Simple Embeddings
Hi,
Please could you provide a simple way to load a model and test a single audio clip to produce an embedding?
Thank you very much.
@corranmac Got it!
I have updated the README.
See example.py
Thanks for such a quick response!
Is it possible to transform these output embeddings to the same shape as CLIP's, for use in speech-image retrieval and also with image generation models trained on CLIP embeddings? I can't seem to find a separate method for encoding, e.g. model.encode() like CLIP has.
Thanks
@corranmac
Yeah, you can use the semantic embedding of speech to calculate similarity with image embeddings for speech-image retrieval. In fact, this is how we do it in our paper.
I have added a function to the model class (kwClip.py) for extracting the semantic embedding of a speech input:
SpeechCLIP/avssl/model/kwClip.py, lines 1299 to 1315 (commit e2a572d)
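As a rough illustration of the retrieval step described in the reply (not the SpeechCLIP code itself), here is a minimal sketch that ranks candidate images by cosine similarity to a single speech embedding, cosine similarity being the usual choice for CLIP-style embeddings. It assumes you have already extracted the speech semantic embedding and the CLIP image embeddings as plain lists of floats living in the same shared space. The names `rank_images` and `cosine_similarity` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of the same dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_images(
    speech_embedding: Sequence[float],
    image_embeddings: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (image index, similarity) pairs, most similar image first."""
    scores = [
        (i, cosine_similarity(speech_embedding, img))
        for i, img in enumerate(image_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real embeddings (hypothetical values).
    speech = [0.9, 0.1, 0.0]
    images = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
    for idx, score in rank_images(speech, images):
        print(idx, round(score, 3))
```

With real embeddings you would normally batch this with a tensor library, but the ranking logic is the same: normalize both sides, take dot products, and sort.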
@corranmac If there is no further question, I will close this issue.