imagebind_audio_retrieval.mov
This plugin allows you to search your dataset for images that are similar to a given audio clip.
How does it work?
- ImageBind embedding model embeds images and audio clips into a shared space (1024 dim)
- Qdrant similarity index stores the embeddings and allows for fast similarity search
- FiftyOne provides a UI for uploading the audio clip, pre-filtering, and searching the similarity index.
It demonstrates how to work with custom media types in FiftyOne, and how to create custom vector similarity indices.
Note: This plugin is a proof of concept and is not intended for production use.
It works with ogg
and wav
audio files, but not mp3
files, and makes an API
call to replicate rather than running the embedding model locally, to avoid
potential installation issues.
fiftyone plugins download https://github.com/jacobmarks/audio-retrieval-plugin
You will also need to install replicate
and qdrant-client
:
pip install replicate qdrant-client
- Opens the audio retrieval panel on click
- Creates an index for the dataset using the ImageBind embedding model. This
operation can take a little while to run, so it is recommended to run it in
delegated execution mode. To do so, check the
Delegated
box in the operator's modal, and then in a terminal run:
fiftyone delegated launch
- Searches the index for images that are similar to the given audio clip. This should be relatively fast, although it may take a minute for the replicate server to start up.
Before you can use the plugin, you will need to create an account on Replicate.com. Once you have created an account, you can create an API token, and then add this token as an environment variable:
export REPLICATE_API_TOKEN=<your token>
You will also need to start a Qdrant server locally. To do so, start up your Docker daemon, and then run:
docker run -p "6333:6333" -p "6334:6334" -d qdrant/qdrant
Then, you can run the create_imagebind_index
operator, and the
open_audio_retrieval_panel
operator. The latter will open a panel that allows
you to upload an audio clip, and then search for similar images.