TorchopenL3 is an open-source Python library with PyTorch support for computing deep audio embeddings.
It works without librosa: librosa was replaced by an emblibrosa module that contains only the few functions torchopenl3 needs. The numba and resampy packages, which are problematic to install on embedded ARM platforms, are excluded. In particular, resampy is not used for audio resampling (julius is used instead), and neither are other libraries that depend on numba or are unavailable on embedded platforms.
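As a rough illustration of what the resampling step does, here is a minimal sketch of sample-rate conversion by linear interpolation. Note that the actual library uses higher-quality band-limited resampling; `resample_linear` is a hypothetical helper written for this illustration, not part of torchopenl3 or julius.

```python
def resample_linear(signal, orig_sr, target_sr):
    """Resample a 1-D signal from orig_sr to target_sr via linear interpolation.

    Naive sketch for illustration only; real resamplers use
    band-limited (sinc) interpolation to avoid aliasing.
    """
    if orig_sr == target_sr:
        return list(signal)
    ratio = orig_sr / target_sr
    n_out = int(len(signal) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * ratio               # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

# Downsampling 8 samples from 8 kHz to 4 kHz yields 4 samples
sig = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(resample_linear(sig, 8000, 4000))  # [0.0, 2.0, 4.0, 6.0]
```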
# ATTENTION
Runs very slowly on Raspberry Pi (about 17 seconds per sound file), and the embedding calculations contain errors. TODO: needs to be fixed.
Please refer to the OpenL3 library for the Keras version.
The audio and image embedding models provided here are published as part of [1], and are based on the Look, Listen and Learn approach [2]. For details about the embedding models and how they were trained, please see:
Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.
```
pip install git+https://github.com/toborobot/torchopenl3.git
```
```python
import requests

# Download a sample audio file
url = 'https://raw.githubusercontent.com/marl/openl3/master/tests/data/audio/chirp_44k.wav'
filename = 'Sample_audio.wav'
r = requests.get(url, allow_redirects=True)
with open(filename, "wb") as f:
    f.write(r.content)
```
```python
import soundfile as sf
import torchopenl3

# Load the audio and compute embeddings with default settings
audio, sr = sf.read(filename)
emb, ts = torchopenl3.get_audio_embedding(audio, sr)
```
```python
emb, ts = torchopenl3.get_audio_embedding(audio, sr, content_type="env",
                                          input_repr="linear", embedding_size=512)
print(f"Embedding Shape {emb.shape}")
print(f"TimeStamps Shape {ts.shape}")
```
```python
emb, ts = torchopenl3.get_audio_embedding(audio, sr, center=False)
print(f"Embedding Shape {emb.shape}")
print(f"TimeStamps Shape {ts.shape}")
```
```python
model = torchopenl3.models.load_audio_embedding_model(input_repr="mel256", content_type="music",
                                                      embedding_size=512)
emb, ts = torchopenl3.get_audio_embedding(audio, sr, model=model)
print(f"Embedding Shape {emb.shape}")
print(f"TimeStamps Shape {ts.shape}")
```
Special thanks to Joseph Turian for his help.
[1] Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.
[2] Look, Listen and Learn
Relja Arandjelović and Andrew Zisserman.
IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017.