pytorch port of the openl3 audio embedding (from the marl implementation)
Clone this repo and run cd torchopenl3 && pip install -e .
OpenL3 comes in a couple of flavors. We can choose from:
- input representations:
mel128
ormel256
.linear
coming soon - content types:
music
orenv
. Themusic
model variant was trained on music, while theenv
was trained on environmental sounds. - embedding size: output embedding size. Either 512 or 6144.
Let's load a model! We choose the mel128
, music
, 512
variant.
import torchopenl3
import torch
import numpy as np
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu' # use GPU if we can!
# dummy audio
SAMPLE_RATE = 48000
audio = np.random.randn(1, SAMPLE_RATE).astype(np.float32) # 1 second of audio at 48kHz
model = torchopenl3.OpenL3Embedding(input_repr='mel128',
embedding_size=512,
content_type='music')
embedding = torchopenl3.embed(model=model,
audio=audio, # shape sould be (channels, samples)
sample_rate=SAMPLE_RATE, # sample rate of input file
hop_size=1,
device=DEVICE) # use gpu?
Tests are written using pytest
. Run pip install pytest
to install pytest.
run pytest
.