kkoutini/PaSST

audio inference

Opened this issue · 3 comments

@kkoutini
Thanks for sharing this nice work. I want to know how to read an audio file and do full inference. Can you show me an example? How should I do the preprocessing?

Hi! For inference only, we prepared this repo: https://github.com/kkoutini/passt_hear21
You can install it:

pip install -e 'git+https://github.com/kkoutini/passt_hear21@0.0.9#egg=hear21passt' 

Then use it for inference:

import torch

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model(mode="logits").cuda()

# wave_signal: a raw waveform tensor (see the rest of the thread for how to load it from a file)
logits = model(wave_signal)
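
The other two helpers imported above return embeddings instead of logits, following the HEAR 2021 interface. A rough sketch of how they are typically called (the default load_model() behaviour, the input shape, and the millisecond timestamps are assumptions, not taken from this thread):

import torch

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model().cuda()

# placeholder input: a batch of one 10-second clip at 32 kHz, shape (batch, samples)
wave_signal = torch.randn(1, 32000 * 10).cuda()

# one embedding vector per clip
scene_emb = get_scene_embeddings(wave_signal, model)

# embeddings at regular time steps, plus their timestamps (in milliseconds in the HEAR API)
time_emb, timestamps = get_timestamp_embeddings(wave_signal, model)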

In fact, I have already tried passt_hear21 for inference. But in the example, the input is not an audio file. My question is: if I have an audio file, how can I turn it into the correct input? In other words, how do I get wave_signal above?

import torch
import torchaudio

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

wave_signal, sr = torchaudio.load("test_audio.wav")
model = load_model(mode="logits").cuda()
logits = model(wave_signal.cuda())  # move the input to the same device as the model

Is that right? Is there any other preprocessing I need to do?

That's correct. You just need to make sure that the signal has a 32 kHz sampling rate.
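
For example, a minimal end-to-end sketch that loads a file, resamples it to 32 kHz when needed, and runs the classifier (the resampling, mono mix-down, and device handling here are assumptions; only the 32 kHz requirement comes from the reply above):

import torch
import torchaudio

from hear21passt.base import load_model

TARGET_SR = 32000  # PaSST expects 32 kHz input

wave_signal, sr = torchaudio.load("test_audio.wav")  # (channels, samples)

# resample if the file is not already at 32 kHz
if sr != TARGET_SR:
    wave_signal = torchaudio.functional.resample(wave_signal, sr, TARGET_SR)

# mix down to mono so the channel dimension acts as a batch of size 1
if wave_signal.shape[0] > 1:
    wave_signal = wave_signal.mean(dim=0, keepdim=True)

model = load_model(mode="logits").cuda()
logits = model(wave_signal.cuda())  # shape (1, n_classes)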