kkoutini/PaSST

audio inference

Opened this issue · 3 comments

@kkoutini
Thanks for sharing this nice work. I want to know how to read an audio file and do full inference. Can you show me an example? How should I do the preprocessing?

Hi! For inference only, we prepared this repo: https://github.com/kkoutini/passt_hear21
You can install it:

pip install -e 'git+https://github.com/kkoutini/passt_hear21@0.0.9#egg=hear21passt' 

Then use it for inference:

import torch

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model(mode="logits").cuda()

# wave_signal: a raw waveform tensor (see the rest of the thread for how to load it from a file)
logits = model(wave_signal)
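
The other two helpers imported above return embeddings instead of logits, following the HEAR 2021 interface. A rough sketch of how they are typically called (the default load_model() behaviour, the input shape, and the millisecond timestamps are assumptions, not taken from this thread):

import torch

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model().cuda()

# placeholder input: a batch of one 10-second clip at 32 kHz, shape (batch, samples)
wave_signal = torch.randn(1, 32000 * 10).cuda()

# one embedding vector per clip
scene_emb = get_scene_embeddings(wave_signal, model)

# embeddings at regular time steps, plus their timestamps (in milliseconds in the HEAR API)
time_emb, timestamps = get_timestamp_embeddings(wave_signal, model)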

In fact, I have already tried passt_hear21 for inference. But in the example, the input is not an audio file. My question is: if I have an audio file, how can I turn it into the correct input? In other words, how do I get wave_signal above?

import torch
import torchaudio

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

wave_signal, sr = torchaudio.load("test_audio.wav")
model = load_model(mode="logits").cuda()
logits = model(wave_signal.cuda())  # move the input to the same device as the model

Is that right? Is there any other preprocessing I need to do?

That's correct. You just need to make sure that the signal has a 32 kHz sampling rate.
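
For example, a minimal end-to-end sketch that loads a file, resamples it to 32 kHz when needed, and runs the classifier (the resampling, mono mix-down, and device handling here are assumptions; only the 32 kHz requirement comes from the reply above):

import torch
import torchaudio

from hear21passt.base import load_model

TARGET_SR = 32000  # PaSST expects 32 kHz input

wave_signal, sr = torchaudio.load("test_audio.wav")  # (channels, samples)

# resample if the file is not already at 32 kHz
if sr != TARGET_SR:
    wave_signal = torchaudio.functional.resample(wave_signal, sr, TARGET_SR)

# mix down to mono so the channel dimension acts as a batch of size 1
if wave_signal.shape[0] > 1:
    wave_signal = wave_signal.mean(dim=0, keepdim=True)

model = load_model(mode="logits").cuda()
logits = model(wave_signal.cuda())  # shape (1, n_classes)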