manojpamk/pytorch_xvectors

Provide example for inference in Python

Closed · 1 comment

Hi, and many thanks for this nice work. I'm trying to integrate this code into my Python project to obtain embeddings from a given WAV file. From the source files I can easily see how you apply the network and get the embeddings. However, the nnet3 egs format that is read as input has to be computed by Kaldi. Is there an option to preprocess the file with a pure Python library? Could you document the exact shape of the MFCCs that the model expects? That way I could implement the feature extraction with librosa or a similar tool.

Thank you in advance

Hi,

Unfortunately, I haven't been able to work on a pure-Python audio -> egs pipeline.
Do you need to train the network? If you only need the embeddings, the nnet3 egs format is not required. Check out egs/diarize.sh for an example.

Manoj
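
For anyone attempting the embedding-only path in pure Python: the thread does not state the MFCC configuration, so the values below are assumptions, not the repository's documented settings. Kaldi x-vector recipes commonly apply a centered sliding-window cepstral mean normalization to a (num_frames, feat_dim) MFCC matrix before extraction. A minimal NumPy sketch of that step, assuming 30 coefficients per frame and a 300-frame window (both hypothetical here; verify against egs/diarize.sh and the Kaldi MFCC config it uses):

```python
import numpy as np


def sliding_cmn(feats: np.ndarray, window: int = 300) -> np.ndarray:
    """Subtract a centered sliding-window mean from every frame.

    feats:  (num_frames, feat_dim) array, e.g. MFCCs from any extractor.
    window: window length in frames (assumed value, not confirmed by the repo).
    """
    num_frames, feat_dim = feats.shape
    # Prefix sums let each window mean be computed in O(1).
    csum = np.vstack([np.zeros((1, feat_dim)), np.cumsum(feats, axis=0)])
    out = np.empty_like(feats, dtype=np.float64)
    half = window // 2
    for t in range(num_frames):
        start = max(0, t - half)
        end = min(num_frames, start + window)
        # Near the end of the file, shift the window left so it stays full.
        start = max(0, end - window)
        mean = (csum[end] - csum[start]) / (end - start)
        out[t] = feats[t] - mean
    return out


if __name__ == "__main__":
    # Placeholder features standing in for real MFCCs (assumed 30-dim).
    mfcc = np.random.default_rng(0).normal(size=(500, 30))
    normalized = sliding_cmn(mfcc)
    print(normalized.shape)
```

This only approximates the normalization and says nothing about frame length, frame shift, or mel-filterbank settings, which would also have to match the Kaldi configuration for the pretrained model to produce meaningful embeddings.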