voice-type-classifier

A deep learning model that classifies audio frames into the classes [SPEECH, KCHI, CHI, MAL, FEM].


SincNet- and LSTM-based Voice Type Classifier

[Figure: architecture of our model]

In this repository, you'll find all the code needed to apply a pre-trained model that, given an audio recording, classifies each frame into [SPEECH, KCHI, CHI, MAL, FEM] (a short sketch of the frame-wise output follows the list below).

  • FEM stands for female speech
  • MAL stands for male speech
  • KCHI stands for key-child speech (the child wearing the recording device)
  • CHI stands for other child speech
  • SPEECH stands for speech in general, regardless of who produces it (so it can overlap with the other classes)
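
To make the frame-wise output concrete, here is a minimal sketch that turns per-frame class scores into label sets. The helper and the 0.5 threshold are hypothetical, not this repository's API; the sketch only assumes one score per class per frame and a multi-label output, since SPEECH overlaps the speaker classes.

```python
import numpy as np

CLASSES = ["SPEECH", "KCHI", "CHI", "MAL", "FEM"]

def frames_to_labels(scores, threshold=0.5):
    """Binarize per-frame scores of shape (num_frames, 5) into label sets.

    Classes may overlap (SPEECH co-occurs with KCHI/CHI/MAL/FEM),
    so a single frame can carry several labels at once.
    """
    return [[cls for cls, score in zip(CLASSES, frame) if score >= threshold]
            for frame in scores]

# Two frames: key-child speech, then a non-speech frame.
scores = np.array([[0.92, 0.81, 0.07, 0.03, 0.10],
                   [0.08, 0.04, 0.02, 0.01, 0.05]])
print(frames_to_labels(scores))   # [['SPEECH', 'KCHI'], []]
```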

Our model was developed during JSALT [1] and its architecture is based on SincNet [2]. The code relies mainly on pyannote-audio [3], an excellent Python toolkit providing neural building blocks that can be combined to tackle speaker diarization.
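For intuition, here is a minimal PyTorch sketch of the SincNet + LSTM design: a sinc-parameterised band-pass filterbank over the raw waveform, a recurrent layer, and per-frame sigmoid scores for the five (possibly overlapping) classes. All layer sizes, strides, and hyper-parameters below are illustrative assumptions, not the values used by the pre-trained model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SincConv(nn.Module):
    """Sinc-based convolution in the spirit of SincNet [2]: each filter is a
    band-pass whose only learned parameters are its low cut-off frequency
    and its bandwidth. Sizes here are illustrative assumptions."""

    def __init__(self, out_channels=40, kernel_size=251, sample_rate=16000):
        super().__init__()
        # Low cut-offs spread over the spectrum, plus learnable bandwidths.
        self.low_hz = nn.Parameter(
            torch.linspace(30.0, sample_rate / 2 - 300.0, out_channels).unsqueeze(1))
        self.band_hz = nn.Parameter(torch.full((out_channels, 1), 100.0))
        half = kernel_size // 2
        self.register_buffer("t", torch.arange(-half, half + 1) / sample_rate)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):                      # x: (batch, 1, samples)
        f1 = self.low_hz.abs()
        f2 = f1 + self.band_hz.abs()
        # A band-pass filter is the difference of two low-pass sinc
        # filters; a Hamming window smooths the truncated kernel.
        kernel = (2 * f2 * torch.sinc(2 * f2 * self.t)
                  - 2 * f1 * torch.sinc(2 * f1 * self.t)) * self.window
        return F.conv1d(x, kernel.unsqueeze(1), stride=80)

class VoiceTypeClassifier(nn.Module):
    """Toy SincNet + bi-LSTM frame classifier with one sigmoid per class,
    since classes (e.g. SPEECH and KCHI) may be active simultaneously."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.sinc = SincConv()
        self.lstm = nn.LSTM(40, 128, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.frame_classifier = nn.Linear(2 * 128, num_classes)

    def forward(self, waveform):               # (batch, samples)
        x = torch.relu(self.sinc(waveform.unsqueeze(1)))  # (batch, 40, frames)
        x, _ = self.lstm(x.transpose(1, 2))               # (batch, frames, 256)
        return torch.sigmoid(self.frame_classifier(x))    # (batch, frames, 5)

scores = VoiceTypeClassifier()(torch.randn(1, 16000))     # one second of audio
print(scores.shape)                                       # torch.Size([1, 197, 5])
```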

How to use?

  1. Disclaimer /!\
  2. Installation
  3. Applying
  4. Evaluation
  5. Going further

References

[1] Paola Garcia et al., "Speaker detection in the wild: Lessons learned from JSALT 2019", arXiv.

[2] Mirco Ravanelli, Yoshua Bengio, "Speaker Recognition from Raw Waveform with SincNet", arXiv.

[3] Hervé Bredin et al., "pyannote.audio: neural building blocks for speaker diarization", arXiv.