The notebook (`.ipynb`) is in the `research_files` folder. The model requires a GPU to generate output, so please install CUDA and a PyTorch version compatible with your system.
Speaker diarization involves segmenting an audio stream containing human speech into consistent segments according to the identity of each speaker. The pipeline works as follows:
- Convert Audio to Text using Whisper
- Segregate the transcript by clustering the speaker embeddings with AgglomerativeClustering
- Perform NER (named entity recognition) on the transcript to identify participants' names
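The clustering step above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the random vectors stand in for real per-segment speaker embeddings (which a speaker-encoder model would produce), and `num_speakers` is the user-supplied speaker count.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in embeddings: two tight groups of 192-dim vectors,
# mimicking segments spoken by two different people.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 192)),   # segments from speaker 0
    rng.normal(5.0, 0.1, size=(5, 192)),   # segments from speaker 1
])

num_speakers = 2  # supplied by the user, as described below
clustering = AgglomerativeClustering(n_clusters=num_speakers).fit(embeddings)
print(clustering.labels_.tolist())  # one speaker label per audio segment
```

Each transcript segment then inherits the cluster label of its embedding, grouping the text by speaker.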
The model expects two inputs:
- Audio for speaker diarization
- The number of speakers in the audio
And produces two outputs:
- A complete transcript of the audio
- A diarization dictionary keyed by participant name
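A plausible shape for the diarization dictionary is sketched below. The names and utterances are hypothetical; the actual keys come from the NER step, and the values are the transcript segments assigned to each speaker.

```python
# Hypothetical output: participant names (from NER) mapped to
# the list of utterances attributed to that speaker.
diarization = {
    "Alice": ["Hi Bob, thanks for joining.", "Let's review the agenda."],
    "Bob": ["Happy to be here.", "Sounds good."],
}

# Render the diarized transcript in speaker-labelled form.
for name, utterances in diarization.items():
    for text in utterances:
        print(f"{name}: {text}")
```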