Single- or Multiple-Speaker Detection
Slide
https://docs.google.com/presentation/d/10Jpm2rLMkhuI4_mkF0qBUQUKfVoF-u6_oSzNxOy_LfY/edit?usp=sharing
Survey
Dataset
Original sources:
- Vivos: https://huggingface.co/datasets/vivos
- Room Impulse Response: https://www.openslr.org/28/
- MUSAN Noise: https://www.openslr.org/17/
Mixture Dataset
Algorithm
Source
- Overlap: tuannguyenvananh/vivos-mixture-simulation-overlap-dataset
- Non-Overlap: tuannguyenvananh/vivos-mixture-simulation-nonoverlap-dataset
Code to create mixture dataset
- Overlap: tuannguyenvananh/mixture-simulation-overlap
- Non-Overlap: tuannguyenvananh/mixture-simulation-non-overlap
Image Recognition task
Code to train ConvNeXt
- Overlap: tuannguyenvananh/convnext-on-vivox-overlap-binary
- Non-Overlap: tuanio/convnext-on-vivox-nonoverlap-binary
Code to train Conformer
- Overlap: tuannguyenvananh/conformer-on-vivox-overlap-binary
- Non-Overlap: tuannguyenvananh/conformer-on-vivox-nonoverlap-binary
Speaker Diarization
Speech Activity Detection:
- pyannote/voice-activity-detection: https://huggingface.co/pyannote/voice-activity-detection
Speaker Embedding
- TDNN-based x-vectors: tuannguyenvananh/x-vector-vivos
- MFA-Conformer: tuannguyenvananh/mfaconformer-vivos
Clustering:
- TDNN-based x-vectors: tuannguyenvananh/x-vector-clustering
- MFA-Conformer: tuannguyenvananh/mfaconformer-clustering