This project makes it easy to create speaker verification datasets for any language. Audio is downloaded automatically with `youtube-dl`, and the speakers in each clip are pre-labeled automatically with a GE2E (Generalized End-to-End) speaker encoder. The labels can then be reviewed and corrected efficiently with keyboard shortcuts. The web interface benefited from this project. The labeling interface benefited from this project.
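Automatic pre-labeling can be thought of as grouping per-segment speaker embeddings into speakers. The sketch below is a hypothetical illustration, not this project's implementation. It assumes each audio segment has already been converted into a fixed-length embedding (for example by a GE2E encoder). It then greedily assigns each segment to the most similar existing speaker centroid when the cosine similarity exceeds a threshold, and otherwise starts a new speaker. The function names and the 0.75 threshold are assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pre_label(embeddings: List[Sequence[float]], threshold: float = 0.75) -> List[int]:
    """Greedily assign a 0-based speaker label to each segment embedding.

    Hypothetical sketch: a segment joins the most similar existing speaker
    if that similarity is strictly greater than `threshold`; otherwise it
    opens a new speaker.
    """
    centroids: List[List[float]] = []  # running mean embedding per speaker
    counts: List[int] = []             # number of segments assigned per speaker
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best == -1:
            # No speaker is similar enough, so start a new one
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            n = counts[best]
            # Incrementally update the running mean of this speaker's embeddings
            centroids[best] = [c + (e - c) / n for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels
```

For example, `pre_label([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])` groups the first two segments as one speaker and the third as another. A greedy, single-pass scheme like this is order-dependent, which is one reason a human labeling pass is still useful afterwards.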
Shortcut | Description |
---|---|
CTRL + Space | Play/pause the current audio. |
Right Arrow | Load the next audio. |
Left Arrow | Load the previous audio. |
CTRL + Right Arrow | Skip forward in the audio. |
CTRL + Left Arrow | Skip backward in the audio. |
CTRL + Up Arrow | Set playback speed to 2x. |
CTRL + Down Arrow | Set playback speed to 1x. |
a | Add a new speaker. |
1, 2, 3, ..., 9 | Label the audio as the speaker with that number. |
Delete | Delete this audio. |
You need to install and configure Apache Kafka and MongoDB. To install Apache Kafka, you can follow this blog post. To install the MongoDB server, you can follow the official documentation.
You also need a valid GCP (Google Cloud Platform) API developer key. The default values for the Kafka port and the MongoDB address are shown below; change them if needed.
```json
{
    "kafkaPort": 9092,
    "mongoDbAddress": "127.0.0.1:27017",
    "googleAPIDeveloperKey": "your_developer_key_here"
}
```
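The README does not name the file that holds these values. Assuming they are stored as JSON (the file name `config.json` mentioned below is hypothetical), a minimal sketch for loading them while falling back to the defaults above could look like this. The validation rules are illustrative assumptions, not checks the project performs.

```python
import json
from typing import Any, Dict

# Default values copied from the README.
DEFAULTS: Dict[str, Any] = {
    "kafkaPort": 9092,
    "mongoDbAddress": "127.0.0.1:27017",
    "googleAPIDeveloperKey": "your_developer_key_here",
}


def load_config(text: str) -> Dict[str, Any]:
    """Merge a JSON config string over DEFAULTS and run basic sanity checks."""
    config = {**DEFAULTS, **json.loads(text)}
    if not isinstance(config["kafkaPort"], int):
        raise ValueError("kafkaPort must be an integer")
    # Catch the common mistake of leaving the placeholder key in place
    if config["googleAPIDeveloperKey"] == DEFAULTS["googleAPIDeveloperKey"]:
        raise ValueError("replace googleAPIDeveloperKey with a real GCP API key")
    return config
```

Usage, with a hypothetical path: `load_config(open("config.json", encoding="utf-8").read())`. Taking a string rather than a path keeps the function easy to test.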
```bash
sudo apt install ffmpeg
pip install pipenv
pipenv --python 3.6
pipenv shell
pip install -r requirements.txt

cd a2lsv_web
python manage.py makemigrations web_interface
python manage.py migrate
python manage.py loaddata fixtures.json
python manage.py runserver
```
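Before starting the scripts, it can help to confirm that Kafka and MongoDB are actually listening at the configured addresses. The helper below is hypothetical and not part of the project. It only attempts a plain TCP connection with Python's standard `socket.create_connection` and does not speak either protocol. It also assumes Kafka runs on `127.0.0.1`, since the config above only specifies its port.

```python
import socket
from typing import Dict, Tuple


def parse_address(address: str) -> Tuple[str, int]:
    """Split a 'host:port' string such as '127.0.0.1:27017'."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, etc.
        return False


def check_services(kafka_port: int = 9092,
                   mongo_address: str = "127.0.0.1:27017") -> Dict[str, bool]:
    """Report whether Kafka (assumed on localhost) and MongoDB accept TCP connections."""
    mongo_host, mongo_port = parse_address(mongo_address)
    return {
        "kafka": is_listening("127.0.0.1", kafka_port),
        "mongodb": is_listening(mongo_host, mongo_port),
    }
```

A `False` entry from `check_services()` points to a service that is not running or a port or address that does not match the configuration.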
Open a new terminal window for each of the following scripts and activate the environment in each one.
```bash
python youtubeSearch.py
python youtubeAudioDownloader.py
python speakerDiarization.py
```
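The README runs each script in its own terminal. As a hypothetical alternative, the sketch below starts them as concurrent child processes from a single activated shell using `subprocess.Popen`. It reuses the current interpreter (`sys.executable`) so that the pipenv environment carries over. It assumes the scripts can all be started together; the README does not say whether a particular start order is required.

```python
import subprocess
import sys
from typing import List, Sequence

SCRIPTS = ["youtubeSearch.py", "youtubeAudioDownloader.py", "speakerDiarization.py"]


def pipeline_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one command line per pipeline script, using the given interpreter."""
    return [[python, script] for script in SCRIPTS]


def start_all(commands: Sequence[Sequence[str]]) -> List[subprocess.Popen]:
    """Start each command as a separate child process without waiting for it."""
    return [subprocess.Popen(list(cmd)) for cmd in commands]


def wait_all(processes: Sequence[subprocess.Popen]) -> List[int]:
    """Block until every process exits and return their exit codes in order."""
    return [p.wait() for p in processes]
```

Run `wait_all(start_all(pipeline_commands()))` from the directory that contains the scripts. Separate terminals, as in the README, have the advantage that each script's output stays readable on its own. Here, the outputs of all three scripts are interleaved in one window.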
You can find the final dataset files in the `a2lsv_web/static/datasets/(dataset_name)/final_dataset` directory. The folder hierarchy is speaker id => YouTube video id => audio file.
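Given the speaker id => YouTube video id => audio file hierarchy, the final dataset can be indexed with a short directory walk. This is a sketch using the standard `pathlib` module. It assumes exactly three levels below the root and makes no assumption about the audio file extension.

```python
from pathlib import Path
from typing import List, Tuple


def index_dataset(root: Path) -> List[Tuple[str, str, Path]]:
    """Return (speaker_id, video_id, audio_path) for every file under root.

    Assumes the layout root/<speaker_id>/<video_id>/<audio_file>; entries are
    sorted so the result is deterministic.
    """
    entries: List[Tuple[str, str, Path]] = []
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for video_dir in sorted(p for p in speaker_dir.iterdir() if p.is_dir()):
            for audio in sorted(p for p in video_dir.iterdir() if p.is_file()):
                entries.append((speaker_dir.name, video_dir.name, audio))
    return entries
```

Usage, where `my_dataset` stands in for your dataset name: `index_dataset(Path("a2lsv_web/static/datasets/my_dataset/final_dataset"))`. Keeping the video id next to each file also lets you split train and test sets by video rather than by file, so that clips from the same recording do not leak across splits.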
You can download the Installation Guide, the Software Design Document, and the User Guide.