This repository provides a driver for the LibriSpeech database.
- Download the LibriSpeech datasets from the LibriSpeech website.
- Extract the dev/test/train archives to the folder LibriSpeech.
- Rename the folders to follow the template {subset}-{protocol}, for example: test-clean, train-clean, dev-clean.
- Clone this repository.
- Set the following paths and options:
  - db_dir: the LibriSpeech corpus directory (it should contain the SPEAKERS.txt file)
  - annotation_dir (pyannote-db-librispeech/LibriSpeech): the directory that contains the annotation files for the current corpus
  - protocols: e.g. ['dev-clean', 'dev-other', ...]
  - path_to_wav: the path where the training wav files will be stored
- Convert the audio files and create the annotation files by running the script LibriSpeech/generate.py.
- Add the line
  LibriSpeech: /path/to/corpus/LibriSpeech/wav/{uri}.wav
  to the file ~/.pyannote/db.yml.
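
As a rough illustration of the layout these steps expect, the sketch below checks that db_dir contains SPEAKERS.txt and the renamed subset folders, and then appends the database entry to a db.yml file. The helper names check_corpus and register_database are hypothetical and are not part of this repository; the paths in the usage comment are placeholders.

```python
from pathlib import Path


def check_corpus(db_dir, protocols):
    """Return a list of layout problems found under db_dir (empty if none).

    Hypothetical helper: it only checks for SPEAKERS.txt and for one
    folder per entry in `protocols` (e.g. 'dev-clean').
    """
    db_dir = Path(db_dir)
    problems = []
    if not (db_dir / "SPEAKERS.txt").is_file():
        problems.append(f"missing {db_dir / 'SPEAKERS.txt'}")
    for name in protocols:
        if not (db_dir / name).is_dir():
            problems.append(f"missing folder {db_dir / name}")
    return problems


def register_database(db_yml, path_to_wav, name="LibriSpeech"):
    """Append '<name>: <path_to_wav>/{uri}.wav' to db_yml if not already there.

    Hypothetical helper: it treats db_yml as a flat list of lines, which
    matches the single-line entry described above.
    """
    db_yml = Path(db_yml).expanduser()
    entry = f"{name}: {str(path_to_wav).rstrip('/')}/{{uri}}.wav"
    text = db_yml.read_text() if db_yml.exists() else ""
    if entry not in text.splitlines():
        db_yml.parent.mkdir(parents=True, exist_ok=True)
        with db_yml.open("a") as f:
            # Start on a fresh line if the file does not end with a newline.
            if text and not text.endswith("\n"):
                f.write("\n")
            f.write(entry + "\n")
    return entry


# Example usage (placeholder paths):
#   problems = check_corpus("/path/to/corpus/LibriSpeech", ["dev-clean", "test-clean"])
#   if not problems:
#       register_database("~/.pyannote/db.yml", "/path/to/corpus/LibriSpeech/wav")
```

Running the check before LibriSpeech/generate.py can catch a folder that was not renamed, and the duplicate test keeps repeated runs from adding the same entry to db.yml twice.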