1215thebqtic/pyannote-db-librispeech

LibriSpeech plugin for pyannote.database


pyannote.database plugin for LibriSpeech corpus

This repository provides a driver for the LibriSpeech database.

Download the LibriSpeech datasets from the LibriSpeech download page.
Extract the dev/test/train archives into a folder named LibriSpeech.
Rename the extracted folders to match the template {subset}-{protocol}, for example: test-clean, train-clean, dev-clean.
Clone this repository.
Set the following:

db_dir: the LibriSpeech corpus directory (it should contain the SPEAKERS.txt file)
annotation_dir: the directory (pyannote-db-librispeech/LibriSpeech) that contains the annotation files for the current corpora
protocols: the list of protocols, e.g. ['dev-clean', 'dev-other', ...]
path_to_wav: the path where the training wav files will be stored
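
Before running the conversion, the settings listed above can be sanity-checked. The following is a minimal sketch, not the code in generate.py: check_settings is a hypothetical helper, and it assumes that each entry of protocols names a folder inside db_dir. Only the four setting names and the SPEAKERS.txt requirement come from the list above.

```python
from pathlib import Path


def check_settings(db_dir, annotation_dir, protocols, path_to_wav):
    """Return a list of problems found in the settings (empty if none).

    Hypothetical helper, not part of this repository or pyannote.database.
    """
    problems = []
    db_dir = Path(db_dir)

    # db_dir should contain the SPEAKERS.txt file.
    if not (db_dir / "SPEAKERS.txt").is_file():
        problems.append(f"missing {db_dir / 'SPEAKERS.txt'}")

    # Assumption: every protocol (e.g. dev-clean) is a folder inside db_dir.
    for name in protocols:
        if not (db_dir / name).is_dir():
            problems.append(f"missing folder {db_dir / name}")

    # annotation_dir should already exist (it ships with the cloned repository).
    if not Path(annotation_dir).is_dir():
        problems.append(f"missing annotation_dir {annotation_dir}")

    # path_to_wav is an output location, so create it if it does not exist yet.
    Path(path_to_wav).mkdir(parents=True, exist_ok=True)
    return problems
```

A call such as check_settings("/path/to/corpus/LibriSpeech", "pyannote-db-librispeech/LibriSpeech", ["dev-clean", "dev-other"], "/path/to/corpus/LibriSpeech/wav") returns an empty list when the layout matches the steps above. The paths in this call are placeholders.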

Convert the audio files and create the annotation files by running the script LibriSpeech/generate.py.
Add the line LibriSpeech: /path/to/corpus/LibriSpeech/wav/{uri}.wav to the file ~/.pyannote/db.yml.
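
The db.yml entry from the last step can also be added with a few lines of Python. This is a minimal sketch under stated assumptions: add_db_entry is a hypothetical helper, not part of pyannote.database or of this repository, and it treats db.yml as plain text with one "Name: template" entry per line.

```python
from pathlib import Path


def add_db_entry(db_yml, name, template):
    """Append 'name: template' to db_yml unless an entry for name exists.

    Hypothetical helper. Returns True if a line was added, False otherwise.
    """
    db_yml = Path(db_yml).expanduser()
    # Create ~/.pyannote (or any other parent folder) if it is missing.
    db_yml.parent.mkdir(parents=True, exist_ok=True)

    lines = db_yml.read_text().splitlines() if db_yml.exists() else []

    # Assumption: an existing entry starts with the database name and a colon.
    if any(line.split(":", 1)[0].strip() == name for line in lines):
        return False

    lines.append(f"{name}: {template}")
    db_yml.write_text("\n".join(lines) + "\n")
    return True
```

For the entry described above, the call would be add_db_entry("~/.pyannote/db.yml", "LibriSpeech", "/path/to/corpus/LibriSpeech/wav/{uri}.wav"), with /path/to/corpus replaced by the real corpus location. An existing LibriSpeech entry is left untouched, so the call is safe to repeat.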