Generating the LibriLight-Mix dataset

This script generates noisy, reverberant two-speaker mixtures from the Libri-Light dataset, which can serve as training material for large-scale robust speech separation.

If you want to mix more speakers, please refer to LibriLightMix-WHAM.

Python requirements

Requires Python 3.8 and the numpy, scipy, pandas, pyroomacoustics, and pysoundfile packages:

$ pip install -r requirements.txt

If pyroomacoustics fails to install this way, try installing it on its own first:

$ pip install pyroomacoustics
$ pip install -r requirements.txt
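
After installation, a quick check that every package can be found may save a failed run later. The following is a minimal sketch rather than part of the repository; the helper name `missing_modules` is hypothetical, and it relies on the fact that pysoundfile is imported under the module name `soundfile`.

```python
import importlib.util

# Import names for the packages listed above; pysoundfile is
# imported as "soundfile".
REQUIRED_MODULES = ["numpy", "scipy", "pandas", "pyroomacoustics", "soundfile"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```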

Prerequisites

This requires the Libri-Light dataset and the WHAM! noise corpus.

Creating LibriLight-Mix

Creating meta files

$ python create_filenames.py

Change the following arguments in the script:

wham_path: Folder where the unzipped wham_noise was downloaded (training set).
librilight_path: Folder where the unzipped Libri-Light data was downloaded.
debug: Whether to process a dummy dataset.
SOT: Whether to process speakers in order (speaker1 speaks earlier than speaker2) for serialized output training.
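
To illustrate what the SOT option means, here is a small sketch of the ordering rule only. It assumes each utterance carries a start offset within the mixture; the `Utterance` class, its fields, and `order_for_sot` are hypothetical and do not come from create_filenames.py.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    path: str     # hypothetical path to the source audio file
    start: float  # hypothetical start offset in the mixture, in seconds


def order_for_sot(a, b):
    """Return (speaker1, speaker2) so that speaker1 starts no later than speaker2."""
    return (a, b) if a.start <= b.start else (b, a)
```

With this ordering, the first reference stream always belongs to the speaker who starts talking first, which is the property serialized output training relies on.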

Creating reverberation meta files

$ python run_sample_reverb.py

Creating mixture files

$ python create_wham_from_scratch.py --mono \
    --output-dir ./librilight_whamr/ \
    --mode fix \
    --sr 16000 \
    --fixed-len 5

The arguments for the script are:

output-dir: Where to write the new dataset.
mode: Length of the simulated speech: "fix" for a fixed length, "min" for the shorter of the two utterances, and "max" for the longer of the two utterances.
sr: Sampling rate.
fixed-len: Fixed length in mode "fix".
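
The three modes can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes fixed-len is given in seconds and that shorter signals are zero-padded at the end; the function names `target_length` and `fit_length` are hypothetical.

```python
import numpy as np


def target_length(len1, len2, mode, sr=16000, fixed_len=5):
    """Number of samples in the mixture for the given mode."""
    if mode == "fix":
        return int(sr * fixed_len)  # assumes fixed_len is in seconds
    if mode == "min":
        return min(len1, len2)
    if mode == "max":
        return max(len1, len2)
    raise ValueError(f"unknown mode: {mode!r}")


def fit_length(x, n):
    """Truncate or zero-pad a 1-D signal to exactly n samples."""
    if len(x) >= n:
        return x[:n]
    return np.pad(x, (0, n - len(x)), mode="constant")
```

For example, with sr 16000 and fixed-len 5, every mixture in "fix" mode would be 80000 samples long regardless of the source utterance lengths.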

Creating LibriLight-Mix in parallel with multiple CPUs

Creating meta files

$ python create_filenames_parallel.py

Change the following arguments in the script:

wham_path: Folder where the unzipped wham_noise was downloaded (training set).
librilight_path: Folder where the unzipped Libri-Light data was downloaded.
savename: Name of the meta .csv file to save.
tag: Name of the meta .csv folder to save.
debug: Whether to process a dummy dataset.
SOT: Whether to process speakers in order (speaker1 speaks earlier than speaker2) for serialized output training.

Creating reverberation meta files

$ python run_sample_reverb_parallel.py

Update the file lists to match the tag chosen when creating the meta files.

Creating mixture files

for i in $(seq 0 49)
do
    python create_wham_from_scratch_parallel.py --mono \
        --output-dir "./LibrilightMix-medium/$i/" \
        --filepath "data/medium/mix_2_spk_filenames_librilight_tr_medium$i.csv" \
        --mode fix \
        --sr 16000 \
        --fixed-len 5
done

The arguments for the script are:

output-dir: Where to write the new dataset.
filepath: Path to the saved meta .csv file.
mode: Length of the simulated speech: "fix" for a fixed length, "min" for the shorter of the two utterances, and "max" for the longer of the two utterances.
sr: Sampling rate.
fixed-len: Fixed length in mode "fix".
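
The shell loop above handles the 50 meta files one after another. To run several shards at the same time, one option is a small Python launcher like the sketch below. It reuses the exact arguments from the loop; the helper names `build_command` and `run_all` and the `max_workers` setting are assumptions, not part of the repository.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(i):
    """Command line for shard i, mirroring the shell loop above."""
    return [
        sys.executable, "create_wham_from_scratch_parallel.py", "--mono",
        "--output-dir", f"./LibrilightMix-medium/{i}/",
        "--filepath", f"data/medium/mix_2_spk_filenames_librilight_tr_medium{i}.csv",
        "--mode", "fix",
        "--sr", "16000",
        "--fixed-len", "5",
    ]


def run_all(num_shards=50, max_workers=4):
    """Run all shards, at most max_workers at a time; return the exit codes."""
    commands = [build_command(i) for i in range(num_shards)]
    # Threads are enough here: each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Calling `run_all(max_workers=8)` from the repository root would keep up to eight shards running at once; a non-zero entry in the returned list marks a shard that failed.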

Output data organization

For each utterance in the training (tr) set folder, the following wav files are written:

noise: contains the isolated background noise from WHAM!
s1_anechoic: isolated data from speaker 1 without reverb, but with appropriate delays to align with s1_reverb
s2_anechoic: isolated data from speaker 2 without reverb, but with appropriate delays to align with s2_reverb
s1_reverb: isolated data from speaker 1 with reverberation
s2_reverb: isolated data from speaker 2 with reverberation
mix_single_anechoic: for speech enhancement, contains mixture of s1_anechoic and noise
mix_clean_anechoic: clean speech separation for two speakers, contains mixture of s1_anechoic and s2_anechoic. The relative levels between speakers should match the original Libri-Light dataset, but the overall level of the mix will be different.
mix_both_anechoic: contains mixtures of s1_anechoic, s2_anechoic, and noise
mix_single_reverb: for speech enhancement, contains mixture of s1_reverb and noise
mix_clean_reverb: clean speech separation for two reverberant speakers, contains a mixture of s1_reverb and s2_reverb. The relative levels between speakers should match the original Libri-Light dataset, but the overall level of the mix will be different.
mix_both_reverb: contains mixtures of s1_reverb, s2_reverb, and noise
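
The relationship between the isolated signals and the three mixture types can be sketched as below. It assumes the mixtures are plain sample-wise sums of equal-length signals and ignores any gain scaling the scripts may apply; `build_mixtures` is a hypothetical helper, and the same pattern would apply to either the anechoic or the reverberant sources.

```python
import numpy as np


def build_mixtures(s1, s2, noise):
    """Combine equal-length source signals into the three mixture types."""
    return {
        "mix_single": s1 + noise,     # one speaker plus noise (enhancement)
        "mix_clean": s1 + s2,         # two speakers, no noise (separation)
        "mix_both": s1 + s2 + noise,  # two speakers plus noise
    }
```

Under this assumption, subtracting noise from mix_both recovers mix_clean, which is a handy consistency check on generated files.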

Reference

https://wham.whisper.ai/WHAMR_README.html

WangHelin1997/LibriLightMix-WHAMR
