WSJ2WAV

Convert WSJ audio from SPHERE format to waveform (WAV) files and perform data simulation.

Primary language: Python. License: MIT.

WSJ Data Preparation

This repository provides useful scripts for preparing the WSJ data.

Install Necessary Tools

cd tools
make

How to Use

WSJ0

# convert sphere to waveform
bash wsj0/1_sph2wav.sh   # remember to change wsj0_dir and save_dir

# add noise
python wsj0/2_prep_noisy_data.py -h
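For reference, the first step (sphere to waveform) amounts to reading the NIST SPHERE header and copying the samples into a WAV container. The sketch below is a minimal, stdlib-only illustration, not what 1_sph2wav.sh does internally (which converter it calls is not checked here). It assumes uncompressed PCM. WSJ files are commonly shorten-compressed, and this sketch rejects those; they need a dedicated decoder such as sph2pipe. The names parse_sphere_header and convert_sphere are hypothetical.

```python
import wave


def parse_sphere_header(f):
    """Parse a NIST SPHERE header from a binary file object.

    Assumes the standard layout: 'NIST_1A', then the header size in bytes,
    then 'name -type value' lines up to 'end_head'. Returns (fields, size).
    """
    if f.read(8) != b"NIST_1A\n":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(f.read(8).strip())
    text = f.read(header_size - 16).decode("ascii", errors="replace")
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, typ, value = parts
        if typ == "-i":
            fields[name] = int(value)
        elif typ == "-r":
            fields[name] = float(value)
        else:  # string fields are typed "-sN"
            fields[name] = value
    return fields, header_size


def convert_sphere(src, dst):
    """Copy uncompressed PCM samples from a SPHERE file object into a WAV file object."""
    fields, header_size = parse_sphere_header(src)
    coding = fields.get("sample_coding", "pcm")
    if coding != "pcm":
        # e.g. shorten-compressed data: decode with a tool such as sph2pipe instead
        raise NotImplementedError(f"unsupported sample_coding: {coding}")
    width = fields.get("sample_n_bytes", 2)
    channels = fields.get("channel_count", 1)
    src.seek(header_size)
    data = src.read()
    frame = width * channels
    data = data[: len(data) - len(data) % frame]  # drop any partial frame
    # SPHERE '10' means big-endian; WAV stores 16-bit samples little-endian.
    if width == 2 and fields.get("sample_byte_format") == "10":
        swapped = bytearray(data)
        swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
        data = bytes(swapped)
    with wave.open(dst, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(fields["sample_rate"])
        w.writeframes(data)
```

To convert files on disk, open the source with open(path, "rb") and pass a destination path or file object to convert_sphere.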
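Adding noise usually comes down to scaling a noise segment so the mixture reaches a target signal-to-noise ratio: with speech power Ps and noise power Pn, the gain g solves 10*log10(Ps / (g^2 * Pn)) = SNR. The NumPy sketch below is a generic illustration of that definition, not the logic of 2_prep_noisy_data.py (run it with -h for its actual options). The function mix_at_snr and its tile-then-random-crop handling of short noise are assumptions.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Return (mixture, scaled_noise) with the requested SNR in dB.

    Both inputs are 1-D arrays at the same sample rate. Short noise is
    tiled, then a random segment as long as the speech is used.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    # Solve 10*log10(p_clean / (gain**2 * p_noise)) = snr_db for gain.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return clean + scaled, scaled
```

Returning the scaled noise alongside the mixture makes it easy to keep the exact noise target, which some enhancement setups need.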

Public Datasets

There are several public datasets we can use, including noise corpora, room impulse responses (RIRs) and ready-made simulated noisy speech.

Noise Datasets

You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, you need to use tools/resample.py to down-sample the clean speech or the noise. Some open-source noise corpora we can use:

  1. Nonspeech100
  2. MUSAN
  3. freesound
  4. DEMAND
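Because mismatched sample rates are the main pitfall here, a quick check before mixing can help. The stdlib sketch below reads WAV headers and reports noise files whose rate differs from the speech, and computes the integer up/down factors a polyphase resampler would need (for example, 48 kHz to 16 kHz gives up=1, down=3). The helpers wav_rate, find_rate_mismatches and resample_factors are hypothetical and not part of tools/resample.py.

```python
import math
import wave


def wav_rate(src):
    """Read the sample rate from a WAV path or binary file object."""
    with wave.open(src, "rb") as w:
        return w.getframerate()


def find_rate_mismatches(speech_rate, noise_files):
    """Return {name: rate} for entries of noise_files whose rate != speech_rate."""
    mismatches = {}
    for name, src in noise_files.items():
        rate = wav_rate(src)
        if rate != speech_rate:
            mismatches[name] = rate
    return mismatches


def resample_factors(src_rate, dst_rate):
    """Return the smallest integers (up, down) with src_rate * up / down == dst_rate."""
    g = math.gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g
```

Such factors match the up/down arguments of scipy.signal.resample_poly(x, up, down); whether tools/resample.py resamples this way is not confirmed here.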

Room Impulse Response (RIR)

  1. OpenSLR
  2. AcouSP
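Reverberant speech is usually simulated by convolving clean speech with a measured or simulated RIR, such as those in the corpora above. A minimal NumPy sketch, assuming the RIR is already at the speech sample rate; apply_rir and its peak normalization are illustrative choices, not something this repository prescribes.

```python
import numpy as np


def apply_rir(clean, rir, normalize=True):
    """Convolve clean speech with a room impulse response.

    The result is truncated to len(clean) so it stays the same length as
    the dry signal. With normalize=True the RIR is first scaled to unit peak.
    """
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    if normalize:
        peak = np.max(np.abs(rir))
        if peak == 0:
            raise ValueError("RIR is all zeros")
        rir = rir / peak
    return np.convolve(clean, rir, mode="full")[: len(clean)]
```

For long RIRs, scipy.signal.fftconvolve is typically much faster than np.convolve and can be swapped in directly.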

Noisy Speech Datasets

  1. SUPERSEDED