Language Identification Using Deep Convolutional Recurrent Neural Network
Read the complete master's thesis here.
- /data
- Scripts to download training data from Voxforge, European Parliament Speech Repository and YouTube. For usage details see below.
- /keras
- All the code for setting up and training various models with Keras/Tensorflow.
- Includes training and prediction scripts. See train.py and predict.py.
- Configure your learning parameters in config.yaml.
- More below
- /tools
- Some handy scripts to clean filenames, normalize audio files and other stuff.
- /webserver
- A demo project for language identification. A small web server with a REST interface for classification and a small web frontend to upload audio files.
- /notebooks
- Jupyter notebooks with various audio experiments
- /thesis
- Latex sources & figures for my thesis
- /paper
- Some papers / related work I used in preparation for this thesis. Not complete.
- Keras 1
- TensorFlow
- Python 3.4
- youtube_dl
- sox
Downloads training data / audio samples from various sources.
- Downloads the audio samples from www.voxforge.org for some languages
/data/voxforge/download-data.sh
/data/voxforge/extract_tgz.sh {path_to_german.tgz} german
- Downloads various news channels from YouTube.
- Configure channels/sources in /data/sources.yml
python /data/download_youtube.py
- Downloads various speeches and press conferences from European Speech Repository.
- needs WebDriver/Selenium & Firefox
python /data/download_europe_speech_repository.py
Make sure you have SoX installed. To create 500x129x1 grayscale spectrogram images run the following script.
python /data/wav_to_spectrogram.py --source <path> --target <path>
The above script uses different spectrogram generators to augment the data with additional noise or background music if needed. Adjust the imports accordingly.
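For orientation, here is a minimal sketch of how a single raw grayscale spectrogram could be rendered with SoX from Python. It is not taken from wav_to_spectrogram.py. The helper name `sox_spectrogram_command`, the 10 kHz sample rate, and the 50 pixels per second resolution are assumptions chosen for illustration; only the 129 pixel height comes from the image size stated above.

```python
import shlex


def sox_spectrogram_command(wav_path, png_path, height=129,
                            pixels_per_second=50, sample_rate="10k"):
    """Build a SoX command that renders a raw monochrome spectrogram.

    Hypothetical helper: the default rate and pixels-per-second values
    are illustrative assumptions, not the repository's settings.
    """
    return [
        "sox", wav_path, "-n",         # -n: no audio output, effects only
        "remix", "1",                  # keep only the first channel
        "rate", sample_rate,           # resample before analysis
        "spectrogram",
        "-y", str(height),             # image height in pixels
        "-X", str(pixels_per_second),  # horizontal resolution
        "-m",                          # monochrome (grayscale)
        "-r",                          # raw: no axes or legend
        "-o", png_path,                # output image path
    ]


cmd = sox_spectrogram_command("speech.wav", "speech.png")
print(" ".join(shlex.quote(part) for part in cmd))
```

To actually render the image, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where SoX is installed. At 50 pixels per second, a 500 pixel wide image would correspond to 10 seconds of audio.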
I trained models for 4 languages (English, German, French, Spanish) and 6 languages (English, German, French, Spanish, Chinese, Russian). They might be released later.
To start a training run, set all the desired properties and hyperparameters in the config.yaml file and train with Keras:
python /keras/train.py --config <config.yaml>
To predict a single audio file run:
python /keras/predict.py --model <path/to/model> --input <path/to/speech.mp3>
Audio files can be in any format understood by SoX. The pretrained model files need to be compatible with Keras v1.
0 English,
1 German,
2 French,
3 Spanish,
4 Mandarin Chinese,
5 Russian
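The label indices above can be mapped back to language names with a small lookup. The following is a sketch only: `decode_prediction` is a hypothetical helper, and it assumes the model returns one probability per label in the order listed above, with a 4-language model using the first four entries.

```python
# Label order as listed above; index in the list equals the class label.
LANGUAGES = ["English", "German", "French", "Spanish",
             "Mandarin Chinese", "Russian"]


def decode_prediction(probabilities):
    """Return (label_index, language_name) for the most probable class.

    Assumes one probability per label, in the order of LANGUAGES.
    """
    if not 0 < len(probabilities) <= len(LANGUAGES):
        raise ValueError("expected between 1 and %d probabilities"
                         % len(LANGUAGES))
    # argmax without external dependencies
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index, LANGUAGES[index]


print(decode_prediction([0.05, 0.80, 0.05, 0.04, 0.03, 0.03]))
# prints (1, 'German')
```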
TBD