ASR PyTorch project


ASR project

This project was made for educational purposes, as homework for a course on deep learning for audio processing.

Installation guide

Python 3.8 or 3.9 is recommended.

Clone the repository, change into its directory, and install the dependencies:

git clone https://github.com/maximkm/DLA_ASR_HW.git
cd DLA_ASR_HW
pip install -r requirements.txt

Description of the work done

  • Write all the basic functions and classes
  • Pass unit tests
  • Write and test a CTC transformer encoder from RNN-T
  • Write augmentations
  • Conduct the first experiments and select the optimal hyperparameters and augmentations
  • Train BPE on training texts and implement it into model training
  • Train the final model with the best parameters
  • Train an LM on training texts
  • Integrate the LM into beam search
  • Choose optimal beam search hyperparameters using optuna
  • Implement a Common Voice dataset class and write a config for fine-tuning the model
  • Fine-tune the model on Common Voice
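To illustrate the beam search item in the list above, here is a minimal sketch of CTC prefix beam search in pure Python. It is not the repository's implementation: it omits LM fusion and BPE, works on plain per-frame probabilities, and the function name, the `beam_size` default, and the toy vocabulary are all assumptions made for the example.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, vocab, beam_size=3, blank=0):
    """Sketch of CTC prefix beam search without a language model.

    probs: list of frames, each a list of per-token probabilities.
    vocab: list mapping token index -> string (index `blank` is the CTC blank).
    """
    # Each prefix keeps two masses: paths ending in blank, paths ending in non-blank.
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    # A blank never changes the prefix.
                    next_beams[prefix][0] += (p_blank + p_non_blank) * p
                elif prefix and prefix[-1] == idx:
                    # Repeated token directly after itself collapses into the same prefix.
                    next_beams[prefix][1] += p_non_blank * p
                    # Repeated token after a blank starts a new occurrence.
                    next_beams[prefix + (idx,)][1] += p_blank * p
                else:
                    next_beams[prefix + (idx,)][1] += (p_blank + p_non_blank) * p
        # Keep only the most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(masses) for prefix, masses in ranked[:beam_size]}
    best_prefix = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(vocab[idx] for idx in best_prefix)


# Hypothetical toy input: index 0 is the blank, index 1 is "a".
toy_vocab = ["", "a"]
toy_probs = [[0.6, 0.4], [0.6, 0.4]]
decoded = ctc_prefix_beam_search(toy_probs, toy_vocab)
print(repr(decoded))
```

On this toy input the blank wins in every frame, so a per-frame argmax would produce an empty string, while the beam search sums the three alignments that collapse to "a" (total 0.64 against 0.36) and returns "a". An LM score would typically be added when a prefix is extended, which is left out here.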

Wandb report

Final scores

Dataset                  Prediction type  CER      WER
LibriSpeech: test-clean  beam search      0.06742  0.12988
LibriSpeech: test-other  beam search      0.17529  0.27248
LibriSpeech: test-clean  argmax           0.07794  0.21284
LibriSpeech: test-other  argmax           0.17529  0.38656
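For readers unfamiliar with the "argmax" rows in the table, the sketch below shows what greedy CTC decoding usually means: take the best token in every frame, merge consecutive repeats, and drop blanks. This is an illustration under stated assumptions, not the project's code; the function name, the blank index 0, and the toy vocabulary are hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank token


def ctc_argmax_decode(frame_scores, vocab, blank=BLANK):
    """Greedy CTC decoding over a list of per-frame score lists."""
    # Index of the highest-scoring token in each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Merge runs of identical indices, then drop the blank token.
    collapsed = [idx for idx, _ in groupby(best) if idx != blank]
    return "".join(vocab[idx] for idx in collapsed)


# Hypothetical toy example: frames favour a, a, blank, a, b.
toy_vocab = ["", "a", "b"]
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
greedy_text = ctc_argmax_decode(toy_scores, toy_vocab)
print(greedy_text)
```

The blank between the second and fourth frames is what lets the output keep two separate "a" characters, so the result here is "aab".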

Independent code testing

You need to download:

  1. The final model checkpoint, and put the saved folder in the main directory
  2. The LM, and place the file in the hw_asr/lm directory

You can do both by running these commands:

gdown "https://drive.google.com/uc?id=10Ubmu6-w415A2jiUXobJL4ZzMy7A5fxW"
unzip saved.zip
gdown "https://drive.google.com/uc?id=1WGFJgzrh850BSXkaCb-dzsWqK894Dmd0"
mv 5_full_gram.arpa hw_asr/lm

Now you can run the code:

  1. Run the model with the following command:
python test.py -c hw_asr/configs/test_ctc_big_clean.json -r saved/models/baseline/1013_154403/model_best.pth -o test-clean.json

This command loads the prepared test_ctc_big_clean.json config, which describes the model and the dataset.

After all the data has been processed, the predictions are saved to test-clean.json.

A similar config, test_ctc_big_other.json, is also provided. In addition, test.py accepts a -t argument that specifies a folder with a dataset.

  2. Finally, run the script that calculates the WER and CER metrics:
python calc_wer_cer.py -t test-clean.json
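As background on what these metrics measure, here is a minimal sketch of WER and CER computed with a plain Levenshtein distance. It is not the contents of calc_wer_cer.py; the function names and the convention for an empty reference (error 1.0 if anything was predicted, otherwise 0.0) are assumptions chosen for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_item != hyp_item),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(target, prediction):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words, hyp_words = target.split(), prediction.split()
    if not ref_words:
        # Assumed convention for an empty reference.
        return 1.0 if hyp_words else 0.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(target, prediction):
    """Character error rate: character-level edit distance over reference length."""
    if not target:
        # Assumed convention for an empty reference.
        return 1.0 if prediction else 0.0
    return edit_distance(target, prediction) / len(target)


# Hypothetical example pair, not taken from the dataset.
example_wer = wer("the cat sat", "the cat sit")
example_cer = cer("the cat sat", "the cat sit")
print(f"WER={example_wer:.4f} CER={example_cer:.4f}")
```

In the example one of three words is wrong, giving a WER of 1/3, while only one of eleven characters differs, giving a CER of 1/11. This is why CER is usually lower than WER for the same predictions, as in the results table.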


This repository is based on a heavily modified fork of the pytorch-template repository.

The CTC transformer architecture is based on the paper "Transformers with convolutional context for ASR".