An Automatic Speech Recognition toolkit, written in TensorFlow 2

Primary language: Python. License: MIT.

UETASR

An Automatic Speech Recognition toolkit in TensorFlow 2

Suggestions are always welcome!


Key features

UETASR provides various useful tools to speed up and facilitate research on speech technologies:

  • A YAML-based hyperparameter specification language that describes all types of hyperparameters, from individual numbers to complete objects.

  • Single- and multi-GPU training and inference with TensorFlow 2 Data-Parallel or Distributed Data-Parallel.

  • A transparent and fully customizable data input and output (I/O) pipeline.

  • Logging and visualization with WandB and TensorBoard.

  • Error analysis tools to help users debug their models.
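To illustrate what "from individual numbers to complete objects" can mean in practice, here is a minimal sketch of a loader that turns a parsed config into Python objects. It is not UETASR's actual syntax or loader: the `class` key, the `build` helper, and the example config are all hypothetical, and a plain dict stands in for the parsed YAML so the sketch needs only the standard library.

```python
import importlib


def build(spec):
    """Recursively turn a parsed config into Python objects.

    Hypothetical convention: a mapping with a "class" key holding a dotted
    path (e.g. "fractions.Fraction") is instantiated with the remaining keys
    as keyword arguments. Everything else is passed through unchanged.
    """
    if isinstance(spec, dict):
        if "class" in spec:
            module_name, _, class_name = spec["class"].rpartition(".")
            cls = getattr(importlib.import_module(module_name), class_name)
            kwargs = {k: build(v) for k, v in spec.items() if k != "class"}
            return cls(**kwargs)
        return {k: build(v) for k, v in spec.items()}
    if isinstance(spec, list):
        return [build(v) for v in spec]
    return spec  # plain number, string, bool, or None


# Stand-in for a parsed YAML file such as:
#   learning_rate: 0.001
#   ratio:
#     class: fractions.Fraction
#     numerator: 1
#     denominator: 4
config = {
    "learning_rate": 0.001,
    "ratio": {"class": "fractions.Fraction", "numerator": 1, "denominator": 4},
}

params = build(config)
print(params["learning_rate"], params["ratio"])
```

The same pattern extends to nested objects, since the keyword arguments are themselves built recursively before the outer class is called.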

Supported Models

Feature extraction and augmentation

UETASR provides efficient and GPU-friendly on-the-fly speech augmentation pipelines and acoustic feature extraction:

  • Augmentation:
    • Adaptive SpecAugment (paper)
    • SpliceOut (paper)
    • Gain, Time Stretch, Pitch Shift, etc. (paper)
  • Featurization:
    • MFCC, Fbank, Spectrogram, etc.
    • Subword tokenization (BPE, Unigram, etc.)
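As a rough illustration of the masking idea behind SpecAugment, the sketch below zeroes out random frequency bands and time spans of a spectrogram. It is plain SpecAugment-style masking in pure Python, not the adaptive variant and not UETASR's GPU implementation; the function name `spec_augment` and all parameter names are assumptions made for this example.

```python
import random


def spec_augment(spec, max_freq_width=2, max_time_width=3,
                 num_freq_masks=1, num_time_masks=1, rng=None):
    """Return a masked copy of `spec`, a list of frames (each a list of bins).

    Each frequency mask zeroes a random band of up to `max_freq_width` bins
    in every frame; each time mask zeroes up to `max_time_width` consecutive
    frames. The input is not modified.
    """
    rng = rng or random.Random()
    num_frames = len(spec)
    num_bins = len(spec[0]) if spec else 0
    out = [row[:] for row in spec]  # copy so the caller's data stays intact

    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for row in out:
            for f in range(start, start + width):
                row[f] = 0.0

    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_bins

    return out


# Toy input: 10 frames of 8 bins, all ones, so masked cells are easy to spot.
features = [[1.0] * 8 for _ in range(10)]
masked = spec_augment(features, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the masks reproducible, which is convenient when debugging an augmentation pipeline.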

Installation

For training and testing, you can also use git clone to install optional third-party packages (ctc_loss, rnnt_loss, etc.).

Prerequisites

  • TensorFlow >= 2.9.0

  • CuDNN >= 8.1.0

  • CUDA >= 11.2

  • Nvidia driver >= 470

Install with GitHub

Once you have created your Python environment (Python 3.6+; note that TensorFlow 2.9 itself requires Python 3.7 or newer), you can simply type:

git clone https://github.com/thanhtvt/uetasr.git
cd uetasr
pip install -e .

Then you can access uetasr with:

import uetasr
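If you want a quick sanity check after installing, the sketch below reports whether the expected packages can be found without actually importing them. It uses only the standard library; the helper name `check_install` is hypothetical and not part of UETASR.

```python
import importlib.util


def check_install(packages=("tensorflow", "uetasr")):
    """Map each top-level package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None
            for name in packages}


for name, found in check_install().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A package reported as missing usually means the wrong environment is active or `pip install -e .` was run outside the cloned folder.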

Install with Conda

git clone https://github.com/thanhtvt/uetasr.git

conda create --name uetasr python=3.8
conda activate uetasr
conda install cudnn=8.1.0

cd uetasr

pip install -e .

Install with Docker

Build the Docker image from the Dockerfile:

docker build -t uetasr:v1.0.0 .

Run a container from the uetasr image:

docker run -it --name uetasr --gpus all -v <workspace_dir>:/workspace uetasr:v1.0.0 bash

Getting Started

  1. Define a YAML config file; see the config.yaml file in this folder for reference.
  2. Download your corpus and create a script to generate the .tsv file (see this file for reference). Check whether our provided tools meet your needs.
  3. Create transcript.txt and cmvn.tsv files for your corpus. We provide this script to generate those files from the .tsv file generated in step 2.
  4. For training, check train.py in the egs folder to see the options.
  5. For testing, check test.py in the egs folder to see the options.
  6. For evaluating and error analysis, check asr_evaluation.py in the tools folder to see the options.
  7. [Optional] To publish your model on 🤗, check this space for reference.
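For context on the evaluation step, ASR output is commonly scored with word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below shows that metric in pure Python. It is not the code in asr_evaluation.py, and the function names `edit_distance` and `wer` are chosen for this example only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of a hypothesis against a non-empty reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


score = wer("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.3f}")
```

In this toy pair the hypothesis has one substitution ("sat" to "sit") and one deletion ("the"), so two errors over six reference words.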

References & Credits

  1. namnv1906 (for the guidance & initial version of this toolkit)
  2. TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2
  3. ESPNet: End-to-End Speech Processing Toolkit
  4. SpeechBrain: A PyTorch-based Speech Toolkit
  5. Python module for evaluating ASR hypotheses
  6. Accumulated Gradients for TensorFlow 2