
LVL Samrómur ASR

NOTE! This is a project in development.

Automatic Speech Recognition (ASR) system for the Samrómur speech corpus using Kaldi
Center for Analysis and Design of Intelligent Agents, Language and Voice Lab
Reykjavik University

This is a research project on ASR creation. It contains neither trained ASR models nor scripts for performing speech recognition with models trained using the recipes provided here. The Althingi recipe provides example scripts for running a Kaldi-trained speech recognizer.

We plan to have the recipes ready by October 2021 and to create a Docker image with the trained models.

Table of Contents

1. Introduction
2. The Dataset
3. Setup
4. Computing Requirements
5. License
6. References
7. Contributing
8. Contributors

1. Introduction

Samrómur ASR is a collection of scripts, recipes, and tutorials for training an ASR system using the Kaldi-ASR toolkit.

s5_base is the regular ASR recipe, meant to be the foundation of our Samrómur recipes.
s5_subwords is a subword ASR recipe.
s5_children is a standard ASR recipe adapted to children's speech.

documentation contains information on data preparation for Kaldi and setup scripts.
preprocessing contains external tools for preprocessing and data preprocessing examples.
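
If the recipes follow the usual Kaldi s5 layout (cmd.sh, path.sh, run.sh), training is driven from the recipe directory roughly as sketched below. This is only an illustrative sketch, not the project's documented entry point; the paths and the --stage option are assumptions to verify against each recipe's own scripts.

```bash
# Hypothetical sketch of running one recipe, assuming the standard
# Kaldi s5 layout. Paths and options are illustrative.
cd s5_base

# path.sh is expected to point at a local Kaldi installation, e.g.:
# export KALDI_ROOT=/path/to/kaldi

# Run the full pipeline: data preparation, features, training, decoding.
./run.sh

# Kaldi recipes are usually resumable from a given stage, e.g.:
# ./run.sh --stage 4
```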

2. The Dataset

The Samrómur speech corpus is an open (CC BY 4.0 license) and accessible database of voices that everyone is free to use when developing software in Icelandic. The database consists of sentences and audio clips from readings of those sentences, as well as metadata about the speakers. Each entry in the database contains a WAVE audio clip and the corresponding text file.

The Samrómur speech corpus is available for download at OpenSLR.
It will soon also be available for download on CLARIN-IS and LDC.

For more information about the dataset visit https://samromur.is/gagnasafn.
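
For Kaldi training, corpora like Samrómur are normally converted into Kaldi's standard data-directory layout (wav.scp, text, utt2spk); the documentation folder covers data preparation in detail. The sketch below only illustrates that layout; the utterance and speaker IDs and file paths are made up, not the actual Samrómur naming scheme.

```bash
# Illustrative Kaldi data directory built from the corpus.
# IDs and paths are invented for the example.
mkdir -p data/train

# wav.scp: <utterance-id> <path-to-audio>
echo "spk001-0000001 /data/samromur/audio/0000001.wav" > data/train/wav.scp

# text: <utterance-id> <transcription>
echo "spk001-0000001 halló heimur" > data/train/text

# utt2spk: <utterance-id> <speaker-id>
echo "spk001-0000001 spk001" > data/train/utt2spk

# Standard Kaldi helpers generate spk2utt and validate the directory.
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/validate_data_dir.sh --no-feats data/train
```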

3. Setup

You can use these guides for reference even if you do not use Terra (a cloud cluster at LVL).
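
For reference, a typical Kaldi installation outside Terra looks roughly like the following; the setup guides and scripts in this repository take precedence, and the commands below are only the standard upstream build steps.

```bash
# Rough sketch of a standard Kaldi build (not specific to this project).
git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools
extras/check_dependencies.sh   # report missing system packages
make -j 4                      # build OpenFst and other third-party tools
cd ../src
./configure --shared           # CUDA is picked up automatically if installed
make depend -j 4
make -j 4
```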

4. Computing Requirements

This project is developed on a computing cluster with 112 CPUs and 10 GPUs (2x GeForce GTX Titan X, 4x GeForce GTX 1080 Ti, 4x GeForce RTX 2080 Ti). Nowhere near that much is required, but the neural-network acoustic-model training scripts are intended to run on GPUs. No GPU is needed to use the trained models.

To do: Add training time info. My guess is around 24 hours for run.sh in s5_children on 135 hours of data.
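
How training and decoding jobs are dispatched is normally controlled by each recipe's cmd.sh. The sketch below shows the two common configurations, single machine versus a GridEngine-style queue; the variable names follow general Kaldi convention and may differ from what the Samrómur recipes actually expect.

```bash
# Sketch of a typical Kaldi cmd.sh (names follow common convention).

# Single machine, no job scheduler:
export train_cmd="run.pl"
export decode_cmd="run.pl"

# GridEngine-style cluster via queue.pl, with per-job memory limits:
# export train_cmd="queue.pl --mem 4G"
# export decode_cmd="queue.pl --mem 4G"

# For GPU training, Kaldi recommends exclusive process mode (needs root):
# sudo nvidia-smi -c 3
```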

5. License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

6. References

This project was funded by the Language Technology Programme for Icelandic 2019-2023. The programme, which is managed and coordinated by Almannarómur, is funded by the Icelandic Ministry of Education, Science and Culture.

7. Contributing

Pull requests are welcome. For significant changes, please open an issue first to discuss what you would like to change. For more information, please refer to the LVL Software Development Guidelines.

8. Contributors

Become a contributor

🌟 PLEASE STAR THIS REPO IF YOU FOUND SOMETHING INTERESTING 🌟