Speech recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, is a capability that enables a program to process human speech into a written format. Speako is a Natural Language Processing (NLP) project built on top of a stack of technologies to transcribe English audio files of any accent. The project also lets the user translate the transcribed text into Urdu, and it extracts keywords from the transcribed text.
- 🤩 This project lets users transcribe their `.flac` or `.wav` audio files.
- 🥳 Translate the transcription into Urdu.
- 🔑 Extract the key points of the text.
```
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
```
We selected four working transcription models and performed an evaluation to pick the best performer among them.
- facebook/wav2vec2-large-960h-lv60-self
- facebook/wav2vec2-lv60-base
- PyTorch Transformers model
- DeepSpeech model by Mozilla
- Audio clips of different English accents were collected from several online resources.
- Sampled the audio files at 16 kHz.
- Removed distortion and background noise from the audio.
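As a rough illustration of the 16 kHz resampling step, here is a minimal sketch using linear interpolation with NumPy. In practice a dedicated resampler such as `librosa.resample` or `torchaudio.transforms.Resample` is preferable; the function name below is our own.

```python
import numpy as np

def resample_to_16k(waveform: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Resample a mono waveform to 16 kHz via linear interpolation (sketch only)."""
    if orig_sr == target_sr:
        return waveform
    duration = len(waveform) / orig_sr               # clip length in seconds
    n_target = int(round(duration * target_sr))      # number of output samples
    old_times = np.arange(len(waveform)) / orig_sr   # timestamps of input samples
    new_times = np.arange(n_target) / target_sr      # timestamps of output samples
    return np.interp(new_times, old_times, waveform)

# Example: one second of 8 kHz audio becomes 16,000 samples at 16 kHz
clip = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
resampled = resample_to_16k(clip, orig_sr=8000)
```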
The following evaluation metrics were considered to select the best-performing model.
- Word Error Rate for each model
- Match Error Rate for each model
- Word Information Loss for each model
- Word Error Rate for each accent
- Match Error Rate for each accent
- Word Information Loss for each accent
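All three metrics can be derived from a word-level alignment between the reference and the hypothesis. Libraries such as `jiwer` provide these measures directly; the self-contained sketch below shows how they are computed from hit/substitution/deletion/insertion counts.

```python
def align_counts(ref_words, hyp_words):
    """Word-level edit-distance alignment returning (hits, subs, dels, ins)."""
    R, H = len(ref_words), len(hyp_words)
    # cost[i][j] = minimum edits aligning ref[:i] with hyp[:j]
    cost = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        cost[i][0] = i
    for j in range(1, H + 1):
        cost[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,      # deletion
                             cost[i][j - 1] + 1)      # insertion
    # Backtrace to count hits / substitutions / deletions / insertions
    hits = subs = dels = ins = 0
    i, j = R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref_words[i - 1] != hyp_words[j - 1]):
            if ref_words[i - 1] == hyp_words[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins

def asr_metrics(reference: str, hypothesis: str):
    """Return (WER, MER, WIL) for a reference/hypothesis sentence pair."""
    ref, hyp = reference.split(), hypothesis.split()
    h, s, d, i = align_counts(ref, hyp)
    wer = (s + d + i) / len(ref)            # Word Error Rate
    mer = (s + d + i) / (h + s + d + i)     # Match Error Rate
    wil = 1 - (h / len(ref)) * (h / len(hyp))  # Word Information Lost
    return wer, mer, wil

wer, mer, wil = asr_metrics("the cat sat on mat", "the cat sat mat")
```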
All the evaluation results and metadata are logged in Neptune AI.
- TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different genders and dialects.
- Removed irrelevant features from the English dataset: `phonetic_detail`, `word_detail`, `dialect_region`, `sentence_type`, `speaker_id`.
- Removed characters such as `,` `?` `.` `!` `-` `;` `:` `"`.
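The character-removal step can be sketched with a small regex helper (the function name is our own; whitespace left behind by removed characters is also collapsed here):

```python
import re

# The characters listed above that we strip from transcriptions
CHARS_TO_REMOVE = r'[\,\?\.\!\-\;\:\"]'

def clean_transcription(text: str) -> str:
    """Strip punctuation listed above and collapse extra whitespace."""
    text = re.sub(CHARS_TO_REMOVE, "", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_transcription('Hello, world - this is a "test"!')
```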
- Sampled the Audio file at 16 KHz
- Used WER (Word Error Rate) as the evaluation metric.
- Transformer: we use Transformers from Hugging Face.
- Model used: facebook/wav2vec2-large-960h-lv60-self
URDU
- Translates the transcription into Urdu.
- Model Used: Helsinki-NLP/opus-mt-en-ur
- In the future, we will add a pipeline channel to preprocess inputs and generate results directly.
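Translation with the model above can be sketched with the `transformers` `pipeline` API (a minimal example under our own helper name; the model is downloaded on first use):

```python
from transformers import pipeline

def translate_to_urdu(text: str) -> str:
    """Translate English text to Urdu with the MarianMT model named above."""
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ur")
    return translator(text)[0]["translation_text"]
```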
- A text analysis feature that automatically extracts the most important words from a transcription. It helps summarize the content of texts and recognize the main topics discussed.
- Model Used: KeyBERT
- The UI of the project is built using Streamlit.
- It provides a responsive GUI that presents the project together with the respective model results.
- Inference Pipeline using ZenML
- Fine-tuning Pipeline for English ASR with Transformers
You can run the Docker image on your local system using:
`docker pull taserx/speako:latest`
`docker run -p 8501:8501 taserx/speako:latest`
`docker exec -it <container_name> bash`
`apt-get update && apt-get install libsndfile1`
To run the Python file:
`python app.py`
- Python
- facebook/wav2vec2-large-960h-lv60-self
- Helsinki-NLP/opus-mt-en-ur
- KeyBERT
- Streamlit
- Docker
- Visual Studio Code
- Google Colaboratory
- Google Drive Mount
- Neptune AI
This project is licensed under the MIT License - see the LICENSE
file for details.