Team Members: Utkarsh, Tanmay Yadav, Gaurav Kumar
https://github.com/utkarsh530/FlipkartGRiDWoZ
curl --form "file=@filename.wav" http://13.233.87.14:7000/denoise > filename_result.wav
Please note that only .wav files are supported.
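Since the endpoint accepts only .wav input, it can help to check a file locally before uploading it. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav` is hypothetical and not part of this repository, and the check covers only what `wave` can parse (uncompressed PCM WAV).

```python
import wave


def check_wav(path):
    """Return (channels, sample_rate_hz, duration_s) for a readable PCM WAV file.

    Raises ValueError when the file cannot be parsed as WAV.
    Hypothetical helper for illustration; not part of this repository.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            # Guard against a zero sample rate in a malformed header.
            duration = frames / rate if rate else 0.0
            return wav.getnchannels(), rate, duration
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"{path} is not a readable WAV file: {exc}") from exc
```

Calling `check_wav("filename.wav")` before the `curl` upload catches renamed or truncated files early, without a round trip to the server.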
Round 3 Problem and Testing dataset
- MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation
- Unet Architecture
- Mel spectrogram using Librosa
- Cocktail Party Source Separation using Deep Learning
- Investigating Deep Neural Transformations for Spectrogram-based Musical Source Separation
https://drive.google.com/file/d/1j9dN8tkDtNpCE8StjuWHcyeeZ5Yo8KRB/view?usp=sharing
We used the Python library asrtoolkit.
pip install asrtoolkit
Tested on the 745 samples provided by Flipkart, with transcripts generated by the given ASR API.
We drop some extraneous outputs to give a clearer picture of the data: scoring is done primarily on Hindi UTF-8 text, which significantly affects WER because Hindi has more syllables. Some of the ASR API results were not as expected, and WER is expected to increase if UTF-EN is used for the WER calculation.
Samples (% of 745) | Mean WER | Median WER | Mode WER (count) |
---|---|---|---|
635 (85%) | 0.324 | 0.285 | 0.0 (101) |
698 (89%) | 0.385 | 0.3125 | 0.0 (101) |
715 (96%) | 0.402 | 0.333 | 0.0 (101) |
745 (100%) | 0.539 | 0.333 | 0.0 (101) |
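For reference, WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length in words (WER) or characters (CER). The sketch below is a plain-Python illustration of those definitions, not the asrtoolkit implementation; the function names are chosen here for illustration, and details such as text normalization and whether spaces count toward CER are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling single-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction; it can exceed 1.0 when the hypothesis is much longer."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction.

    Assumption: spaces are counted as characters. Python iterates over
    Unicode code points, so a Devanagari vowel sign counts as its own character.
    """
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both functions return fractions rather than percentages, matching the scale of the values in the tables.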
Samples (% of 745) | Mean CER | Median CER | Mode CER (count) |
---|---|---|---|
654 (87%) | 0.2351 | 0.167 | 0.0 (126) |
698 (92.6%) | 0.2750 | 0.17647 | 0.0 (126) |
715 (95%) | 0.3002 | 0.185 | 0.0 (126) |
745 (100%) | 0.5214 | 0.2 | 0.0 (126) |
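The summary columns in the two tables above can be reproduced from a list of per-sample scores. The sketch below is one way to do it under two assumptions that the text does not state: that each row keeps only the samples whose score is at or below some cutoff (the hypothetical `max_score` parameter), and that the number in parentheses after the mode is how many samples had that score.

```python
from collections import Counter
from statistics import mean, median


def summarize(scores, max_score=None, ndigits=3):
    """Summarize per-sample error rates, optionally dropping high-error outliers.

    max_score is a hypothetical cutoff; the original selection rule is not documented.
    """
    kept = [s for s in scores if max_score is None or s <= max_score]
    if not kept:
        raise ValueError("no samples left after filtering")
    # Round before counting so that near-identical floats share one bucket.
    mode_value, mode_count = Counter(round(s, ndigits) for s in kept).most_common(1)[0]
    return {
        "samples": len(kept),
        "share": len(kept) / len(scores),
        "mean": mean(kept),
        "median": median(kept),
        "mode": mode_value,
        "mode_count": mode_count,
    }
```

The gap between the mean and the median in the 745-sample rows is the kind of effect this filtering exposes: a few very poor transcripts raise the mean sharply while the median barely moves.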
pip install -r requirements.txt
python Test.py filename
The model is not uploaded to GitHub; please download it from this link.
The file should be in the same directory, and the output is written to <your_current_directory>/output/result.wav