FlipkartGRiD 2.0 - Submission for Wheels of Zeus, IIT Kanpur

Team Members: Utkarsh, Tanmay Yadav, Gaurav Kumar

https://github.com/utkarsh530/FlipkartGRiDWoZ

Live Web App URL for testing

http://13.233.87.14/

API for testing the file

curl --form "file=@filename.wav" http://13.233.87.14:7000/denoise > filename_result.wav

Please note that only .wav files are supported.
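For scripted testing, the curl call above can be approximated in Python using only the standard library. This is a minimal sketch, not part of the submission: the endpoint and the form field name `file` are taken from the curl command, while the helper names `is_wav`, `build_multipart`, and `denoise` are hypothetical, and the response body is assumed to be the denoised WAV bytes, as the curl redirect suggests.

```python
import os
import uuid
import urllib.request

# Endpoint taken from the curl command in this README.
DENOISE_URL = "http://13.233.87.14:7000/denoise"


def is_wav(data: bytes) -> bool:
    """Cheap header check: a WAV file starts with 'RIFF', then 4 size bytes, then 'WAVE'."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as multipart/form-data. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def denoise(in_path: str, out_path: str, url: str = DENOISE_URL) -> None:
    """Upload in_path as form field 'file' and write the response body to out_path."""
    with open(in_path, "rb") as f:
        data = f.read()
    if not is_wav(data):
        raise ValueError("only .wav files are supported")
    body, content_type = build_multipart("file", os.path.basename(in_path), data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Assumption: the server replies with the denoised audio as raw bytes.
    with urllib.request.urlopen(request, timeout=60) as response:
        result = response.read()
    with open(out_path, "wb") as out:
        out.write(result)
```

Calling `denoise("filename.wav", "filename_result.wav")` would then mirror the curl command, with the header check rejecting non-WAV input before anything is uploaded.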

1. Problem statement and Testing Dataset

Round 3 Problem and Testing dataset

2. Dataset Links

2.1 Background Noise

2.2 Human Voice

3. Research Papers and References

Video Explanation Link

https://drive.google.com/file/d/1j9dN8tkDtNpCE8StjuWHcyeeZ5Yo8KRB/view?usp=sharing

Sample Word Error Rate & Character Error Rate

We used the Python library asrtoolkit.

pip install asrtoolkit
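For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, computed over words for WER and over characters for CER. It is a pure-Python illustration and not asrtoolkit's implementation; the function names `edit_distance`, `wer`, and `cer` are chosen here for illustration, and counting spaces as characters in CER is an assumption of this sketch.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over the reference length.

    Assumption of this sketch: spaces count as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Two properties of this definition matter when reading the results. Both rates can exceed 1.0 when the hypothesis is much longer than the reference, so a handful of poor transcripts can raise the mean well above the median. Also, Python iterates strings by Unicode code point, so a Devanagari syllable written as a consonant plus a vowel sign counts as more than one character in this CER.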

Tested on the 745 samples provided by Flipkart, with transcripts generated by the given API:

We removed some outlier outputs to represent the data better. The check is done primarily on Hindi UTF-8 text, which significantly affects WER because Hindi has more syllables. Some of the ASR API results were not as expected, and WER is expected to increase if UTF-EN is used for the WER calculation. The tables below report the retained subsets alongside the full set of 745 samples.

Samples	Mean WER	Median WER
635 (85%)	0.324	0.285
698 (89%)	0.385	0.3125
715 (96%)	0.402	0.333
745 (100%)	0.539	0.333

Samples	Mean CER	Median CER
654 (87%)	0.2351	0.167
698 (92.6%)	0.2750	0.17647
715 (95%)	0.3002	0.185
745 (100%)	0.5214	0.2
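The rows above can be read as summary statistics over progressively larger subsets of the samples. The sketch below shows one way such rows could be produced, assuming the retained subset is the lowest-error fraction of the per-sample scores; that selection rule is an assumption, since the exact criterion for removing outputs is not stated here. The function name `trimmed_summary` and the example scores are hypothetical.

```python
from statistics import mean, median


def trimmed_summary(errors, keep_fraction):
    """Return (count, mean, median) over the lowest-error keep_fraction of the samples."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(errors)
    kept = ranked[: max(1, round(len(ranked) * keep_fraction))]
    return len(kept), mean(kept), median(kept)


# Hypothetical per-sample WER values with a single large outlier.
scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.3, 0.4, 0.4, 0.5, 3.0]
for fraction in (0.9, 1.0):
    count, avg, mid = trimmed_summary(scores, fraction)
    print(f"{count} ({fraction:.0%})\t{avg:.3f}\t{mid:.3f}")
```

With these made-up scores, dropping the single outlier lowers the mean noticeably while the median stays the same, which is the same pattern the tables show between the trimmed rows and the full 745 samples.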

Running the Script

pip install -r requirements.txt
python Test.py filename

The model is not uploaded to GitHub; please download it from this link. The model file should be placed in the same directory, and the output is generated at <your_current_directory>/output/result.wav