FlipkartGRiDWoZ

Noise Detection and Cancellation - Solving for Voice Interactions in Indian Houses & Neighborhoods


FlipkartGRiD 2.0 - Submission for Wheels of Zeus, IIT Kanpur

Team Members: Utkarsh, Tanmay Yadav, Gaurav Kumar

https://github.com/utkarsh530/FlipkartGRiDWoZ

Live Web App URL for testing

http://13.233.87.14/

API for testing the file

curl   --form "file=@filename.wav"   http://13.233.87.14:7000/denoise > filename_result.wav

Please note that only .wav files are supported.
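The same upload can be scripted from Python. The following is a minimal sketch using only the standard library; the URL and the form field name `file` come from the curl command above, while the helper names `build_multipart` and `denoise`, the `audio/wav` content type, and the timeout value are illustrative assumptions rather than part of the project.

```python
import uuid
import urllib.request
from pathlib import Path
from typing import Tuple

# URL and form field name are taken from the curl command in this README.
DENOISE_URL = "http://13.233.87.14:7000/denoise"


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumed MIME type for .wav uploads
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def denoise(src: str, dst: str, url: str = DENOISE_URL, timeout: float = 120.0) -> None:
    """Upload a .wav file and write the response body to dst."""
    path = Path(src)
    if path.suffix.lower() != ".wav":
        # Mirrors the note above: the service accepts .wav input only.
        raise ValueError("only .wav files are supported")
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        Path(dst).write_bytes(response.read())
```

Calling `denoise("filename.wav", "filename_result.wav")` would then be the scripted equivalent of the curl command, assuming the server is reachable.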

1. Problem statement and Testing Dataset

Round 3 Problem and Testing dataset

2. Dataset Links

2.1 Background Noise

2.2 Human Voice

3. Research Papers and References

Video Explanation Link

https://drive.google.com/file/d/1j9dN8tkDtNpCE8StjuWHcyeeZ5Yo8KRB/view?usp=sharing

Sample Word Error Rate & Character Error Rate

We used the Python library asrtoolkit to compute both metrics.

pip install asrtoolkit
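For readers unfamiliar with the two metrics, here is a small pure-Python sketch of how WER and CER are conventionally defined: edit distance between reference and hypothesis, divided by the reference length. It is a stand-in for illustration only and is not asrtoolkit's implementation; the function names and the choice to count Unicode code points for CER are assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference token
                current[j - 1] + 1,      # insertion of a hypothesis token
                previous[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points (whitespace normalized).

    Assumption: Devanagari vowel signs are separate code points, so they are
    counted as separate characters here.
    """
    ref_chars = list(" ".join(reference.split()))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(" ".join(hypothesis.split()))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One dropped word out of four reference words.
print(wer("मुझे नया फोन चाहिए", "मुझे फोन चाहिए"))  # 0.25
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.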

Tested on the 745 samples provided by Flipkart, with transcripts generated by the given API:

For a fairer representation of the data, we exclude some extraneous outputs in the smaller subsets below. Scoring is done primarily on Hindi UTF-8 text, which significantly affects WER because Hindi has more syllables. Some of the ASR API results were not as expected, and WER is expected to increase if UTF-EN is used for the WER calculation.

| Samples | Mean WER | Median WER | Mode WER (count) |
| --- | --- | --- | --- |
| 635 (85%) | 0.324 | 0.285 | 0.0 (101) |
| 698 (89%) | 0.385 | 0.3125 | 0.0 (101) |
| 715 (96%) | 0.402 | 0.333 | 0.0 (101) |
| 745 (100%) | 0.539 | 0.333 | 0.0 (101) |

| Samples | Mean CER | Median CER | Mode CER (count) |
| --- | --- | --- | --- |
| 654 (87%) | 0.2351 | 0.167 | 0.0 (126) |
| 698 (92.6%) | 0.2750 | 0.17647 | 0.0 (126) |
| 715 (95%) | 0.3002 | 0.185 | 0.0 (126) |
| 745 (100%) | 0.5214 | 0.2 | 0.0 (126) |
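The summary statistics in these tables can be reproduced from a list of per-sample scores along the following lines. This is a sketch under one stated assumption: that the smaller subsets were formed by dropping the highest-scoring (worst) samples. The README does not say how the excluded outputs were chosen, and the `summarize` helper and its `keep_fraction` parameter are hypothetical.

```python
import statistics
from typing import Dict, List


def summarize(scores: List[float], keep_fraction: float = 1.0) -> Dict[str, float]:
    """Mean, median and mode of the lowest `keep_fraction` of the scores.

    Assumption: outliers are removed by discarding the highest error rates.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    kept_count = max(1, int(len(scores) * keep_fraction))
    kept = sorted(scores)[:kept_count]
    mode = statistics.mode(kept)  # exact float equality; natural for a score of 0.0
    return {
        "samples": len(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "mode": mode,
        "mode_count": kept.count(mode),
    }


# Made-up scores for illustration only; not taken from the evaluation.
example = [0.0, 0.0, 0.5, 0.25, 1.0, 3.0]
print(summarize(example))       # all samples
print(summarize(example, 0.5))  # lowest half only
```

Since WER and CER are unbounded above, a handful of very poor transcripts can pull the mean well above the median, which is the pattern visible in the 100% rows.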

Running the Script

pip install -r requirements.txt
python Test.py filename

The model is not uploaded to GitHub; please download it from this link. The model file should be placed in the same directory as Test.py, and the output is written to <your_current_directory>/output/result.wav
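Before running the script, it can help to confirm that the input really is a readable WAV file and to know where the result will appear. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV; the helper names are hypothetical, and nothing here reflects what Test.py does internally.

```python
import wave
from pathlib import Path
from typing import Dict, Optional, Union


def describe_wav(path: Union[str, Path]) -> Dict[str, float]:
    """Return basic properties of a PCM WAV file; raises if it cannot be parsed."""
    with wave.open(str(path), "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": frame_rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / frame_rate,
        }


def expected_output_path(cwd: Optional[Union[str, Path]] = None) -> Path:
    """Location stated in this README: <current directory>/output/result.wav."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    return base / "output" / "result.wav"
```

For example, `describe_wav("filename.wav")` raises `wave.Error` for a file that is not valid PCM WAV, which is a quick way to catch a renamed .mp3 before invoking `python Test.py filename`.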

Pix2Pix GAN Diagram

[Figure: Pix2Pix GAN diagram]

Discriminator Model

[Figure: discriminator model]

Generator Model

[Figure: generator model]