Quality suffers on earnings22 dataset

Question

soupslurpr opened this issue 6 months ago · 2 comments

On https://huggingface.co/datasets/distil-whisper/earnings22 (chunked, test), evaluated with evaluation.ipynb, whisper-tiny.en without dynamic audio context gets 18 WER, while acft-whisper-tiny.en with dynamic audio context gets 318 WER. This suggests that the ACFT fine-tuned model with dynamic audio context may not work well in real-world conditions, which include diverse accents and varying speech conditions.
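For context on how a WER above 100 is possible: assuming the figures above are percentages, WER counts substitutions, deletions and insertions against the number of reference words, so a hypothesis with many extra (for example repeated) words can exceed 100. Below is a minimal sketch of that calculation. The function name `word_error_rate` is made up for illustration and is not taken from evaluation.ipynb, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only; not the scoring code used in evaluation.ipynb.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# A hypothesis that repeats words inserts 5 extra words against 2 reference words.
looping = word_error_rate("hello world", "hello hello hello hello world world world")
print(f"{looping * 100:.0f} WER")
```

A score like this usually points to insertions (repeated or hallucinated text) rather than plain misrecognition, though the thread does not confirm that this is what happened here.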

Answer 1 · 2024-07-05T19:06:06.000Z

Not sure why, but changing ADD_AUDIO_CTX to 64 makes acft-whisper-tiny.en achieve 19 WER on earnings22.
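A rough sketch of what a setting like this could control, under stated assumptions: Whisper's encoder uses 1500 positions for a full 30-second window (about 50 per second), and a dynamic audio context shrinks that to fit the clip plus some margin. The helper `dynamic_audio_ctx` and the idea that ADD_AUDIO_CTX is a margin added to the computed context are assumptions for illustration. The thread does not show how evaluation.ipynb actually uses the value.

```python
import math

MAX_AUDIO_CTX = 1500       # encoder positions for a full 30 s window
POSITIONS_PER_SECOND = 50  # 1500 positions / 30 s


def dynamic_audio_ctx(duration_s: float, add_audio_ctx: int = 64) -> int:
    """Hypothetical helper: audio context sized to the clip plus a margin.

    add_audio_ctx is assumed to play the role of ADD_AUDIO_CTX in the notebook.
    """
    needed = math.ceil(duration_s * POSITIONS_PER_SECOND)
    # Never exceed the full 30 s context the model was originally trained with.
    return min(MAX_AUDIO_CTX, needed + add_audio_ctx)


for seconds in (5.0, 12.5, 30.0):
    print(seconds, dynamic_audio_ctx(seconds))
```

If the margin works this way, a larger value leaves the encoder more slack past the end of the audio, which might be why a different setting changes the score, but that is a guess.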

Answer 2 · 2024-08-14T22:21:01.000Z

Can you share which parameter needs to be set in whisper to use this model?
Is it something like wparams.audio_ctx = 1500;?