Does introducing noise-only samples in training reduce hallucinations?

Question

Does introducing noise-only samples in training reduce hallucinations?

Closed this issue 2 months ago · 6 comments

I would like to know if this approach is truly effective in mitigating hallucinations in the Whisper model.

Answer 1 · 2024-09-15T13:04:53.000Z

Yes, it helps quite a bit, but it does not completely eliminate all hallucinations. I am currently exploring different approaches to make the trained cross-attention heads more effective at detecting hallucinations in a robust way.

Since these attention heads were actually trained, I would expect them to exhibit some "unusual" behavior, such as having increased entropy in their cross-attention distribution when hallucinated content is predicted.
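As a rough illustration of the entropy idea above, here is a minimal sketch that computes the Shannon entropy of a single cross-attention row. It assumes the attention weights for one predicted token are already available as a plain list over audio frames. The function name and the example rows are hypothetical and are not taken from the model's code.

```python
import math

def attention_entropy(weights):
    """Shannon entropy (in nats) of one cross-attention row.

    `weights` holds non-negative attention weights over audio frames.
    The row is renormalised here, so it need not sum to exactly 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    probs = [w / total for w in weights]
    # Zero-probability frames contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical rows: one token attends sharply to a single frame,
# the other spreads its attention evenly over all frames.
focused = [0.90, 0.05, 0.03, 0.02]
diffuse = [0.25, 0.25, 0.25, 0.25]
print(attention_entropy(focused) < attention_entropy(diffuse))
```

A diffuse row gives higher entropy than a focused one, so a per-token entropy threshold could serve as one candidate hallucination signal. Whether the trained heads actually behave this way is the open question raised above.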

One simple heuristic that could be implemented on top of the current model is this: if a sequence of words has a very short duration (as indicated by the timestamps), these words are likely hallucinated.
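A minimal sketch of that duration heuristic, assuming word-level timestamps are available as `(word, start, end)` tuples in seconds. The function name and the thresholds `min_duration` and `min_run` are hypothetical placeholders that would need tuning on real data.

```python
def flag_short_runs(words, min_duration=0.05, min_run=2):
    """Return (start, end_exclusive) index ranges of consecutive words
    that are each shorter than `min_duration` seconds, keeping only
    runs of at least `min_run` words."""
    runs = []
    start = None
    for i, (_, t0, t1) in enumerate(words):
        if (t1 - t0) < min_duration:
            if start is None:
                start = i  # a new run of short words begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(words) - start >= min_run:
        runs.append((start, len(words)))
    return runs

# Hypothetical transcript: three implausibly short words in a row.
words = [
    ("hello", 0.00, 0.40),
    ("world", 0.45, 0.90),
    ("thank", 0.90, 0.92),
    ("you", 0.92, 0.93),
    ("for", 0.93, 0.95),
    ("watching", 1.10, 1.60),
]
for start, end in flag_short_runs(words):
    print("suspicious:", [w for w, _, _ in words[start:end]])
```

Requiring a run of several short words, rather than flagging a single one, is a design choice in this sketch to avoid marking short function words that are genuinely spoken quickly.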

If you come across audio where the model starts hallucinating, I would be very interested in seeing those clips! :)

Answer 2 · 2024-09-18T03:01:37.000Z

I want to fine-tune the original Whisper model on my own dataset, with noise-only samples added, to reduce hallucinations. Is this possible?

Answer 3 · 2024-09-18T08:16:24.000Z

Yes, this is certainly possible :)

You will have to be careful, though, and add a meaningful amount of additional data in the language(s) you are interested in, so that you do not degrade the performance of the base model. Happy tuning!

Answer 4 · 2024-09-20T06:27:37.000Z

First, I used the AISHELL corpus to fine-tune Whisper, and for noise data, I used the FSDnoisy18k dataset and random Gaussian noise.
I randomly selected noise from noise data, added it to the original speech data, and used it to generate a noise-only sample. Is that OK?
Second, do I need to use the same audio files from AphasiaBank to validate hallucination mitigation? Are there any other methods?

Answer 5 · 2024-09-20T07:39:50.000Z

> I randomly selected noise from noise data, added it to the original speech data, and used it to generate a noise-only sample. Is that OK?

Not quite. Noise-only samples contain no speech, so adding noise to speech will not produce a noise-only sample.

Please study Section 3 of the paper carefully, especially Section 3.2. The details are given there.
https://arxiv.org/pdf/2408.16589
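To make the distinction concrete, here is a minimal sketch contrasting a noisy-speech sample with a noise-only sample. It uses plain Python lists as stand-ins for waveforms. All function names are hypothetical, and pairing the noise-only audio with an empty target transcript is an assumption made for illustration; the paper's Section 3.2 remains the reference for the actual procedure.

```python
import random

def gaussian_noise(num_samples, std=0.1, seed=None):
    """Generate a Gaussian-noise waveform as a list of floats."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(num_samples)]

def make_noisy_speech(speech, noise, transcript, noise_gain=0.5):
    """Speech mixed with noise. The clip still contains speech,
    so it keeps its original transcript."""
    mixed = [s + noise_gain * n for s, n in zip(speech, noise)]
    return {"audio": mixed, "text": transcript}

def make_noise_only_sample(noise):
    """Noise with no speech at all.
    ASSUMPTION: the training target is an empty transcript."""
    return {"audio": list(noise), "text": ""}

# Hypothetical one-second clips at 16 kHz.
speech = [0.0] * 16000  # placeholder for a real speech waveform
noise = gaussian_noise(16000, seed=0)

noisy = make_noisy_speech(speech, noise, transcript="some spoken words")
noise_only = make_noise_only_sample(noise)
print(repr(noisy["text"]), repr(noise_only["text"]))
```

Only the second sample teaches the model to stay silent on non-speech input; the first is ordinary noise augmentation of a speech clip.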

Answer 6 · 2024-09-20T07:43:44.000Z

Sorry, I meant that I used only the noise data to generate the noise-only samples.