wenet-e2e/wekws

Overfitting and bias towards keywords

csetanmayjain opened this issue · 4 comments

Hi,

I'm facing the following issues using wekws:

  1. The model overfits after only 2-4 epochs, even when trained on hundreds or thousands of hours of data.
  2. High false-positive rate. With many keywords (e.g. 20), the model confuses them with one another and is more biased towards the keyword classes than the freetext (-1) class.
  3. It confuses similar-sounding words and predicts freetext as a keyword.

Can you please suggest a solution?

Thanks

How many keywords are in your experiment?

At present, wekws can't handle that many keywords.
For the case where the false-positive rate is high, one general solution is to use more negative samples (make sure that the number of negative samples is at least 10 times the number of positive samples if you have a strict false alarm rate (FAR) requirement).
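As a quick sanity check on that balance, here is a minimal sketch that counts the negative:positive ratio in a training list. It assumes a JSON-lines data list where each line has a "txt" field holding the keyword index, with -1 marking freetext negatives (matching the -1 class mentioned above); the file path, field name, and layout are assumptions, so adapt them to your actual data.list.

```python
# Minimal sketch: check the negative:positive ratio of a training list.
# Assumes a JSON-lines file where each line is an object whose "txt"
# field is the keyword index, and -1 marks freetext (negative) samples.
# Field name and path are assumptions -- adjust to your setup.
import json

def check_neg_pos_ratio(data_list_path, neg_label=-1, target_ratio=10):
    pos, neg = 0, 0
    with open(data_list_path, encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            if sample["txt"] == neg_label:
                neg += 1
            else:
                pos += 1
    ratio = neg / max(pos, 1)
    print(f"positives={pos}, negatives={neg}, ratio={ratio:.1f}:1")
    if ratio < target_ratio:
        # Extra freetext clips needed to reach the target ratio.
        print(f"need {target_ratio * pos - neg} more negative samples "
              f"to reach {target_ratio}:1")

if __name__ == "__main__":
    check_neg_pos_ratio("data/train/data.list")  # hypothetical path
```

If the ratio is well below 10:1, add more freetext audio (or reuse negatives from other corpora) rather than discarding positives.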

thanks