wenet-e2e/wekws

Overfitting and bias towards keywords

csetanmayjain opened this issue · 4 comments

Hi,

I'm facing the following issues using wekws:

  1. The model overfits after only 2-4 epochs, even when trained on hundreds or thousands of hours of data.
  2. High false-positive rate. With many keywords (e.g. 20), the model confuses them with one another and is more biased towards the keyword classes than the freetext (-1) class.
  3. It confuses similar-sounding words and predicts freetext as a keyword.

Can you please suggest a solution?

Thanks

How many keywords are in your experiment?

At present, wekws can't handle that many keywords.
For the case where the false-positive rate is high, one general solution is to use more negative samples (make sure that the number of negative samples is at least 10 times the number of positive samples if you have a strict false alarm rate (FAR) requirement).
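As a quick sanity check on that balance, here is a minimal sketch that counts the negative:positive ratio in a training list. It assumes a JSON-lines data list where each line has a "txt" field holding the keyword index, with -1 marking freetext negatives (matching the -1 class mentioned above); the file path, field name, and layout are assumptions, so adapt them to your actual data.list.

```python
# Minimal sketch: check the negative:positive ratio of a training list.
# Assumes a JSON-lines file where each line is an object whose "txt"
# field is the keyword index, and -1 marks freetext (negative) samples.
# Field name and path are assumptions -- adjust to your setup.
import json

def check_neg_pos_ratio(data_list_path, neg_label=-1, target_ratio=10):
    pos, neg = 0, 0
    with open(data_list_path, encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            if sample["txt"] == neg_label:
                neg += 1
            else:
                pos += 1
    ratio = neg / max(pos, 1)
    print(f"positives={pos}, negatives={neg}, ratio={ratio:.1f}:1")
    if ratio < target_ratio:
        # Extra freetext clips needed to reach the target ratio.
        print(f"need {target_ratio * pos - neg} more negative samples "
              f"to reach {target_ratio}:1")

if __name__ == "__main__":
    check_neg_pos_ratio("data/train/data.list")  # hypothetical path
```

If the ratio is well below 10:1, add more freetext audio (or reuse negatives from other corpora) rather than discarding positives.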

thanks