mashrurmorshed/Torch-KWT

Unknown class

hamzakhalil798 opened this issue · 2 comments

What if the audio isn't one of the keywords? Would the model recognize it and print it as unknown, or something similar?

There are two ways this can be done:

  • First, the model can predict "unknown" if you simply add an "unknown" class to your data, containing lots of random words and sounds that are not among the main keywords.

    • In fact, that is what the Speech Commands V1 dataset did: it had a total of 12 classes, 10 keyword classes plus 2 extra classes, "unknown" and "silence".
    • This repository mainly works with Speech Commands V2 though, which is instead a 35-keyword problem and doesn't have any data for an "unknown" class.
  • Second, since there isn't an "unknown" class in the data, we have to use an alternative: we can filter out unknowns by setting a confidence threshold. Simply put, when the model encounters audio that isn't a keyword, the highest softmax probability should be comparatively low.

    • You can take a look at the colab tutorial and at window_inference.py, where I tried implementing this. Given a long audio clip, we move a sliding window over the audio and run inference on 1 s audio chunks. If the confidence of the prediction is below some threshold (for example, 0.8), we predict unknown and move on.
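
As a rough illustration of the thresholding idea, here is a minimal sketch in plain Python. It is not the code from window_inference.py: the names `label_with_threshold`, `sliding_window_predict` and `predict_logits` are hypothetical, the 0.5 s hop is an assumption (only the 1 s window and the 0.8 threshold come from the description above), and `predict_logits` stands in for whatever function runs your model on one chunk and returns raw logits.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(
    logits: Sequence[float], labels: Sequence[str], threshold: float = 0.8
) -> Tuple[str, float]:
    # Take the most likely class; fall back to "unknown" if it is not confident enough.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


def sliding_window_predict(
    audio: Sequence[float],
    sample_rate: int,
    predict_logits: Callable[[Sequence[float]], Sequence[float]],  # hypothetical model wrapper
    labels: Sequence[str],
    window_s: float = 1.0,   # 1 s chunks, as described above
    hop_s: float = 0.5,      # assumed hop size, not taken from the repository
    threshold: float = 0.8,
) -> List[Tuple[float, str, float]]:
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    results = []
    # Slide over the clip; a trailing partial window is skipped.
    for start in range(0, len(audio) - window + 1, hop):
        chunk = audio[start:start + window]
        label, conf = label_with_threshold(predict_logits(chunk), labels, threshold)
        results.append((start / sample_rate, label, conf))
    return results
```

Each result is a `(start_time_in_seconds, label, confidence)` tuple. The threshold is worth tuning on held-out audio, since softmax confidences can be high even for inputs the model has never seen.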

If you are working on your own custom keyword classification problem, I would suggest building your own unknown class, as that should give better results.
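
For the custom "unknown" class, one possible way to set up the labels is sketched below. This is only an illustration, not code from this repository: `build_label_map` is a hypothetical helper, and it assumes your data is organized by spoken word, so that every word outside your target keywords can be relabeled as "unknown".

```python
from typing import Dict, Iterable, List, Tuple


def build_label_map(
    target_keywords: Iterable[str], all_words: Iterable[str]
) -> Tuple[List[str], Dict[str, int]]:
    targets = set(target_keywords)
    # Keyword classes first (sorted for a stable order), then one shared "unknown" class.
    classes = sorted(targets) + ["unknown"]
    class_to_idx = {name: i for i, name in enumerate(classes)}
    # Every non-target word collapses into the "unknown" index.
    word_to_idx = {
        word: class_to_idx[word if word in targets else "unknown"]
        for word in all_words
    }
    return classes, word_to_idx


classes, word_to_idx = build_label_map(
    target_keywords=["yes", "no"],
    all_words=["yes", "no", "cat", "dog"],
)
print(classes)      # ['no', 'yes', 'unknown']
print(word_to_idx)  # {'yes': 1, 'no': 0, 'cat': 2, 'dog': 2}
```

Because many words end up in the same class, "unknown" can easily outnumber each keyword, so you may want to subsample it or balance the classes during training.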

@ID56 Got it! Thank you so much for your constant support.