Not detecting exact number of speakers

Question

Closed this issue 3 years ago · 1 comment

In a 45-minute recording, the model detects 5 speakers, but the ground truth has 8-10 speakers. Speakers with little voice activity (less than 5 seconds) are assigned the labels of other speakers.
I have not used an RTTM file.
To test performance, I replaced the audio and VAD files in the examples subdirectory with my own.
Why is the number of speakers not detected correctly?

Answer 1 · 2022-06-01T11:32:44.000Z

Hi Manish,
I am not sure exactly what you are trying to do. My guess is that you are running the code on a 45-minute recording that has many speakers. For starters, the model is not perfect and might make mistakes, such as missing speakers who speak very little. Beyond that, you can try slightly different hyperparameters: Fb has some influence on the number of speakers that the model can find, and lower values make the model tend to find more speakers. You can take a look at the hyperparameters in the DIHARD recipe and try those, or some slightly smaller values for Fb such as 5 or 4. Still, keep in mind that the model can fail to find all speakers, especially in difficult cases such as a speaker who barely speaks in a long recording.
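When comparing runs with different Fb values, it can help to count the speakers in each output RTTM file and see how much speech each one was given. The following is only a minimal sketch, assuming the output follows the standard RTTM layout (turn duration in the fifth field, speaker label in the eighth). The function names and the sample RTTM lines are made up for illustration and are not part of the toolkit.

```python
from collections import defaultdict


def speech_time_per_speaker(rttm_text):
    """Return {speaker_label: total_seconds} from RTTM-formatted text."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Only SPEAKER records describe speaker turns; skip everything else.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # fifth field: turn duration in seconds
        speaker = fields[7]          # eighth field: speaker label
        totals[speaker] += duration
    return dict(totals)


def short_speakers(totals, min_seconds=5.0):
    """Return labels whose total speech is below min_seconds (hypothetical helper)."""
    return sorted(s for s, t in totals.items() if t < min_seconds)


# Made-up sample data, only to show the expected input shape.
example_rttm = """\
SPEAKER rec1 1 0.00 12.50 <NA> <NA> spk1 <NA> <NA>
SPEAKER rec1 1 12.50 3.00 <NA> <NA> spk2 <NA> <NA>
SPEAKER rec1 1 15.50 20.00 <NA> <NA> spk1 <NA> <NA>
SPEAKER rec1 1 35.50 1.50 <NA> <NA> spk3 <NA> <NA>
"""

totals = speech_time_per_speaker(example_rttm)
print(len(totals), "speakers found")
print("speakers under 5 s:", short_speakers(totals))
```

Running this on the output of each Fb setting (and on a reference RTTM, if you have one) shows whether a lower Fb actually recovers more speakers and which ones have very little speech.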
I hope this helps.