fgnt/pb_sed

Training only with strong labels

Hello,

similar to #8, could you please suggest a workflow for training on a custom dataset that contains only strongly labelled data?

The readme seems to suggest training only the bidirectional CRNN and simply ignoring the FBCRNN. That makes sense to me, as I would assume the lack of bidirectionality in the FBCRNN only helps for weak labels.
However, the paper (DCASE challenge 2022) says that the FBCRNN can also be trained with strong labels by simply adjusting the loss function.

What would you suggest?

Best Regards

Hi Niclas,

it depends a bit on the output resolution you are aiming for. The FBCRNN is better at tagging and therefore also at low-resolution SED, where you perform tagging in larger windows of multiple seconds. It achieves, e.g., better PSDS2 performance than the BiCRNN. Although the FBCRNN can also perform SED by tagging in windows of <1s, the BiCRNN performs better if you want to locate events to within a few hundred milliseconds, as measured, for example, by PSDS1 and collar-based F-score. This is probably even more the case when you have strong labels for all your data. However, note that for collar-based F-score (which is a single-operating-point evaluation), the tag-conditioned BiCRNN turned out to perform best, and for that you also need the FBCRNN for tagging.
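To make the idea of "tagging in windows" concrete, here is a minimal sketch of how strong labels could be turned into per-window tag targets. It is plain Python and not code from pb_sed: the function name `window_tags`, the `(onset, offset, label)` event format, and the rule that any overlap counts as the event being present in a window are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumed strong-label format: (onset in seconds, offset in seconds, class label).
Event = Tuple[float, float, str]


def window_tags(
    events: Sequence[Event],
    classes: Sequence[str],
    clip_len: float,
    win_len: float,
    hop: float,
) -> List[List[int]]:
    """Return one multi-hot tag vector per analysis window (hypothetical helper)."""
    num_windows = math.ceil(clip_len / hop)
    targets = []
    for n in range(num_windows):
        start = n * hop
        stop = min(start + win_len, clip_len)
        row = [0] * len(classes)
        for onset, offset, label in events:
            # Assumption: an event is tagged in a window if the two intervals overlap at all.
            if onset < stop and offset > start:
                row[classes.index(label)] = 1
        targets.append(row)
    return targets


events = [(0.5, 1.2, "dog"), (3.0, 7.5, "speech")]
classes = ["dog", "speech"]

# Low-resolution SED: tagging in 2 s windows over a 10 s clip.
coarse = window_tags(events, classes, clip_len=10.0, win_len=2.0, hop=2.0)

# A single window spanning the whole clip reduces to ordinary clip-level tagging.
clip_level = window_tags(events, classes, clip_len=10.0, win_len=10.0, hop=10.0)
```

With a window as long as the clip, the targets collapse to the usual weak labels; shrinking the window moves towards finer temporal resolution, which is the regime where the BiCRNN is described above as the better choice.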

I hope that helps. Let me know if you have further questions.

Best,
Janek

Thanks for the useful information. I will proceed step by step and eventually make use of the tagging functionality.