lstrgar/self-supervised-phone-segmentation

pos_weight for reproducing unsupervised Buckeye & TIMIT

roger-tseng opened this issue · 3 comments

Hi Luke,

I've been trying to reproduce your unsupervised experiments, but so far I've only reached about 60% F1, compared with your ~80% F1.
May I ask what pos_weight values you used for the unsupervised experiments on Buckeye and TIMIT?

Also, for Buckeye, do you re-split the audio recordings according to the pseudo-labels?

pos_weight was ~1.5 for unsupervised experiments.
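
For readers unsure what this value changes, here is a minimal sketch. It assumes pos_weight is the positive-class weight of a binary cross-entropy on logits, which is what the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss` does. Whether the repo wires it up exactly this way is an assumption, and the function name `weighted_bce_with_logits` is hypothetical.

```python
import math


def weighted_bce_with_logits(logit, target, pos_weight=1.0):
    """Binary cross-entropy for one frame, with the positive (boundary) term
    scaled by pos_weight. Hypothetical helper, written in plain Python."""
    # Numerically stable log(sigmoid(logit)).
    if logit >= 0:
        log_sig = -math.log1p(math.exp(-logit))
    else:
        log_sig = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(logit)) = log(sigmoid(logit)) - logit
    log_one_minus_sig = log_sig - logit
    return -(pos_weight * target * log_sig + (1.0 - target) * log_one_minus_sig)


# A missed boundary frame (target=1) costs 1.5x more than with pos_weight=1,
# while non-boundary frames (target=0) are unaffected.
boundary_loss = weighted_bce_with_logits(0.0, 1.0, pos_weight=1.5)
non_boundary_loss = weighted_bce_with_logits(0.0, 0.0, pos_weight=1.5)
```

Since boundary frames are much rarer than non-boundary frames, a value above 1 pushes the model toward predicting more boundaries.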

What do you mean by re-splitting the audio recordings?

Thanks for the quick reply! I'll try it out.

About re-splitting: the Buckeye data is split into shorter segments at ground-truth silence positions, which probably shouldn't be available in the unsupervised setup. So I was wondering whether you redo the preprocessing w.r.t. the pseudo-labels in some way.

You could generate splits with https://github.com/zhenghuatan/rVAD.
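
A rough sketch of how VAD output could replace the ground-truth silences when cutting recordings. It does not call rVAD itself. It assumes you already have one 0/1 speech label per frame at a known frame shift (10 ms is assumed here), and the function name `speech_segments` and its parameters are hypothetical.

```python
def speech_segments(vad, frame_shift=0.01, min_silence_frames=20):
    """Convert per-frame 0/1 VAD labels into (start_sec, end_sec) speech
    segments, cutting only at silences of at least min_silence_frames."""
    # Collect raw runs of speech frames as [start_frame, end_frame).
    runs = []
    start = None
    for i, v in enumerate(list(vad) + [0]):  # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append([start, i])
            start = None

    # Merge runs separated by a silence shorter than the threshold.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_silence_frames:
            merged[-1][1] = e
        else:
            merged.append([s, e])

    return [(s * frame_shift, e * frame_shift) for s, e in merged]


# Toy example with 1-second "frames": the 1-frame gap is bridged,
# the 3-frame gap becomes a split point.
toy = speech_segments([0, 1, 1, 0, 1, 1, 0, 0, 0, 1],
                      frame_shift=1.0, min_silence_frames=2)
```

The resulting time spans can then be used to cut each recording into shorter utterances without consulting the annotated silences.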