facebookresearch/voxpopuli

Are there speaker annotations in the unlabeled data?

DongChanS opened this issue · 2 comments

Thanks for creating this dataset!

I have one question.

Is the speaker information only in the transcribed data?

If not, is there any unlabeled data that has speaker information?

kahne commented

Thanks for checking with us.

Unfortunately, there is no speaker metadata for the unlabelled data. However, you can use speaker diarization or identification models to classify the speakers. The set of speakers is likely small (they are certified interpreters). Also, each speech by a given speaker is usually minutes long, and speaker changes are infrequent.
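
For readers who want to try the diarization/identification route, here is a minimal sketch of the grouping step only. It is not part of VoxPopuli or its tooling. It assumes you have already computed one fixed-size speaker embedding per audio segment with a speaker-embedding model of your choice (that step is not shown), and it assigns pseudo-speaker IDs by greedy cosine-similarity clustering. The function names, the toy vectors, and the `threshold` value are all hypothetical; the threshold would need tuning on real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_pseudo_speakers(embeddings, threshold=0.8):
    """Greedy clustering: each embedding joins the most similar existing
    cluster if the similarity reaches `threshold`, else starts a new one.
    Returns one integer pseudo-speaker label per input embedding."""
    cluster_sums = []  # running sum of member embeddings per cluster
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, total in enumerate(cluster_sums):
            # Cosine similarity is scale-invariant, so the running sum
            # behaves like the cluster centroid here.
            sim = cosine(emb, total)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            cluster_sums.append(list(emb))
            best = len(cluster_sums) - 1
        else:
            cluster_sums[best] = [x + y for x, y in zip(cluster_sums[best], emb)]
        labels.append(best)
    return labels


# Toy 3-dimensional vectors standing in for real speaker embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [1.0, 0.05, 0.05],
]
labels = assign_pseudo_speakers(toy_embeddings)
print(labels)  # [0, 0, 1, 1, 0]
```

Because the speaker set is expected to be small and each speech lasts minutes, even a simple scheme like this may give usable pseudo-labels, though a proper diarization pipeline will handle overlapping speech and speaker changes within a segment far better.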

kahne commented

I will close this issue for now. Please feel free to reopen it or create a new one if you have more questions.