facebookresearch/fairseq

Question about data preparation: aligning speech data in the SpeechMatrix dataset


During data preparation for SpeechMatrix, the aligned_speech TSV files look like this:

score	lt_audio	sl_audio
1.121542	lt_aud.zip:5203779181:21527	sl_aud.zip:50446110544:22547
1.1027344	lt_aud.zip:132563238:14033	sl_aud.zip:3224296345:11940
1.1023445	lt_aud.zip:6292033729:49818	sl_aud.zip:17374011756:20890
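The fairseq audio utilities accept paths of the form `<zip_path>:<offset>:<length>` for clips packed into an uncompressed (stored) ZIP. If these TSV fields follow that convention, which is an assumption and is not stated in the issue, the two numbers are a byte offset and a byte length inside the archive, not positions in the original recording. The following minimal Python sketch reads the rows under that assumption. `ZipSlice`, `parse_zip_slice`, `read_aligned_tsv`, and `read_slice` are hypothetical helpers, not fairseq APIs.

```python
import csv
from typing import Iterator, NamedTuple, Tuple


class ZipSlice(NamedTuple):
    """One audio field such as 'lt_aud.zip:5203779181:21527' (assumed layout)."""
    zip_path: str
    offset: int  # assumed: byte offset of the clip inside the zip
    length: int  # assumed: byte length of the clip


def parse_zip_slice(field: str) -> ZipSlice:
    # Split from the right so only the last two ':'-separated parts are numbers.
    zip_path, offset, length = field.rsplit(":", 2)
    return ZipSlice(zip_path, int(offset), int(length))


def read_aligned_tsv(
    tsv_path: str, src: str = "lt", tgt: str = "sl"
) -> Iterator[Tuple[float, ZipSlice, ZipSlice]]:
    """Yield (score, src_slice, tgt_slice) for each row of an aligned TSV."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield (
                float(row["score"]),
                parse_zip_slice(row[f"{src}_audio"]),
                parse_zip_slice(row[f"{tgt}_audio"]),
            )


def read_slice(s: ZipSlice) -> bytes:
    """Read the raw bytes of one clip, assuming it is stored uncompressed."""
    with open(s.zip_path, "rb") as f:
        f.seek(s.offset)
        return f.read(s.length)
```

If the assumption holds, `read_slice` returns the encoded bytes of a single clip. Those bytes could then be decoded with an audio library. Note that no file name is involved at any point in this lookup.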

These entries use a different format from the audio file names in the per-language raw audio folders. For example, the folder audios/lt/ contains:

ls | head -n 5
20090112-0900-PLENARY-10_lt_1079616_1086270.ogg
20090112-0900-PLENARY-10_lt_1133568_1136670.ogg
20090112-0900-PLENARY-10_lt_1238304_1242270.ogg
20090112-0900-PLENARY-10_lt_1288704_1292862.ogg
20090112-0900-PLENARY-10_lt_1288704_1296606.ogg
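The raw file names appear to encode a session ID, a language code, and two integers. The integers seem to delimit the segment within the session recording, but their unit is not stated here. The following sketch splits a name into those parts. `SegmentName` and `parse_segment_name` are hypothetical helpers, and reading the integers as start/end positions is an assumption.

```python
import os
from typing import NamedTuple


class SegmentName(NamedTuple):
    session: str  # e.g. '20090112-0900-PLENARY-10'
    lang: str     # e.g. 'lt'
    start: int    # first number (assumed start position; unit unknown)
    end: int      # second number (assumed end position; unit unknown)


def parse_segment_name(filename: str) -> SegmentName:
    stem, _ext = os.path.splitext(os.path.basename(filename))
    # The session ID contains hyphens but no underscores, so split from the right.
    session, lang, start, end = stem.rsplit("_", 3)
    return SegmentName(session, lang, int(start), int(end))
```

In the listing, the last two files share the start value 1288704 but have different end values. If the integers are positions, segments from the same session can therefore overlap. These values are unrelated to the byte offsets in the TSV fields.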

How do these two formats map to each other? I assumed the number pairs would match, but they do not.

Could anybody help? Thank you so much!