Question about data preparation with speech data alignment in speech matrix dataset
tony-sensei commented
During data preparation for speech matrix, the aligned_speech tsv files look like this:
score lt_audio sl_audio
1.121542 lt_aud.zip:5203779181:21527 sl_aud.zip:50446110544:22547
1.1027344 lt_aud.zip:132563238:14033 sl_aud.zip:3224296345:11940
1.1023445 lt_aud.zip:6292033729:49818 sl_aud.zip:17374011756:20890
These entries use a different format from the audio file names in the raw audio folders for each language. For example, the folder audios/lt/ contains:
ls | head -n 5
20090112-0900-PLENARY-10_lt_1079616_1086270.ogg
20090112-0900-PLENARY-10_lt_1133568_1136670.ogg
20090112-0900-PLENARY-10_lt_1238304_1242270.ogg
20090112-0900-PLENARY-10_lt_1288704_1292862.ogg
20090112-0900-PLENARY-10_lt_1288704_1296606.ogg
So how do these two formats map to each other? I thought the numbers in both might be the same pairs, but they are not.
Could anybody help? Thank you so much!
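
One possible reading, which is an assumption and not confirmed anywhere in this thread: each tsv field has the shape `<zip name>:<byte offset>:<byte length>`, the convention fairseq uses for audio packed into uncompressed zip archives. If so, the two numbers are a byte range inside the zip file and are unrelated to the numbers in the .ogg file names. A minimal sketch of parsing such a field and reading the raw bytes follows; `AudioRef`, `parse_audio_ref`, and `read_segment` are hypothetical helper names, not part of any library.

```python
from pathlib import Path
from typing import NamedTuple


class AudioRef(NamedTuple):
    """One tsv field, ASSUMED to mean <zip name>:<byte offset>:<byte length>."""
    zip_name: str
    offset: int
    length: int


def parse_audio_ref(field: str) -> AudioRef:
    # Split from the right so only the last two ':' act as separators.
    zip_name, offset, length = field.rsplit(":", 2)
    return AudioRef(zip_name, int(offset), int(length))


def read_segment(zip_dir: str, ref: AudioRef) -> bytes:
    # ASSUMPTION: entries are stored uncompressed, so the byte range
    # can be read directly from the archive file without unzipping it.
    with open(Path(zip_dir) / ref.zip_name, "rb") as f:
        f.seek(ref.offset)
        return f.read(ref.length)


if __name__ == "__main__":
    ref = parse_audio_ref("lt_aud.zip:5203779181:21527")
    print(ref.zip_name, ref.offset, ref.length)
```

If the assumption holds, the bytes returned by `read_segment` would be one encoded audio segment that can be passed to an audio decoder. The tsv rows would then point into the zip archives and not at the individual files under audios/lt/.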