huawei-noah/Speech-Backbones

mels_mode generation

Biyani404198 opened this issue · 1 comments

Hi,
I have created TextGrid files in the subfolder textgrids using MFA.
I'm having trouble generating the average voice mel-spectrograms in the subfolder mels_mode.
I'm using the get_avg_mels.ipynb Jupyter notebook to compute them.
It generates a mels_mode dictionary with phonemes as keys, but there are no further instructions on how to map these to speakers and create the mels_mode subfolder from this dictionary.
@ivanvovk @ytyeung @wenyong-h @huawei-noah-admin @zhangjiajin2 Please help.

for p in phoneme_list:
    mels_mode[p] = mode(np.asarray(mels_mode_dict[p]), 0).mode[0]
    lens[p] = np.mean(np.asarray(lens_dict[p]))

Basically, for each .wav audio file you know which frame corresponds to which phoneme: you can extract this information from the TextGrid file by calculating start_frame and end_frame as in get_avg_mels.ipynb. Then, for each frame, replace the mel feature in the _mel.npy file with the average feature of the corresponding phoneme. The mels_mode dictionary contains the mapping {phoneme: its average mel feature}.
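The frame replacement described above could be sketched roughly as follows. This is not code from the repository: the function name build_avg_mel, the (start_sec, end_sec, phoneme) interval format, the default sample rate and hop size, and the (n_mels, n_frames) array layout are all assumptions made for illustration, so check them against get_avg_mels.ipynb and your own preprocessing before relying on it.

```python
import numpy as np

def build_avg_mel(mel, intervals, mels_mode, sr=22050, hop=256):
    """Replace every frame of `mel` with the average feature of its phoneme.

    mel       : array of shape (n_mels, n_frames), e.g. loaded from a _mel.npy file
    intervals : iterable of (start_sec, end_sec, phoneme) read from a TextGrid
                (hypothetical format; adapt to your TextGrid reader)
    mels_mode : dict {phoneme: vector of length n_mels}
    sr, hop   : sample rate and hop size (assumed values, not confirmed)
    """
    avg_mel = mel.copy()
    n_frames = mel.shape[1]
    for start_sec, end_sec, phoneme in intervals:
        # Convert interval boundaries from seconds to frame indices.
        start_frame = int(start_sec * sr) // hop
        end_frame = min(int(end_sec * sr) // hop, n_frames)
        # Phonemes missing from the dictionary keep their original frames.
        if phoneme in mels_mode and start_frame < end_frame:
            avg_mel[:, start_frame:end_frame] = np.asarray(mels_mode[phoneme])[:, None]
    return avg_mel

# Toy example with made-up numbers: 3 mel bins, 5 frames, 10 frames per second.
toy_mel = np.zeros((3, 5))
toy_intervals = [(0.0, 0.2, "a"), (0.2, 0.5, "b")]
toy_mode = {"a": np.ones(3), "b": np.full(3, 2.0)}
toy_avg = build_avg_mel(toy_mel, toy_intervals, toy_mode, sr=100, hop=10)
```

The returned array has the same shape as the input, so it could then be written with np.save into the mels_mode subfolder, one file per utterance. The exact file naming expected there is not stated in this thread.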