auspicious3000/AutoPST

test_vctk.meta

jardnzm opened this issue · 5 comments

Hi,

I am wondering how the "test_vctk.meta" is created in the demo file?

Thanks!

Just put the cepstrum and speaker embedding into a list.
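For anyone reading later, the answer above could be sketched roughly as follows. This is only a guess at a workable layout, not the author's actual script: the entry order `[speaker_name, speaker_embedding, cepstrum]`, the helper names `one_hot` and `build_meta`, the speaker count of 82, the cepstrum shape and the output file name are all assumptions made for illustration. Check how the demo reads `test_vctk.meta` before relying on any of them.

```python
import os
import pickle
import tempfile

import numpy as np


def one_hot(index, num_speakers):
    """Return a one-hot speaker embedding (hypothetical helper)."""
    emb = np.zeros(num_speakers, dtype=np.float32)
    emb[index] = 1.0
    return emb


def build_meta(entries, num_speakers):
    """Pack (speaker_name, speaker_index, cepstrum) tuples into a list.

    The per-entry layout is an assumption, not the confirmed format
    of test_vctk.meta.
    """
    meta = []
    for name, idx, cep in entries:
        meta.append([name,
                     one_hot(idx, num_speakers),
                     np.asarray(cep, dtype=np.float32)])
    return meta


# Placeholder data: random arrays stand in for real cepstra.
rng = np.random.default_rng(0)
entries = [
    ("p225", 0, rng.standard_normal((120, 20))),
    ("p226", 1, rng.standard_normal((95, 20))),
]

meta = build_meta(entries, num_speakers=82)  # 82 is an illustrative value

# Serialize the list with pickle (assumed storage format).
out_path = os.path.join(tempfile.gettempdir(), "my_test.meta")
with open(out_path, "wb") as f:
    pickle.dump(meta, f)
```

The result can be read back with `pickle.load(open(out_path, "rb"))`, which returns the same list of entries.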

Thanks! By speaker embedding, I guess you mean the one-hot representation? Does that mean the speaker we transfer to needs to have been seen in the training data?
One more question: is the output quality restricted by the input length? The demo audios are all about 1-2 s long. How does it perform on 5-10 s audio?

Yes.

No.

> Just put the cepstrum and speaker embedding into a list.

Hi, is the cepstrum one of the three arrays saved in prepare_train_data.py? If so, can you point out which one it is? If not, is there any code in the repository that performs this computation?
(https://github.com/auspicious3000/AutoPST/blob/main/prepare_train_data.py)
np.save(os.path.join(targetDir_cd, subdir, fileName[:-4]), codes.cpu().numpy(), allow_pickle=False)
np.save(os.path.join(targetDir_sp, subdir, fileName[:-4]), S.astype(np.float32), allow_pickle=False)
np.save(os.path.join(targetDir_cep, subdir, fileName[:-4]), cc_norm.astype(np.float32), allow_pickle=False)

cep stands for cepstrum, so it is the third one: cc_norm, saved under targetDir_cep.
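Since the cepstra are written with `np.save` into a per-speaker subdirectory, they could be read back along these lines. This is a minimal sketch: the helper `load_cepstra`, the speaker and utterance names, and the array shape are hypothetical, and a temporary directory stands in for the real `targetDir_cep`.

```python
import os
import tempfile

import numpy as np


def load_cepstra(cep_root, speaker):
    """Load every .npy file under cep_root/speaker into a dict.

    Keys are file names without the .npy extension (hypothetical helper).
    """
    spk_dir = os.path.join(cep_root, speaker)
    cepstra = {}
    for fname in sorted(os.listdir(spk_dir)):
        if fname.endswith(".npy"):
            cepstra[fname[:-4]] = np.load(os.path.join(spk_dir, fname))
    return cepstra


# Demonstration with a temporary directory standing in for targetDir_cep.
with tempfile.TemporaryDirectory() as cep_root:
    os.makedirs(os.path.join(cep_root, "p225"))
    fake_cep = np.zeros((50, 20), dtype=np.float32)  # placeholder cepstrum
    # np.save appends ".npy" when the given name has no such extension.
    np.save(os.path.join(cep_root, "p225", "p225_001"), fake_cep,
            allow_pickle=False)
    cepstra = load_cepstra(cep_root, "p225")
```

Each loaded array can then be paired with the matching speaker embedding when assembling the list described earlier in the thread.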