Rongjiehuang/GenerSpeech

Could not reproduce the result

Thanks for the amazing work.
I followed the instructions to generate Non-Parallel Transfer output, using the VCTK reference audio from the demo page (https://generspeech.github.io/#non-parallel-transfer, ref text: When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow.).
The expected output is: https://generspeech.github.io/wavs/NonParallelTransfer/VCTK/GenerSpeech/001.wav
But this is what I got: https://drive.google.com/file/d/1yRiW6TRlUcwwbs4MlS33VCmDQmj0pVA2/view?usp=share_link

The command I used to run inference:

PYTHONPATH=. CUDA_VISIBLE_DEVICES=0 python inference/GenerSpeech.py --config modules/GenerSpeech/config/generspeech.yaml  --exp_name GenerSpeech --hparams="text='We also need a small plastic snake and a big toy frog for the kids.',ref_audio='vctk_001.wav'"

Did I miss something or do anything wrong?

Thanks for your interest. We retrained the checkpoint for the code release, so the results will not be exactly the same as those on the webpage. In the case you provided, the rhythm seems to be faster; you may get a better result by controlling the predicted duration.
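
For readers wondering what controlling the predicted duration might look like in practice, here is a minimal, generic sketch of duration scaling as commonly done in FastSpeech-style models. It is not taken from the GenerSpeech code base: the names `scale_durations`, `length_regulate`, and `alpha` are hypothetical, and where (or whether) the repository exposes such a factor is an assumption you would need to check in the inference code.

```python
def scale_durations(durations, alpha=1.0, min_frames=1):
    """Scale per-phoneme frame counts by alpha.

    alpha > 1 lengthens every phoneme (slower speech),
    alpha < 1 shortens them (faster speech).
    Each phoneme keeps at least `min_frames` frames.
    """
    return [max(min_frames, int(round(d * alpha))) for d in durations]


def length_regulate(phoneme_ids, durations):
    """Expand a phoneme sequence to frame level by repeating each id."""
    frames = []
    for pid, dur in zip(phoneme_ids, durations):
        frames.extend([pid] * dur)
    return frames


# Hypothetical predicted durations (in frames) for four phonemes.
predicted = [3, 5, 2, 4]

# Slow the utterance down by roughly 20% before expanding to frames.
slower = scale_durations(predicted, alpha=1.2)
frame_level = length_regulate([10, 11, 12, 13], slower)
```

If the output sounds too fast, a factor slightly above 1.0 applied to the duration predictor's output, before the length regulator, is the usual place to start.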