The output of the conversion basically sounds like the input

Question

fazlekarim opened this issue 7 years ago · 3 comments

Hi,

I loaded the checkpoint from Google Drive. The output (for some samples I recorded of my own voice) sounds similar to the input. Is anything happening? I get this warning:

/home/fakarim/projects/Voice_Converter_CycleGAN/preprocess.py:170: RuntimeWarning: divide by zero encountered in log
f0_converted = np.exp((np.log(f0) - mean_log_src) / std_log_src * std_log_target + mean_log_target)

Do you think it's not doing anything because of this warning?

Answer 1 · 2018-06-16T13:19:36.000Z

Hi fazlekarim,
The CycleGAN learns a mapping from distribution A to distribution B, and from distribution B to distribution A. This means that if you would like to convert your own voice to another voice, you need to collect your own data and train the model on it.
Regarding the warning: some of the audio pieces (blank audio) have a fundamental frequency of 0 Hz, and taking the log of 0 triggers the warning in the log-Gaussian pitch conversion. But 0 Hz is always converted to 0 Hz by the formula above (log(0) is -inf, it stays -inf through the linear transform, and exp(-inf) is 0), so the warning is harmless.
Best,
Lei
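
To make the point about 0 Hz concrete, here is a minimal sketch of the log-Gaussian pitch conversion from the warning above. The function name `convert_f0_log_gaussian` and the statistics passed to it are hypothetical values chosen for illustration, not taken from the repository; only the formula itself comes from the traceback. The warning is silenced locally with `np.errstate`, which is one possible way to handle it, not necessarily what the project does.

```python
import numpy as np


def convert_f0_log_gaussian(f0, mean_log_src, std_log_src,
                            mean_log_target, std_log_target):
    """Apply the log-Gaussian normalized F0 transform from the warning.

    The name and signature are hypothetical; the formula is the one
    shown in the traceback.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    # log(0) evaluates to -inf and raises the "divide by zero" warning;
    # suppress it only for this computation.
    with np.errstate(divide="ignore"):
        log_f0 = np.log(f0)
    # -inf stays -inf through the linear transform, and exp(-inf) == 0,
    # so frames with 0 Hz come out as 0 Hz again.
    return np.exp((log_f0 - mean_log_src) / std_log_src
                  * std_log_target + mean_log_target)


# Illustrative values only: two 0 Hz frames and two voiced frames.
f0 = np.array([0.0, 120.0, 0.0, 150.0])
converted = convert_f0_log_gaussian(
    f0,
    mean_log_src=np.log(130.0), std_log_src=0.2,       # assumed source stats
    mean_log_target=np.log(220.0), std_log_target=0.25,  # assumed target stats
)
print(converted)
```

The frames that were 0 Hz in the input remain exactly 0 Hz in the output, while the voiced frames are shifted toward the target speaker's pitch range.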

Answer 2 · 2018-06-16T13:24:28.000Z

Wow. Thank you for your explanation. Based on your experience, how much data do you think I would need for the CycleGAN to learn the mapping from the distribution of my voice to another distribution? I know this is a hard question.

Answer 3 · 2018-06-16T13:54:03.000Z

If you take a look at the VCC2016 dataset, you will find that the training set for one person consists of several pieces of speech audio, each ranging from 1 second to around 4 seconds. If my memory is correct, the total length of this audio is around 5 to 10 minutes. But it should be noted that the quality of those recordings is extremely high.
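
To compare your own recordings against that rough 5 to 10 minute figure, a small helper like the following could sum the length of a folder of clips. This is only a sketch: the function name `total_wav_duration_seconds` and the example directory are hypothetical, and it uses Python's standard `wave` module, so it assumes uncompressed PCM `.wav` files.

```python
import wave
from pathlib import Path


def total_wav_duration_seconds(directory):
    """Sum the durations of all .wav files directly inside `directory`.

    Hypothetical helper; assumes PCM WAV files readable by `wave`.
    """
    total = 0.0
    for path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # Duration of one clip = number of frames / sample rate.
            total += wav_file.getnframes() / wav_file.getframerate()
    return total


# Example usage (the directory name is a placeholder):
# minutes = total_wav_duration_seconds("data/my_voice") / 60.0
# print(f"Total training audio: {minutes:.1f} minutes")
```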