make_metadata.py logic puzzle
leijue222 opened this issue · 7 comments
(Embedded code snippet: lines 17 to 25 of `make_metadata.py` at commit 10ed8b9.)
Hi,
I don't understand the code logic of this snippet.
Shouldn't everyone's id be different? Why are there only two?
If I want to train with 20 VCTK speakers, does this part have to be modified?
Could you explain a bit of it to me?
- How to generate a file like `demo.pkl` for inference? The `train.pkl` I generate with `make_metadata.py` doesn't work for inference; the two files are stored in different formats (see the inspection sketch below).
- How to generate the `*-P.ckpt` model?
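If it helps to see the difference, one quick way is to load both files and print their structure. A minimal debugging sketch (standard library only; the file names are the ones mentioned above, and the layout printed is whatever your files actually contain):

```python
import pickle

# Load both metadata files and compare their structure to see how the
# training format and the demo/inference format differ.
for name in ('train.pkl', 'demo.pkl'):
    with open(name, 'rb') as f:
        meta = pickle.load(f)
    print(name, type(meta), 'with', len(meta), 'entries')
    print('  first entry field types:', [type(x) for x in meta[0]])
```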
> Hi,
> I don't understand the code logic of this snippet.
> Shouldn't everyone's id be different? Why are there only two?
> If I want to train with 20 VCTK speakers, does this part have to be modified?
> Could you explain a bit of it to me?
I guess that is only for the demo.
If you only have 20 speakers, you can just imitate it like this:
```python
if speaker == 'p226':
    spkid[0] = 1.0
elif speaker == 'p227':
    spkid[1] = 1.0
elif speaker == 'p228':
    spkid[2] = 1.0
# ... and so on for the remaining speakers
```
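With 20 speakers, a lookup table scales better than a hand-written if/elif chain. A minimal sketch, assuming a hypothetical `speakers` list that you fill in with your own VCTK ids:

```python
import numpy as np

# Hypothetical speaker list: replace with the 20 VCTK ids you actually train on.
speakers = ['p225', 'p226', 'p227', 'p228']

def one_hot_speaker_id(speaker):
    """Return a one-hot vector with one slot per training speaker."""
    spkid = np.zeros(len(speakers), dtype=np.float32)
    spkid[speakers.index(speaker)] = 1.0
    return spkid

print(one_hot_speaker_id('p227'))  # -> [0. 1. 0. 0.]
```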
The author uses a one-hot embedding as the speaker id.
If you want to change to zero-shot learning, you may need to replace the one-hot vector with a speaker embedding vector.
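To illustrate that swap, here is a minimal sketch using the third-party Resemblyzer encoder as a stand-in speaker encoder. This is my assumption, not the encoder this repo actually uses, so the embedding size and preprocessing may differ:

```python
from pathlib import Path
from resemblyzer import VoiceEncoder, preprocess_wav  # pip install resemblyzer

# Stand-in speaker encoder (d-vector style), NOT this repo's own encoder:
# it maps an utterance to a fixed-size embedding instead of a one-hot id.
encoder = VoiceEncoder()
wav = preprocess_wav(Path('p227_001.wav'))  # hypothetical utterance path
spk_emb = encoder.embed_utterance(wav)      # 256-dim, L2-normalized numpy array
```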
I still have some problems, however. lol
I'm still working on how to make my own `demo.pkl`, which I believe is used for validation.
@CYT823
Yes, I wanted to do zero-shot conversion from unseen speakers (little data, no guarantee of quality) to seen speakers (big dataset, high quality) before.
I don't know how effective this project is; the demo provided by the author isn't enough to evaluate the cost.
ha. I don't know either, but the results on the demo web page sound great.
I'm still trying to make it zero-shot.
I hope the result can be as great as the demo page when I use a speaker embedding instead of one-hot. (crossing fingers)
Please leave a message here if you get great results; I'm currently scheduled to do other tasks.
Good luck!
Hi @skol101,
I am sorry, but I'm no longer working on this project.
Besides, I don't think you really need a pkl file during inference.
The pkl file is for validation and is used in training mode.
During inference, you just need to put two voices in as input and get the result.