auspicious3000/SpeechSplit

make_metadata.py logic puzzle

leijue222 opened this issue · 7 comments

# use hardcoded onehot embeddings in order to be consistent with the test speakers
# modify as needed
# may use generalized speaker embedding for zero-shot conversion
spkid = np.zeros((82,), dtype=np.float32)
if speaker == 'p226':
    spkid[1] = 1.0
else:
    spkid[7] = 1.0
utterances.append(spkid)

Hi,
I don't understand the logic of this snippet.
Shouldn't every speaker's id be different? Why are there only two?
If I want to train on 20 VCTK speakers, does this part have to be modified?
Could you explain a bit of it to me?

  1. How do I generate a file like demo.pkl for inference? The train.pkl I generate with make_metadata.py doesn't work for inference; the two files are stored in different formats (see the snippet below).
  2. How do I generate the *-P.ckpt model?
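
For reference, a quick way to compare the layouts of the two files might be something like this (the paths are just placeholders):

import pickle

# Load both files and look at the first entry of each to see how the
# layouts differ (paths are placeholders for wherever the files live).
with open('assets/train.pkl', 'rb') as f:
    train_meta = pickle.load(f)
with open('assets/demo.pkl', 'rb') as f:
    demo_meta = pickle.load(f)

print(type(train_meta[0]), len(train_meta[0]))
print(type(demo_meta[0]), len(demo_meta[0]))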

@leijue222

Hi,
I don't understand the logic of this snippet.
Shouldn't every speaker's id be different? Why are there only two?
If I want to train on 20 VCTK speakers, does this part have to be modified?
Could you explain a bit of it to me?

I guess that is only for the demo.
If you only have 20 speakers, you can just extend it like this:

if speaker == 'p226':
    spkid[0] = 1.0
elif speaker == 'p227':
    spkid[1] = 1.0
elif speaker == 'p228':
    spkid[2] = 1.0
# ...and so on for the rest of the speakers
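
If the if/elif chain gets long, a cleaner way (just a sketch; the speaker list below is an example and the 82-dim size is taken from the snippet above) is to build the index from a fixed, ordered speaker list:

import numpy as np

# Fixed, ordered list of the VCTK speakers you train on (example ids only).
speakers = ['p225', 'p226', 'p227', 'p228']  # extend to all 20 speakers
spk2idx = {spk: i for i, spk in enumerate(speakers)}

# `speaker` comes from the surrounding loop in make_metadata.py.
spkid = np.zeros((82,), dtype=np.float32)
spkid[spk2idx[speaker]] = 1.0
utterances.append(spkid)

The only thing that matters is that each speaker always maps to the same index at training and test time.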

The author uses a one-hot embedding as the speaker id.
If you want to switch to zero-shot learning, you may need to replace the one-hot vector with a speaker embedding vector (see the sketch at the end of this comment).
I still have some problems, however. lol
I'm still working out how to make my own demo.pkl, which I believe is used for validation.
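
For the speaker-embedding route, one option (not something the author ships, just a sketch) is to replace the one-hot vector with a d-vector from a pretrained speaker encoder such as Resemblyzer:

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

# Pretrained speaker encoder; embed_utterance returns a 256-dim d-vector.
encoder = VoiceEncoder()
wav = preprocess_wav('p226_001.wav')      # any utterance from the speaker
spk_emb = encoder.embed_utterance(wav)    # numpy array of shape (256,)

# Store this instead of the one-hot spkid; the speaker-embedding dimension
# in hparams (82 in the snippet above) would then have to become 256.
utterances.append(spk_emb.astype(np.float32))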

@CYT823
Yes, I wanted to do zero-shot conversion from unseen speakers (little data and no guarantee of quality) to seen speakers (big dataset and high quality) before.
I don't know how effective this project is; the demo provided by the author isn't enough to evaluate that.

Ha, I don't know either, but the results on the demo web page sound great.
I'm still trying to make it work zero-shot.
I hope the result can be as good as the demo page when I use speaker embeddings instead of one-hot vectors. (crossing fingers)

Please leave a message here if you get good results; I'm currently scheduled for other tasks.
Good luck!

@CYT823 did you manage to create a pkl file for inference?

Hi @skol101,
I am sorry, but I'm no longer working on this project.

Besides, I don't think you really need a pkl file for inference.
The pkl file, which is for validation, is used in training mode.
During inference, you just need to provide two voices as input and get the result.