NVIDIA/mellotron

Question about custom dataset

LucasRotsen opened this issue · 2 comments

Hi everyone!

Firstly, thank you for the great implementation.

I haven't yet understood how I should prepare my data for training, so I'd appreciate it if someone could clarify that for me. My assumptions are:

  • If I have data from 10 speakers, I should split it into two files in the "filelists" directory (train and val)
  • Each of those files should contain a representative sample of all speakers
  • Each line of the txt files should have the format: path_to_audio|transcript|speaker_id

Are my assumptions correct?
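
To make the assumptions above concrete, here is a minimal sketch of a per-speaker train/val split that emits lines in the path_to_audio|transcript|speaker_id format. The function names (split_filelist, format_line, write_filelist), the 10% validation fraction, and the fixed seed are illustrative assumptions, not part of the mellotron repository.

```python
import random
from collections import defaultdict


def split_filelist(entries, val_fraction=0.1, seed=0):
    """Split (audio_path, transcript, speaker_id) tuples into train and val lists.

    The split is done per speaker so that every speaker with at least two
    utterances is represented in both lists. val_fraction and seed are
    illustrative defaults, not values taken from the repository.
    """
    by_speaker = defaultdict(list)
    for entry in entries:
        by_speaker[entry[2]].append(entry)

    rng = random.Random(seed)
    train, val = [], []
    for speaker in sorted(by_speaker):
        items = list(by_speaker[speaker])
        rng.shuffle(items)
        # Hold out at least one utterance per speaker when possible.
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


def format_line(audio_path, transcript, speaker_id):
    """Format one filelist line as path_to_audio|transcript|speaker_id."""
    return f"{audio_path}|{transcript}|{speaker_id}"


def write_filelist(path, entries):
    """Write one formatted line per entry to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_path, transcript, speaker_id in entries:
            f.write(format_line(audio_path, transcript, speaker_id) + "\n")
```

Calling split_filelist on the full list of utterances and then write_filelist once for each returned list would produce the two files; the output file names are up to you.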

Yes, that's a good start!
Make sure you trim the silence at the beginning and end of each audio file, and that each transcript matches its audio file.
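
As a rough illustration of trimming leading and trailing silence, the sketch below drops samples at both ends whose magnitude falls below an amplitude threshold. The function name trim_silence and the threshold of 0.01 (for samples normalized to [-1, 1]) are assumptions for illustration; in practice an energy-based tool such as librosa.effects.trim is a common alternative.

```python
def trim_silence(samples, threshold=0.01):
    """Return samples with quiet leading and trailing values removed.

    samples is a sequence of floats, assumed normalized to [-1, 1].
    threshold is an illustrative amplitude cutoff; tune it for your data.
    Quiet samples in the middle of the signal are left untouched.
    """
    start = 0
    end = len(samples)
    # Advance past leading samples that are below the threshold.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back past trailing samples that are below the threshold.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

A simple per-sample threshold like this is sensitive to clicks and background noise, so it is worth listening to a few trimmed files before processing the whole dataset.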

Thanks for the quick reply, @rafaelvalle!