NVIDIA/mellotron

Help needed w.r.t. inference

nirmeshshah opened this issue · 0 comments

Hi, I have a few doubts:

  1. Is example1.wav the reference audio whose style is to be captured when synthesizing samples in inference.ipynb? Do I need to prepare the text and a corresponding wav file in advance for Mellotron inference? Usually I have text that I want synthesized and a reference audio of a completely different utterance whose style should be captured. I am unable to map this onto the existing inference.ipynb. Can anyone please clarify this?
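To make the intended workflow concrete, here is a hypothetical sketch (not Mellotron's actual API; the function name and signature are my own invention) of what I am trying to do: the text to synthesize and the style reference come from two unrelated sources.

```python
# Hypothetical sketch, NOT Mellotron's actual API: illustrate synthesizing
# arbitrary text while transferring style from an unrelated reference wav.
def synthesize(text: str, reference_wav: str) -> str:
    # Placeholder body: a real implementation would extract style cues
    # (e.g. pitch/rhythm) from `reference_wav` and condition the
    # synthesis of `text` on them.
    return f"audio for {text!r} in the style of {reference_wav!r}"

print(synthesize("A completely new sentence to speak.", "example1.wav"))
```

In inference.ipynb the text appears tied to the reference utterance, which is what I cannot reconcile with the workflow above.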

  2. How can I run this model as a standalone TTS?

  3. I have trained my model on single-speaker data. How should I update the "Define Speaker Set" section in inference.ipynb, which seems to be written for the multi-speaker case only?
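For question 3, is the following the right idea? A hypothetical sketch (the names here are my own, not the notebook's actual code): with a single-speaker model, I would expect the speaker set to collapse to a single entry with id 0.

```python
# Hypothetical sketch, NOT the notebook's actual code: for a model trained
# on one speaker, the speaker set reduces to a single id.
speakers = {"my_speaker": 0}  # assumed name; the only speaker gets id 0

def get_speaker_id(name: str = "my_speaker") -> int:
    # With a single-speaker model, every utterance maps to speaker 0.
    return speakers.get(name, 0)

print(get_speaker_id())  # 0
```

Is replacing the multi-speaker set with something like this sufficient, or does the rest of the notebook assume more than one speaker id?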