NVIDIA/mellotron

Generic Text-to-Speech Inference

GreenGarnets opened this issue · 1 comment

As I understand it, Mellotron takes reference audio or MusicXML and applies style transfer on top of Tacotron2-based synthesis. If there is no reference file, is it possible to just get a plain, generic TTS result? I looked through the code in model.py but couldn't find anything relevant to this, so I'm asking here.
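For context, Mellotron conditions synthesis on a GST-style embedding derived from the reference. A common workaround when no reference exists is to supply manual weights over the learned style tokens instead of encoding a reference. The sketch below illustrates only that idea with generic NumPy code; the names and shapes are illustrative assumptions, not Mellotron's actual API:

```python
import numpy as np

# Illustrative only (NOT the Mellotron API): in a GST-style module,
# a reference encoder produces attention weights over learned style
# tokens, and their weighted sum is the style embedding fed to the
# decoder. Without a reference, one can pick the weights by hand.

rng = np.random.default_rng(0)
num_tokens, token_dim = 10, 256
# Stand-in for learned style tokens (random here for illustration).
style_tokens = rng.standard_normal((num_tokens, token_dim))

def style_embedding(token_weights):
    """Weighted sum of style tokens -> a single style embedding."""
    w = np.asarray(token_weights, dtype=float)
    return w @ style_tokens

# With a reference: weights would come from the reference encoder.
# Without one: e.g. uniform weights as a rough "neutral" style.
uniform_weights = np.full(num_tokens, 1.0 / num_tokens)
neutral_style = style_embedding(uniform_weights)
print(neutral_style.shape)  # (256,)
```

Whether this yields natural-sounding "generic" speech depends on the trained token space, so treat it as a starting point for experimentation rather than a supported inference mode.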

Please rephrase your question, as it is not clear to me what you are asking.