Inference without rhythm and pitch
kngan43 opened this issue · 0 comments
kngan43 commented
Hi,
I'm new to speech synthesis. I've trained my model on the EmoV-DB dataset and want to run inference using the GST part of Mellotron. I want to input arbitrary text and have it output speech with a chosen emotion.
I noticed in issue #20 that someone mentioned that the rhythm and pitch create a 1:1 alignment. Could someone explain in more detail how to do inference without rhythm and pitch?