Could you please describe the details of rhythm-only conversion?
dbkest opened this issue · 1 comment
dbkest commented
I don't understand how the alignment is obtained when the utterance fed to the rhythm encoder is different from the utterance fed to the pitch and content encoders. (In particular, I don't understand the implementation details of the variant described in Appendix B.3.) Thank you, sincerely.
auspicious3000 commented
You can understand it by reading the code for pitch conversion.
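For readers with the same question, here is a minimal sketch of one generic way to make frame-level inputs from two different utterances length-compatible: pad or trim them to a common number of frames so that per-frame encoder outputs can be combined. This is an assumption for illustration, not code taken from the SpeechSplit repository. It assumes the mel-spectrograms and the F0 contour share the same frame rate. `pad_or_trim` and `align_to_common_length` are hypothetical helpers.

```python
import numpy as np

def pad_or_trim(feat: np.ndarray, length: int) -> np.ndarray:
    """Zero-pad or truncate `feat` along its first (time) axis to exactly `length` frames."""
    frames = feat.shape[0]
    if frames >= length:
        return feat[:length]
    # Append zero frames at the end; works for (frames,) and (frames, dims) arrays.
    pad = np.zeros((length - frames,) + feat.shape[1:], dtype=feat.dtype)
    return np.concatenate([feat, pad], axis=0)

def align_to_common_length(rhythm_mel, content_mel, f0, length=None):
    """Hypothetical helper (assumption, not the SpeechSplit API): give the
    rhythm-encoder input (one utterance) and the content/pitch inputs
    (another utterance) the same number of frames.
    If `length` is None, pad everything to the longest input."""
    if length is None:
        length = max(rhythm_mel.shape[0], content_mel.shape[0], f0.shape[0])
    return (pad_or_trim(rhythm_mel, length),
            pad_or_trim(content_mel, length),
            pad_or_trim(f0, length))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rhythm_mel = rng.random((150, 80))   # utterance providing the rhythm
    content_mel = rng.random((120, 80))  # utterance providing the content
    f0 = rng.random(120)                 # pitch contour of the content utterance
    r, c, p = align_to_common_length(rhythm_mel, content_mel, f0)
    print(r.shape, c.shape, p.shape)     # (150, 80) (150, 80) (150,)
```

Padding only makes the frame counts compatible. It does not re-time the content or pitch to match the other utterance's rhythm, so how the model actually handles the mismatched inputs is still best checked against the pitch-conversion code.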