neosapience/editts

How to use voice files instead of pure TTS?

Vadim2S opened this issue · 4 comments

In the paper you describe a test on the LJ Speech dataset (Section 4.3, Content replacement). Can you provide code for loading voice files, instead of the pure sample generation in tts.py?

Hello @Vadim2S, thanks for opening this issue, and apologies for the belated reply. The modeling code is a direct adaptation of Grad-TTS, so you can refer to the upstream repository for detailed instructions on how to load data. Hope this helps!
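
For readers who only need the first step (getting a recording into memory), here is a minimal sketch using just the Python standard library. It is not code from this repository or from Grad-TTS: `load_wav_mono16` is a hypothetical helper, and it assumes a mono, 16-bit PCM file such as the LJ Speech recordings (22050 Hz). Computing the mel-spectrogram the model expects would still need the upstream feature-extraction settings.

```python
import array
import sys
import wave

# LJ Speech recordings are sampled at 22050 Hz; this is used only as a sanity check.
EXPECTED_SAMPLE_RATE = 22050


def load_wav_mono16(path):
    """Read a mono, 16-bit PCM WAV file (hypothetical helper, not a repo API).

    Returns (samples, sample_rate), with samples as floats in [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected a mono file")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is stored little-endian

    # Scale signed 16-bit integers to floats.
    return [s / 32768.0 for s in pcm], sample_rate


def matches_expected_rate(sample_rate, expected=EXPECTED_SAMPLE_RATE):
    """True if the file already has the sampling rate the model was trained on."""
    return sample_rate == expected
```

If `matches_expected_rate` returns False, the audio would need resampling before feature extraction; that step is left out here.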

I have this question as well. Looking at the inference code, it is unclear how I could drop-in replace running Grad-TTS with my own source WAV file. Any tips would be appreciated :) Or if you have pointers to any other methods of doing something similar.

Hey @mvoodarla, thanks for opening this issue.

> how I could drop-in replace running Grad-TTS with my own source WAV file

Could you explain in more detail what you mean by this? I assume this is in the context of content replacement.

Yes, I would like to have the ability to submit a source wav file with or without a text transcription of it, and be able to replace a word or a set of words that was said in that source wav file. Does that make sense?
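
To make the request concrete: however the edit itself is carried out, replacing a word in a source recording first requires knowing which spectrogram frames that word occupies. The sketch below converts a word's start and end time into a frame range and a binary mask. It is illustrative only and not part of EdiTTS: the function names are hypothetical, the word timings are assumed to come from a forced aligner or manual annotation, and the hop length of 256 samples is an assumption that should be checked against the upstream configuration.

```python
SAMPLE_RATE = 22050  # LJ Speech sampling rate
HOP_LENGTH = 256     # assumed hop length; verify against the upstream config


def frames_for_span(start_sec, end_sec,
                    sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Return (first_frame, end_frame) covering the time span, end exclusive."""
    if not 0 <= start_sec < end_sec:
        raise ValueError("need 0 <= start_sec < end_sec")
    start_sample = round(start_sec * sample_rate)
    end_sample = round(end_sec * sample_rate)
    first = start_sample // hop_length
    end = -(-end_sample // hop_length)  # ceiling division
    return first, end


def frame_mask(num_frames, first, end):
    """Binary mask over frames: 1 inside [first, end), 0 elsewhere."""
    return [1 if first <= i < end else 0 for i in range(num_frames)]
```

For example, `frames_for_span(1.0, 2.0)` gives the frame range for a word spoken between the first and second second of the recording, and `frame_mask` marks those frames as the region to be replaced.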