Wrong content-only conversion results when replacing con_input, with source and target speaking different sentences

Question

jixinya opened this issue 3 years ago · 1 comments

Hi, thanks for your great work! I found that neither the website nor 'demo.ipynb' provides results or code for content-only conversion. So I added one condition to 'demo.ipynb' that replaces inp_con with tgt_con, and tested it on a source and target with different content.
However, the content of the generated speech was not converted. Instead, the content was converted when I replaced the rhythm (condition 'R').
I wonder whether there is anything wrong with my experiment. Looking forward to your reply.

Answer 1 · 2022-08-21T21:37:21.000Z

Content-only conversion was not studied in either SpeechSplit or SpeechSplit2. It is indeed a more challenging task than the other three, because there is a mismatch between the number of blanks provided by the rhythm code and the number of words/syllables in the new content. In my opinion, this could be an intriguing research topic in the future, though.
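
To make the mismatch concrete, here is a toy sketch in Python. It is not SpeechSplit code and does not model the actual encoders or decoder: `fill_slots` and its arguments are hypothetical names, and it simply assumes the rhythm code offers a fixed number of "blanks" that each hold one content unit (a word or syllable).

```python
def fill_slots(num_rhythm_slots, content_units):
    """Toy model: put one content unit into each rhythm slot, in order.

    num_rhythm_slots: number of blanks assumed to come from the rhythm code.
    content_units: list of words/syllables taken from the other utterance.
    Returns (placed, dropped, empty_slots).
    """
    n = min(num_rhythm_slots, len(content_units))
    placed = content_units[:n]            # units that found a slot
    dropped = content_units[n:]           # units with no slot left for them
    empty_slots = num_rhythm_slots - n    # slots with nothing to fill them
    return placed, dropped, empty_slots


# Source rhythm with 4 blanks, target content with 6 syllables:
# two syllables have nowhere to go.
print(fill_slots(4, ["please", "call", "stel", "la", "to", "day"]))

# Source rhythm with 4 blanks, target content with 2 syllables:
# two blanks stay empty.
print(fill_slots(4, ["hel", "lo"]))
```

When the source and target sentences differ, the two counts rarely match, so either content is left over or blanks go unfilled. Replacing the rhythm together with the content avoids this, which may be why the content only changed under condition 'R'.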