GhostNaN/silero-webui

Invalid XML format

stuta opened this issue · 5 comments

When I give text: <p>When I wake up, I speak quite slowly. Then I start speaking in my normal voice</p> I get ValueError: Invalid XML format in the server.

If I remove the dot then it works. Also, all XML like <prosody rate="x-slow"> fails.

> When I give text: <p>When I wake up, I speak quite slowly. Then I start speaking in my normal voice</p> I get ValueError: Invalid XML format in the server.

What are you doing here?
Why is your input an HTML paragraph?

> If I remove the dot then it works. Also, all XML like <prosody rate="x-slow"> fails.

Works fine for me in the web browser.
Why are you setting that manually here?

I'm trying to use SSML: https://github.com/snakers4/silero-models/wiki/SSML.
Maybe the UI should have a separate option for SSML.

Okay, I found the issue: it was in tts_preprocessor.py.
It was stripping every double quote: string = string.replace('"', '')
I also got rid of the step that broke sentences into pieces, for better consistency.
493bc59

Let me know if you have any other issues.
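
For anyone who wants to reproduce the two failure modes, here is a minimal sketch using only Python's standard XML parser. It is not the actual code from tts_preprocessor.py or from the Silero model. The helper name `is_well_formed` and the naive split on ". " are assumptions made for illustration; the split is just one plausible way that sentence splitting could explain why removing the dot made the input work.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single well-formed XML element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Failure mode 1: stripping double quotes removes the attribute quoting.
ssml = '<p><prosody rate="x-slow">I speak quite slowly</prosody></p>'
stripped = ssml.replace('"', '')  # same call as quoted in the comment above

print(is_well_formed(ssml))      # True
print(is_well_formed(stripped))  # False: rate=x-slow is not valid XML

# Failure mode 2 (assumed): splitting on ". " separates <p> from </p>.
paragraph = "<p>I speak quite slowly. Then I speak normally</p>"
fragments = paragraph.split(". ")  # hypothetical naive sentence splitter

print(is_well_formed(paragraph))                  # True
print([is_well_formed(f) for f in fragments])     # [False, False]
```

Each fragment ends up with either an unclosed `<p>` or a stray `</p>`, so neither can be parsed on its own.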

I tested with:

<speak>
    <p> When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>. Then I start speaking
        in my normal voice, <prosody pitch="x-high"> and I can speak in a higher tone </prosody>, or <prosody
            pitch="x-low">on the contrary, lower</prosody>. Then, if I’m lucky, <prosody rate="fast">I
        can speak quite quickly like now.</prosody> I can also make pauses of any length, for
        example, two seconds <break time="2000ms" />. <p>
            I also know how to pause between paragraphs.
        </p>
                    <p>
            <s>And I also know how to pause between sentences</s>
            <s>For example, like now</s>
        </p>
    </p>
</speak>

Error .multi_acc_v3_package.py:196: UserWarning: Current model doesn't support SSML tag: speak.

When I remove <speak> tags I get an error: .multi_acc_v3_package.py", line 113, in process_ssml raise ValueError(f"Failed to parse SSML: {e}") ValueError: Failed to parse SSML: invalid literal for int() with base 10: 'two thousand'.
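
The second error looks like the number-to-words step rewriting the `time` attribute before the SSML parser sees it. A minimal sketch of that interaction follows. `parse_break_ms` is a hypothetical stand-in for whatever the model package does with the attribute, and the exact rewritten string ("two thousand ms") is an assumption; only the final `int()` error text is taken from the traceback above.

```python
def parse_break_ms(time_attr: str) -> int:
    """Hypothetical parser for a <break time="..."> value given in milliseconds."""
    return int(time_attr.replace("ms", "").strip())


# Untouched attribute value parses fine.
print(parse_break_ms("2000ms"))  # 2000

# After a text normalizer has spelled out the number (assumed form),
# int() can no longer parse it.
try:
    parse_break_ms("two thousand ms")
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: 'two thousand'
```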

I realize I'd be fighting with the text preprocessor way too much here.
So I just added a setting to bypass it: "Raw Mode"
3409e40

The quality will be lower for things like reading out numbers or individual letters, though.
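
The idea behind such a bypass can be sketched roughly as follows. This is not the code from the commit. `prepare_text`, `toy_preprocess`, and the `raw_mode` parameter name are hypothetical, and the digit replacement is a deliberately crude stand-in for the real preprocessor.

```python
import re

# Toy lookup table standing in for real number normalization.
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def toy_preprocess(text: str) -> str:
    """Crude stand-in for a TTS text normalizer: spell out each digit."""
    return re.sub(r"\d", lambda m: DIGITS[m.group()], text)


def prepare_text(text: str, raw_mode: bool = False) -> str:
    """Return text unchanged in raw mode, otherwise run the normalizer."""
    if raw_mode:
        return text  # SSML markup and attribute values stay intact
    return toy_preprocess(text)


print(prepare_text("Room 7"))                                  # Room seven
print(prepare_text('<break time="2000ms" />', raw_mode=True))  # unchanged
```

With the bypass on, SSML reaches the model intact, but plain text such as "Room 7" is no longer spelled out, which is the quality trade-off mentioned above.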