How does speech XML work (which format)?

Question

caydenmascarenhas opened this issue 2 years ago · 3 comments

When you said we could use speech XML, I was quite happy, as I want to do things like add 5-second pauses. However, I'm not sure which speech XML format this uses. I've tried the regular macOS speech markup (with [[slnc 5000]] for a 5-second pause), the SAPI TTS XML, and the SSML markup.
All of these just make it read the tag out loud instead of pausing for 5 seconds.

Answer 1 · 2022-08-27T20:18:03.000Z

Hey BigFrog, the XML it supports is Microsoft's own: https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms717077(v=vs.85)

I haven't tested it in a while, though. Let me know if you have any issues with it.

Edit: I just noticed you tried SAPI. I'll give it a shot when I have some time to debug.

Answer 2 · 2022-09-03T20:40:17.000Z

@BigFrogWithHat Just a quick update: it seems the XML doesn't work when the text is passed on the command line, but it does work here when the text is read from an input file.

Are you experiencing the same?

Edit: You can try a text file containing the XML below. To read from a text file, run wsay -i test.txt

<volume level="50">test</volume><volume level="100">test</volume>
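For the 5-second pause from the original question, a minimal sketch of the same text-file approach might look like the following. It assumes the `<silence msec="..."/>` tag from Microsoft's SAPI XML TTS documentation behaves the same way as the `<volume>` tag above (this was not tested in this thread) and that `wsay` is on the PATH. The helper names `build_sapi_text`, `wsay_command`, and `speak_with_pause` are hypothetical and not part of wsay.

```python
import subprocess
from pathlib import Path


def build_sapi_text(before, pause_ms, after):
    """Return two phrases separated by a SAPI <silence> tag (assumed to be supported)."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    return f'{before}<silence msec="{pause_ms}"/>{after}'


def wsay_command(path):
    """Build the wsay invocation for file input, using the -i flag mentioned above."""
    return ["wsay", "-i", str(path)]


def speak_with_pause(path, before, pause_ms, after):
    """Write the marked-up text to a file, then ask wsay to read that file."""
    path = Path(path)
    # The markup is passed through a file because the XML reportedly
    # does not work when given directly on the command line.
    path.write_text(build_sapi_text(before, pause_ms, after), encoding="utf-8")
    subprocess.run(wsay_command(path), check=True)
```

Calling `speak_with_pause("test.txt", "Before the pause.", 5000, "After the pause.")` would write the file and run `wsay -i test.txt`. If the tag is read aloud instead of producing silence, the `<volume>` sample above is a useful check on whether any XML is being parsed at all.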

Answer 3 · 2022-09-10T17:21:11.000Z

OK, I updated the README to specify that speech XML only works in text file mode. I also added a test file you can run to troubleshoot: https://github.com/p-groarke/wsay/blob/master/tests/data/SAPI.txt

If this still doesn't work for you, please re-open this ticket, though it may be out of my hands (it might be a Windows / Microsoft issue).

Good day