p-groarke/wsay

How does speech XML work (which format)?

caydenmascarenhas opened this issue · 3 comments

When you said we could use speech XML I was quite happy, as I wanna do stuff like add 5 second pauses, etc. However, I'm not sure which speech XML format this uses. I've tried the regular macOS speech markup (with [[slnc 5000]] for a 5 second pause), the SAPI TTS XML, and the SSML markup.
All of these just make it read the tag out instead of pausing for 5 seconds.

Hey BigFrog, the XML it supports is Microsoft's own: https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms717077(v=vs.85)

Haven't tested it in a while though. Let me know if you have any issues with it.

edit : I just noticed you tried SAPI. I'll give it a shot when I have some time to debug.

@BigFrogWithHat Just a quick update: it seems the XML doesn't work when passed on the command line. But when reading an input text file, it's working here.

Are you experiencing the same?

edit : You can try with a text file containing this XML. To read from a text file, run: wsay -i test.txt

<volume level="50">test</volume><volume level="100">test</volume>
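
For the 5 second pause asked about above, here is a minimal Python sketch of the same text file approach. It assumes the `<silence msec="..."/>` tag from Microsoft's SAPI XML TTS documentation behaves in wsay the way the `<volume>` tag does; that has not been confirmed in this thread. The helper names `sapi_pause` and `write_and_speak` are made up for illustration, and the only wsay option used is the `-i` flag mentioned above.

```python
import shutil
import subprocess
from pathlib import Path
from xml.sax.saxutils import escape


def sapi_pause(text_before: str, text_after: str, pause_ms: int) -> str:
    """Join two phrases with a SAPI <silence> tag between them.

    The phrases are XML-escaped so characters like '&' or '<' are not
    mistaken for markup.
    """
    return f'{escape(text_before)}<silence msec="{pause_ms}"/>{escape(text_after)}'


def write_and_speak(markup: str, path: str = "test.txt") -> bool:
    """Write the markup to a text file and, if wsay is on PATH, read it aloud.

    Returns False when wsay is not found, True after wsay exits successfully.
    """
    # Assumption: plain ASCII/UTF-8 text is acceptable input for wsay -i.
    Path(path).write_text(markup, encoding="utf-8")
    exe = shutil.which("wsay")
    if exe is None:
        return False
    # Text file mode, since speech XML is reported not to work from the command line.
    subprocess.run([exe, "-i", path], check=True)
    return True


if __name__ == "__main__":
    markup = sapi_pause("Before the pause.", "After the pause.", 5000)
    if not write_and_speak(markup):
        print("wsay not found on PATH; wrote test.txt only")
```

If the pause is read out loud as text instead of being applied, the `<volume>` example above is a useful cross-check to see whether any SAPI XML is being interpreted at all.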

OK well, I updated the README to specify that speech XML only works in text file mode. I also added a test you could run to troubleshoot: https://github.com/p-groarke/wsay/blob/master/tests/data/SAPI.txt

If this still doesn't work for you, please re-open this ticket. Though it may be out of my hands (might be a Windows / Microsoft issue).

Good day