Google pronunciation

Question


Closed this issue 2 years ago · 3 comments

Does Google's "How to pronounce" service use a standard phonetic notation? And could you add it to this library?

These commands gave similar, though not identical, results:

python lexconvert.py --phones unicode-ipa-syls "invisible"    # ɪnvˈɪzəbəl
python lexconvert.py --phones2phones unicode-ipa-syls festival-cmu "ɪnvˈɪzəbəl" # (((ih) 0) ((n v ih) 1) ((z ax) 0) ((b ax l) 0))

python lexconvert.py --phones unicode-ipa-syls "accomplish"  # əkˈɒmplɪʃ
python lexconvert.py --phones2phones unicode-ipa-syls festival-cmu "əkˈɒmplɪʃ" # (((ax) 0) ((k aa) 1) ((m p l ih sh) 0))

Answer 1 · 2022-11-28T23:27:54.000Z

Google introduced this in 2019 and didn't say where the data comes from. I'm not sure who pronounces "invisible" starting with “uhn” instead of “in”; that example gives me little confidence in the usefulness of this service. Disclosure: before Jonathan Duddington died, I helped him with eSpeak, which the Google team used for Google Translate in 2010 but replaced with commercial voices just a few months later. That disappointed us. Yes, we had bugs, but the way to get them fixed wasn't to ditch us and switch to a closed-source, proprietary machine-learning unit-selection system that made further debugging really difficult (it sounded fine on vocabulary from its training set, but not on anything else). So I'm really not that confident about the direction Google has taken with regard to accuracy in language processing.

But yes, I can add their format if all of the following apply:

we can find out definitively which spellings correspond to which phonemes,
they're not going to change this very often, and
we can figure out how to represent emphasis, which is currently done by using bold type instead of a symbol.
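To make the first and third conditions concrete, here is a minimal Python sketch. It assumes, purely hypothetically (Google's real format is not documented in this thread), that a respelling is copied as text with syllables separated by "-" or "·" and the stressed syllable wrapped in <b>...</b>. The RESPELLING table and the function respelling_to_festival are illustrative inventions, not part of lexconvert; a real table would need the definitive spelling-to-phoneme answer from the first condition.

```python
import re

# HYPOTHETICAL respelling-to-phoneme table (festival-cmu style phones).
# A real table would need the definitive spelling-to-phoneme correspondence.
RESPELLING = {
    "uh": "ax", "i": "ih", "a": "aa", "sh": "sh",
    "b": "b", "k": "k", "l": "l", "m": "m",
    "n": "n", "p": "p", "v": "v", "z": "z",
}
MAX_KEY = max(len(key) for key in RESPELLING)


def respelling_to_festival(respelling):
    """Convert an ASSUMED respelling format (syllables split by '-' or '·',
    stressed syllable wrapped in <b>...</b>) into festival-cmu syllables."""
    syllables = []
    for chunk in re.split(r"[-\u00b7]", respelling):
        # Bold type marks emphasis, so turn it into a stress number.
        stressed = chunk.startswith("<b>") and chunk.endswith("</b>")
        text = chunk[3:-4] if stressed else chunk
        phones, i = [], 0
        while i < len(text):
            # Greedy longest match, so "sh" wins over "s" + "h".
            for length in range(MAX_KEY, 0, -1):
                piece = text[i:i + length]
                if piece in RESPELLING:
                    phones.append(RESPELLING[piece])
                    i += len(piece)
                    break
            else:
                raise ValueError(f"no mapping for {text[i:]!r}")
        syllables.append((phones, 1 if stressed else 0))
    return "(" + " ".join(f"(({' '.join(p)}) {s})" for p, s in syllables) + ")"


print(respelling_to_festival("uh-<b>kam</b>-plish"))
# (((ax) 0) ((k aa m) 1) ((p l ih sh) 0))
```

The emphasis part really is simple once the markup is known; the spelling table is where the real work (and the risk of Google changing things) would lie.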

Answer 2 · 2022-11-29T18:19:20.000Z

Thanks,

Currently, I am attempting to gather more data and will update this issue slowly and steadily.

Today I found that two more services use an algorithm similar to Google's.

Furthermore, I think the syllable breaks come out wrong when I convert "unicode-ipa-syls" to "festival-cmu":

python lexconvert.py --phones unicode-ipa-syls "accomplish"  # əkˈɒmplɪʃ
python lexconvert.py --phones2phones unicode-ipa-syls festival-cmu "əkˈɒmplɪʃ" # (((ax) 0) ((k aa) 1) ((m p l ih sh) 0))

Here, "əkˈɒmplɪʃ" converts to "(((ax) 0) ((k aa) 1) ((m p l ih sh) 0))", whereas all the other services split this word as "(((ax) 0) ((k aa m) 1) ((p l ih sh) 0))".
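One common way to place syllable boundaries is the maximal onset principle: between two vowels, give the following syllable the longest run of consonants that can legally begin an English syllable, and leave the rest in the preceding syllable's coda. Since "m p l" is not a legal onset but "p l" is, this yields the "k aa m" / "p l ih sh" split. The Python sketch below is illustrative only; it assumes a Festival-style vowel set and a tiny hypothetical LEGAL_ONSETS list, and it is not necessarily what lexconvert or the other services do internally.

```python
# Assumed vowel set (roughly the Festival "radio" phoneset vowels);
# each vowel is one syllable nucleus.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "axr", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}

# HYPOTHETICAL and deliberately tiny set of legal onsets; () allows an empty onset.
LEGAL_ONSETS = {
    (), ("b",), ("k",), ("l",), ("m",), ("n",), ("p",), ("s",), ("v",), ("z",),
    ("b", "l"), ("k", "l"), ("p", "l"), ("s", "p"), ("s", "p", "l"),
}


def syllabify(phones):
    """Split a phone list into syllables using the maximal onset principle."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [list(phones)]
    starts = [0]  # index where each syllable begins
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = phones[prev + 1:nxt]  # consonants between two vowels
        # Scan from the left so the first legal suffix found is the longest one;
        # the empty onset () always matches, so the loop always breaks.
        for cut in range(len(cluster) + 1):
            if tuple(cluster[cut:]) in LEGAL_ONSETS:
                break
        starts.append(prev + 1 + cut)
    return [phones[a:b] for a, b in zip(starts, starts[1:] + [len(phones)])]


def to_festival(syllables, stresses):
    """Format syllables and stress numbers in festival-cmu style."""
    return "(" + " ".join(f"(({' '.join(s)}) {st})"
                          for s, st in zip(syllables, stresses)) + ")"


accomplish = ["ax", "k", "aa", "m", "p", "l", "ih", "sh"]
print(to_festival(syllabify(accomplish), [0, 1, 0]))
# (((ax) 0) ((k aa m) 1) ((p l ih sh) 0))
```

The result depends entirely on the onset list, so a production version would need a complete, carefully checked set of English onsets.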

> they're not going to change this very often, and

Although I found fault with Google's service, I believe the answer is YES.

> we can figure out how to represent emphasis, which is currently done by using bold type instead of a symbol.

Yes, I believe this is the easiest part.

Answer 3 · 2022-12-19T13:46:15.000Z

For the festival-cmu difference, I'm pretty sure Festival treats (((ax) 0) ((k aa) 1) ((m p l ih sh) 0)) and (((ax) 0) ((k aa m) 1) ((p l ih sh) 0)) identically. Both map to ax0 k aa1 m p l ih0 sh, because the emphasis numbers attach only to vowels, and where you put the parentheses is a matter of taste. But I haven't actually done the experiment of feeding both to Festival and verifying that the resulting audio is byte-for-byte identical.

Yes, it would be nice to get "cosmetic" things like parenthesis placement right, but that's a lower priority than making sure the phonemes and emphasis points are correct. Similarly, the --syllables option doesn't always put the hyphens in exactly the right places; one day I'd like to figure out how to improve that, but meanwhile it works well enough for most uses.
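The claim that both bracketings reduce to the same phone string can be checked mechanically, although this says nothing about whether Festival's audio output is byte-for-byte identical. Here is a minimal Python sketch, assuming stress digits attach only to vowels from a Festival-style vowel set; parse_sexpr and flatten_syllables are hypothetical helpers, not lexconvert or Festival functions.

```python
import re

# Assumed vowel set (roughly the Festival "radio" phoneset vowels);
# stress digits attach only to these.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "axr", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def parse_sexpr(text):
    """Parse a parenthesised expression into nested Python lists of strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # consume the closing ")"
            return items
        return token

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


def flatten_syllables(text):
    """Turn ((phones) stress) syllables into a flat phone string,
    appending each syllable's stress digit to its vowel."""
    flat = []
    for phones, stress in parse_sexpr(text):
        for phone in phones:
            flat.append(phone + stress if phone in VOWELS else phone)
    return " ".join(flat)


a = flatten_syllables("(((ax) 0) ((k aa) 1) ((m p l ih sh) 0))")
b = flatten_syllables("(((ax) 0) ((k aa m) 1) ((p l ih sh) 0))")
print(a)       # ax0 k aa1 m p l ih0 sh
print(a == b)  # True
```

Moving a consonant across a syllable boundary changes the parentheses but not the flattened string, which is why the difference is cosmetic at this level.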