cmusphinx/pocketsphinx

Recognized words differ from recognized phonemes.

timkarlo opened this issue · 3 comments


Recognizing the words gives this output:
pocketsphinx\bin\Release\x64>pocketsphinx_continuous.exe -infile "C:\1timkarlo\ai_short16000.wav" -hmm model\en-us\en-us -lm model\en-us\en-us.lm.bin -dict model\en-us\cmudict-en-us.dict -logfn nul

the question of how real is artificial intelligence is

Recognizing the phones gives this output:
pocketsphinx_continuous.exe -infile "C:\1timkarlo\ai_short16000.wav" -hmm model\en-us\en-us -allphone yes -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0 -logfn nul
SIL B IH V K AO R SH IH N EH V EH HH AW R R TH R IY OW L HH SIL EY IH Z SIL

The entries for these words in cmudict-en-us.dict are:
the DH AH
question K W EH S CH AH N
of AH V
how HH AW
real R IY L

is IH Z

Where does this difference come from?

This may have been due to the endianness bug in language models affecting the phone LM.

Possibly, but allphone search gives different (and worse) results, because the language model is less powerful. This is, unfortunately, just the way it works.

Especially in this case, where you don't actually have a language model over the phonemes. Many things sound the same, especially to a not very good acoustic model!

You can try wav2vec if you want to get state of the art phoneme recognition. You might need a few more gigabytes though :-)
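
To put a number on how far the allphone output is from the dictionary pronunciations, one option is a plain edit distance over phone symbols. The following is only a rough sketch: the helper names (`expand_words`, `strip_silence`, `phone_edit_distance`) are made up for illustration and are not part of pocketsphinx, and the lexicon holds only the entries quoted in this issue.

```python
# Pronunciations copied from the cmudict-en-us.dict entries quoted in this issue.
LEXICON = {
    "the": ["DH", "AH"],
    "question": ["K", "W", "EH", "S", "CH", "AH", "N"],
    "of": ["AH", "V"],
    "how": ["HH", "AW"],
    "real": ["R", "IY", "L"],
    "is": ["IH", "Z"],
}


def expand_words(words, lexicon):
    """Concatenate the dictionary pronunciations of a word sequence."""
    phones = []
    for word in words:
        phones.extend(lexicon[word])
    return phones


def strip_silence(phones):
    """Drop SIL markers so they do not count as errors."""
    return [p for p in phones if p != "SIL"]


def phone_edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between phone lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    ref = expand_words(["question"], LEXICON)
    # Assumption for illustration only: this slice of the allphone output
    # is the part that corresponds to the word "question".
    hyp = strip_silence("K AO R SH IH N".split())
    print(len(ref), phone_edit_distance(ref, hyp))
```

Dividing the distance by the reference length gives a phone error rate for that word. Which hypothesis phones belong to which word is a guess here; a time-aligned backtrace would be needed to do that properly.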