bmilde/wiktionary_ipa_phoneme_lexicons

Failure to extract the IPA from English Wiktionary


Hi! Thanks for your great work!

I succeeded in extracting the IPA from the German Wiktionary using the example provided:

git clone https://github.com/bmilde/wiktionary_ipa_phoneme_lexicons
cd wiktionary_ipa_phoneme_lexicons
wget https://dumps.wikimedia.org/dewiktionary/latest/dewiktionary-latest-pages-articles-multistream.xml.bz2
bunzip2 dewiktionary-latest-pages-articles-multistream.xml.bz2
python3 make_lex.py -f dewiktionary-latest-pages-articles-multistream.xml -o de_ipa_lexicon.txt --remove-stress

However, after adapting the example above, I could not extract the IPA from the English Wiktionary.

I guess there might be an error in the "make_lex.py" script.

If the Python script works correctly, could you please provide an example, like the German one, showing how to run the script for English?

P.S. Thanks to your code, I made a German IPA dictionary for GoldenDict:
http://lingvodics.com/dics/details/6172/

Thank you!!!

What command did you use for the English dictionary? I haven't run this in years, so I'm not sure it still works, but it should be something like:

python3 make_lex.py -l en -f enwiktionary-latest-pages-articles-multistream.xml -o en_ipa_lexicon.txt --remove-stress

Note that the extra "-l en" language parameter must be supplied; the default is 'de'. What error were you getting?
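For anyone adapting the German steps to another language, here is a small Python sketch that builds the same download/decompress/extract commands for a given language code. The helper name `build_pipeline` is hypothetical, and the dump URL assumes the `<lang>wiktionary` naming pattern seen in the German example. It only prints the commands; nothing is downloaded or executed.

```python
import shlex


# Hypothetical helper: rebuilds the shell steps from the German example,
# parameterized by Wiktionary language code. It only constructs argv lists.
def build_pipeline(lang: str, remove_stress: bool = True) -> list[list[str]]:
    # Assumes the "<lang>wiktionary" dump naming used in the German URL.
    base = f"{lang}wiktionary-latest-pages-articles-multistream.xml"
    url = f"https://dumps.wikimedia.org/{lang}wiktionary/latest/{base}.bz2"
    # Only the flags mentioned in this thread: -l, -f, -o, --remove-stress.
    make_lex = ["python3", "make_lex.py", "-l", lang, "-f", base,
                "-o", f"{lang}_ipa_lexicon.txt"]
    if remove_stress:
        make_lex.append("--remove-stress")
    return [
        ["wget", url],
        ["bunzip2", f"{base}.bz2"],
        make_lex,
    ]


# Print each step as a shell-quoted command line.
for step in build_pipeline("en"):
    print(shlex.join(step))
```

For "en", the last printed line matches the command suggested above. Passing "-l" explicitly even for German does no harm, since 'de' is the default.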

Thanks for your kind reply. I used the suggested command:
python3 make_lex.py -l en -f enwiktionary-latest-pages-articles-multistream.xml -o en_ipa_lexicon.txt --remove-stress

Only 111 English words with their IPA end up in the output, whereas the code works perfectly for German.
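To check whether a count like 111 is far too low, one could scan the decompressed dump for English IPA templates directly. This is a minimal sketch, not how make_lex.py works internally. It assumes that English Wiktionary pronunciation lines use the `{{IPA|en|/.../}}` template (older revisions may use other forms); the function name `extract_en_ipa` is hypothetical.

```python
import re
from typing import Iterable

# Assumption: English entries mark pronunciations as {{IPA|en|/.../|...}}.
IPA_EN = re.compile(r"\{\{IPA\|en\|([^}]*)\}\}")


def extract_en_ipa(lines: Iterable[str]) -> list[str]:
    """Return every transcription found in {{IPA|en|...}} templates."""
    found = []
    for line in lines:
        for match in IPA_EN.finditer(line):
            # One template may list several pipe-separated transcriptions;
            # skip empty parts and named parameters (anything containing "=").
            for part in match.group(1).split("|"):
                if part and "=" not in part:
                    found.append(part)
    return found


# Example usage (reads the dump lazily, line by line):
# with open("enwiktionary-latest-pages-articles-multistream.xml",
#           encoding="utf-8") as dump:
#     print(len(extract_en_ipa(dump)))
```

If this finds many more transcriptions than the lexicon contains, the English extraction in the script is the likely culprit rather than the dump or the command line.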

As I am not a programmer, I cannot know exactly what part of the source code is not working correctly.

Anyway, thanks for publishing this code! It was very helpful for my German studies! I published a GoldenDict dictionary with 670,000 German words and their IPA! :D
http://lingvodics.com/dics/details/6172/

I've fixed English word extraction (among other things) in this fork:
https://github.com/hellpanderrr/wiktionary_ipa_phoneme_lexicons