Failure to extract the IPA from English Wiktionary
Hi! Thanks for your great work!
I succeeded extracting the IPA from the German Wiktionary using the example provided:
git clone https://github.com/bmilde/wiktionary_ipa_phoneme_lexicons
cd wiktionary_ipa_phoneme_lexicons
wget https://dumps.wikimedia.org/dewiktionary/latest/dewiktionary-latest-pages-articles-multistream.xml.bz2
bunzip2 dewiktionary-latest-pages-articles-multistream.xml.bz2
python3 make_lex.py -f dewiktionary-latest-pages-articles-multistream.xml -o de_ipa_lexicon.txt --remove-stress
However, I could not get the IPA from the English Wiktionary after editing the above example.
I suspect there might be an error in the "make_lex.py" script.
If the Python script is working correctly, could you please provide an example, like the one for German, of how to run the script for the English version?
PS. A German IPA dictionary was made for GoldenDict thanks to your code:
http://lingvodics.com/dics/details/6172/
Thank you !!!
What command did you use for the English dictionary? I haven't run this in years, so I'm not sure if it still works, but it should be something like:
python3 make_lex.py -l en -f enwiktionary-latest-pages-articles-multistream.xml -o en_ipa_lexicon.txt --remove-stress
Note that the extra "-l en" language parameter must be supplied; the default is 'de'. What was the error you were getting?
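To illustrate why the missing "-l en" matters, here is a minimal sketch of a command-line parser whose language option defaults to 'de'. It is not the actual make_lex.py code: the function name `build_parser` and the long option names (`--language`, `--file`, `--outfile`) are assumptions made for illustration; only the short flags and `--remove-stress` come from the commands in this thread.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in this thread.

    The real make_lex.py may define its options differently.
    """
    parser = argparse.ArgumentParser(description="Sketch of a lexicon extractor CLI")
    # Language of the Wiktionary dump; falls back to German when omitted.
    parser.add_argument("-l", "--language", default="de")
    parser.add_argument("-f", "--file", required=True)
    parser.add_argument("-o", "--outfile", required=True)
    parser.add_argument("--remove-stress", action="store_true")
    return parser


# Without "-l", the language silently stays at the German default.
default_args = build_parser().parse_args(
    ["-f", "enwiktionary.xml", "-o", "en_ipa_lexicon.txt"]
)
print(default_args.language)  # de

# With "-l en", the English setting is selected.
english_args = build_parser().parse_args(
    ["-l", "en", "-f", "enwiktionary.xml", "-o", "en_ipa_lexicon.txt"]
)
print(english_args.language)  # en
```

With a default like this, running the German command unchanged on an English dump would not raise an error; it would just apply the German settings to English pages.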
Thanks for your kind reply. I used the suggested command:
python3 make_lex.py -l en -f enwiktionary-latest-pages-articles-multistream.xml -o en_ipa_lexicon.txt --remove-stress
The result contains only 111 English words with their IPA, whereas the code works perfectly for German.
As I am not a programmer, I cannot tell which part of the source code is not working correctly.
Anyway, thanks for publishing this code! It was very helpful for my German studies! I published a dictionary for GoldenDict with 670,000 German words with their IPA! :D
http://lingvodics.com/dics/details/6172/
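A quick way to check how many entries an output lexicon contains (for instance, to confirm a count like the 111 reported above) is to count its non-empty lines. The sketch below assumes the lexicon is a UTF-8 text file with one entry per line, which is not confirmed in this thread; `count_entries` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def count_entries(lexicon_path):
    """Count non-empty lines in a lexicon file (assumed: one entry per line)."""
    with open(lexicon_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    # File name taken from the command used in this thread.
    path = Path("en_ipa_lexicon.txt")
    if path.exists():
        print(f"{path}: {count_entries(path)} entries")
    else:
        print(f"{path} not found; run make_lex.py first")
```

Comparing this number for the German and English outputs gives a rough indication of whether the English extraction matched only a small fraction of pages.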
I've fixed English word extraction (among other things) here:
https://github.com/hellpanderrr/wiktionary_ipa_phoneme_lexicons