Some language codes not recognized by iso639.Language.match()
sonofthomp opened this issue · 4 comments
Running codes.py yielded the following error:
(base) gabrielthompson@Gabriels-MBP-2 lib % python codes.py
codes.py WARNING: WikiPron resolves the key 'ain' to 'Ainu (Japan)' listed as 'Ainu' on Wiktionary
codes.py WARNING: WikiPron resolves the key 'rup' to 'Macedo-Romanian' listed as 'Aromanian' on Wiktionary
codes.py WARNING: WikiPron resolves the key 'bjb' to 'Banggarla' listed as 'Barngarla' on Wiktionary
Traceback (most recent call last):
  File "/Users/gabrielthompson/Desktop/Coding/research/wikipron3/data/scrape/lib/codes.py", line 215, in <module>
    main()
  File "/Users/gabrielthompson/Desktop/Coding/research/wikipron3/data/scrape/lib/codes.py", line 177, in main
    iso639_lang = iso639.Language.match(wiktionary_code)
  File "/Users/gabrielthompson/anaconda3/lib/python3.10/site-packages/iso639/language.py", line 120, in match
    return _get_language(user_input, query_order)
  File "/Users/gabrielthompson/anaconda3/lib/python3.10/site-packages/iso639/language.py", line 189, in _get_language
    raise LanguageNotFoundError(
iso639.language.LanguageNotFoundError: 'gmw-cfr' isn't an ISO language code or name
For whatever reason, the iso639 module doesn't recognize some of the language codes returned by the Wiktionary API, such as 'gmw-cfr'. Someone should look into why this is, or we could omit languages that don't have valid language codes.
I hadn't seen that fatal exception before. I think we should probably catch it and convert it to a warning. What do you think?
Giving a warning sounds like a good idea. In place of iso639_lang = iso639.Language.match(wiktionary_code), I'm thinking something like:
    try:
        iso639_lang = iso639.Language.match(wiktionary_code)
    except iso639.language.LanguageNotFoundError:
        logging.warning(
            "Could not find language with code %s", wiktionary_code
        )
... so that in the case of gmw-cfr, the following is output:
codes.py WARNING: Could not find language with code gmw-cfr
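One thing to watch with the snippet above: after the warning is logged, iso639_lang is still unassigned, so any later code that uses it needs to be skipped for that language. Here is a minimal, self-contained sketch of the catch-warn-skip pattern. It assumes the lookup happens inside a loop over codes, which this thread doesn't confirm. resolve_codes, stand_in_match, and StandInNotFoundError are hypothetical names used only so the example runs without the iso639 package; they are not part of codes.py or of iso639.

```python
import logging
from typing import Callable, Dict, Iterable, Type


class StandInNotFoundError(Exception):
    """Hypothetical stand-in for iso639.language.LanguageNotFoundError."""


# Hypothetical lookup table; the names are taken from the warnings quoted
# earlier in this issue. The real lookup is done by the iso639 package.
_STAND_IN_NAMES = {"ain": "Ainu (Japan)", "rup": "Macedo-Romanian"}


def stand_in_match(code: str) -> str:
    """Hypothetical stand-in for iso639.Language.match()."""
    try:
        return _STAND_IN_NAMES[code]
    except KeyError:
        raise StandInNotFoundError(
            f"{code!r} isn't an ISO language code or name"
        ) from None


def resolve_codes(
    codes: Iterable[str],
    match: Callable[[str], object],
    not_found: Type[Exception],
) -> Dict[str, object]:
    """Resolve each code, warning about and skipping unrecognized ones."""
    resolved: Dict[str, object] = {}
    for code in codes:
        try:
            language = match(code)
        except not_found:
            logging.warning("Could not find language with code %s", code)
            continue  # Skip the rest of the loop body for this code.
        resolved[code] = language
    return resolved


if __name__ == "__main__":
    print(
        resolve_codes(
            ["ain", "gmw-cfr", "rup"], stand_in_match, StandInNotFoundError
        )
    )
```

With the real library, the same function could presumably be called with iso639.Language.match and iso639.language.LanguageNotFoundError, the two names that appear in the traceback.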
This proposal LGTM.
Closed in #499, I believe.