JSON UniMorph morphology: make one-to-many
kylebgorman opened this issue · 0 comments
kylebgorman commented
Once #372 by @reubenraff is submitted, we should expand the JSON file and associated logic so that the mapping from WikiPron language codes to UniMorph URLs is one-to-many. This will allow us to deal with the fact that `fin` (Finnish) is two files.
- Instead of `Dict[str, str]`, make the UniMorph JSON a `Dict[str, List[str]]`. Most languages will only have one entry in the list.
- In the `grab_unimorph_data.py` script, loop over the list of URLs for each language, writing all of them into the WikiPron language code + `.tsv`. That'll give us a single `fin.tsv` file (for instance).
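
A rough sketch of what that loop might look like, assuming the JSON has already been loaded into a `Dict[str, List[str]]`. The names `fetch_text` and `write_unimorph_tsvs` are hypothetical and not taken from the existing script; the `fetch` parameter is only there so the loop can be exercised without network access.

```python
import os
import tempfile
import urllib.request
from typing import Callable, Dict, List


def fetch_text(url: str) -> str:
    """Downloads a URL and decodes the body as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def write_unimorph_tsvs(
    unimorph: Dict[str, List[str]],
    out_dir: str,
    fetch: Callable[[str], str] = fetch_text,
) -> None:
    """Writes one <code>.tsv per language, concatenating all of its URLs."""
    for code, urls in unimorph.items():
        path = os.path.join(out_dir, f"{code}.tsv")
        with open(path, "w", encoding="utf-8") as sink:
            for url in urls:
                text = fetch(url)
                sink.write(text)
                # Keeps the files from running together if one of them
                # lacks a trailing newline.
                if text and not text.endswith("\n"):
                    sink.write("\n")
```

With this shape, a language with a single URL is just the one-element case of the same loop, so no special handling is needed.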
We could make it the case that the dictionary values are polymorphic (`Union[str, List[str]]`) but I think that'd just make things more confusing.
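
To illustrate the extra branching the polymorphic form would probably need, here is a minimal sketch of the normalization step that any code reading the JSON would have to apply first. The helper name `as_url_list` is hypothetical.

```python
from typing import List, Union


def as_url_list(value: Union[str, List[str]]) -> List[str]:
    """Wraps a bare string in a list; copies an existing list."""
    # This isinstance check is the cost of the polymorphic format.
    if isinstance(value, str):
        return [value]
    return list(value)
```

Keeping every value a list avoids this check entirely.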