rapideditor/country-coder

ISO 639 Language Codes

Closed this issue · 7 comments

Heya,

Would you be open to accepting a PR to add 'Official Languages' per-region?

I hunted around for an official source and didn't find many decent ISO 3166 <> ISO 639 mapping files.

This file from GeoNames contains a bunch of extra fields we could adopt, including the languages (which look correct from my spot checks):
http://download.geonames.org/export/dump/countryInfo.txt
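For illustration, here's a minimal sketch of how the languages could be pulled out of that file. It assumes the file is tab-separated, that one of the `#` comment lines starts with `#ISO` and serves as the column header, and that it names a `Languages` column holding comma-separated tags. The function name `parseCountryLanguages` is hypothetical and not part of country-coder.

```typescript
// Sketch only: maps ISO 3166-1 alpha-2 codes to the language tags listed in
// GeoNames' countryInfo.txt. Assumes the header is the comment line starting
// with "#ISO" and that it contains a "Languages" column (an assumption about
// the file layout, not something country-coder provides).
function parseCountryLanguages(text: string): Map<string, string[]> {
  const lines = text.split(/\r?\n/);

  // Locate the header row so columns are found by name, not by position.
  const headerLine = lines.find((line) => line.startsWith('#ISO\t'));
  if (!headerLine) {
    throw new Error('Could not find the "#ISO" header line');
  }
  const header = headerLine.slice(1).split('\t');
  const isoIndex = header.indexOf('ISO');
  const langIndex = header.indexOf('Languages');
  if (isoIndex < 0 || langIndex < 0) {
    throw new Error('Header is missing the "ISO" or "Languages" column');
  }

  const result = new Map<string, string[]>();
  for (const line of lines) {
    // Skip blank lines and comment lines (including the header itself).
    if (line === '' || line.startsWith('#')) continue;

    const fields = line.split('\t');
    const iso = fields[isoIndex];
    if (!iso) continue;

    // The languages field is a comma-separated list; it may be empty.
    const languages = (fields[langIndex] ?? '')
      .split(',')
      .map((tag) => tag.trim())
      .filter((tag) => tag.length > 0);

    result.set(iso, languages);
  }
  return result;
}
```

Looking the columns up by name rather than by fixed index keeps the sketch working if GeoNames ever reorders or adds fields.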

Another option would be to use a locale dataset from a Linux distro. I feel like those would be fairly complete and well maintained.

Please let me know if that's something you'd accept.

Maybe? I’m not sure how a language-per-region dataset would be useful in the context of OpenStreetMap editing?

Looping in @1ec5 too, as he knows more about this than I do.

I think that's a fair point. I'm not actually planning on using the library for Rapid (or OSM in many cases), but I find the simplified polygons quite useful and would prefer to collaborate on them rather than have another repo to maintain.

There are some other fields in that geonames download which might be helpful for some, but not for everyone.
If that's the case I could put these extra fields in separate files and have tree shaking remove what's not used.

Thanks! Sorry I commented kind of quickly before. We can definitely add data even if it isn't used in Rapid or OSM. I'm mostly asking for clarification because it's probably not data that I would use myself.

"Spoken/official" languages was mentioned in #4 (comment) too.

Minh has a point about maybe needing more granular shapes in some areas, e.g. Switzerland.

1ec5 commented

I think any addition of language codes would need to come with a caveat emptor. Every use case requires a different mapping from countries to languages. Official languages don’t necessarily say anything about the name language in OSM, or the language that users in that country are searching in. iD maintains a mapping for determining the name:* fields to show by default, which started from CLDR but required some tweaks afterwards based on user feedback. This is a pretty limited use of the data, since users can easily customize the language list.

See #132 (comment) but I think for now I'm more comfortable steering people towards CLDR if they need a territory-language mapping.
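As a rough illustration of leaning on CLDR instead, the standard `Intl.Locale` API exposes the "likely subtags" data that JavaScript engines typically ship via ICU. This is only a sketch: the helper name `likelyLanguageForRegion` is hypothetical, the result depends on the CLDR version bundled with the runtime, and it returns a single most-likely language rather than a list of official ones.

```typescript
// Sketch only: asks the runtime's CLDR "likely subtags" data for the most
// likely language of a region. This is NOT a list of official languages;
// multilingual regions still yield just one answer.
function likelyLanguageForRegion(region: string): string {
  // "und" is the BCP 47 tag for an undetermined language; maximize()
  // fills in the most likely language and script for the given region.
  const locale = new Intl.Locale(`und-${region}`).maximize();
  return locale.language;
}

// Usage: pass an ISO 3166-1 alpha-2 code such as the ones country-coder returns.
console.log(likelyLanguageForRegion('CH'));
```

For a multilingual country like Switzerland this gives one language only, which is the kind of caveat raised above: anything beyond a default guess needs a mapping tuned to the use case.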