WordPress/book

i18n: ISO 639-X

franz-josef-kaiser opened this issue · 6 comments

There are several ISO 639 levels. As lvl 3 is broken (inconsistent), lvl 5 is unfinished and lvl 1 only contains country codes, I'd recommend changing the README to use lvl 2. ISO 639-2 uses language_COUNTRY codes and therefore also covers differences like German in Austria, Switzerland, Germany, etc.

Sure thing. Does this sound right?

When the book is ready for production and no more changes are expected, we would welcome translations. To translate the book, please create a sub-directory of the project, giving it the correct ISO 639-2 code: language_COUNTRY (for example, pt_BR for Brazilian Portuguese), and submit a pull request.

Is there a page somewhere that lists all of the correct codes?

I think the book should follow what WordPress does. That is, we use ISO 639-3 when setting up most new locales. (The exceptions are, for example, the various Spanish or French locales and the like, which will continue to use ISO 639-1 with the country code added.)

It would be easiest to just copy what's done by the polyglots team directly and direct people to that resource when determining their locale. See here for what they've done with each locale. Note that there are a lot of old ISO 639-1 codes with country codes added. We haven't been going back and fixing those issues.

I'd also mention that the first comment here is incorrect. ISO 639-2 explicitly does not have country codes, but uses three letter codes like ISO 639-3, just for a subset of languages. The link given in the first comment also links to this resource which shows the three letter language codes that it's based on.
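To make the two code shapes in this discussion concrete, here is a minimal Python sketch that only looks at the form of a code: language plus country (like pt_BR) versus a bare three-letter code (like haz). The function name `locale_shape` and both patterns are assumptions made for illustration. It does not check whether a code is actually registered in any ISO list, and it is not how the polyglots team validates locales.

```python
import re

# Hypothetical patterns for illustration: they describe the *shape* of a code only.
LANGUAGE_COUNTRY = re.compile(r"[a-z]{2,3}_[A-Z]{2}")  # e.g. pt_BR, de_CH
THREE_LETTER = re.compile(r"[a-z]{3}")                 # e.g. haz, ckb


def locale_shape(code: str) -> str:
    """Classify a locale code by its form, without consulting any registry."""
    if LANGUAGE_COUNTRY.fullmatch(code):
        return "language_COUNTRY"
    if THREE_LETTER.fullmatch(code):
        return "three-letter"
    return "unknown"


if __name__ == "__main__":
    for code in ("pt_BR", "de_CH", "haz", "PT-br"):
        print(code, locale_shape(code))
```

A real check would also need to look the language part up in the relevant ISO 639 table, which this sketch leaves out.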

@smckeown that sounds perfect.

Is there a page somewhere that lists all of the correct codes?

Yes, the wecodemore/ISO639-2 repo and its .json files. As mentioned in the README, we fetch the list from the original source, the Library of Congress, which is the maintaining authority and has the only valid list of codes.

I think the book should follow what WordPress does

Don't jump out of the window because everyone does.

I'd also mention that the first comment here is incorrect. ISO 639-2 explicitly does not have country codes, but uses three letter codes like ISO 639-3, just for a subset of languages.

@samuelsidler That's actually true, at least partly: I was wrong to write "The ISO 639-2 uses language_COUNTRY codes". It should have read WordPress, but more on that below.

The link given in the first comment also links to this resource which shows the three letter language codes that it's based on.

The truth is that WP uses language_COUNTRY (example: de_DE, de_CH, etc.). The first part of the string is what ISO 639-2/3 can deliver. The second part is what ISO 3166 can deliver: standardized country codes. The problem with preferring ISO 639-3 over the -2 variant is that the -3 variant has reserved code blocks, which are porting leftovers from the -2 variant. The fact is that ISO 639-3 contains all non-collective elements from -2 and adds historical, ancient and constructed languages on top of that, as well as living languages, which are classified as dialect or language by individual decision. That last bit makes it highly broken, as most people who speak a language don't necessarily agree with someone deciding that it's a dialect, etc. As long as the -5/-6 variants are in development, we should use the -2 variant as the standard and simply ignore the specifics: we gain nothing from a "fix" that isn't one when we already have country codes as a suffix, which account for regional differences within language groups/collections. That standard works, as it is the smallest common identifier with a clear distinction by suffix.
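As a rough illustration of the two-part structure described here, the following Python sketch splits a locale string at the underscore into its language part and its country part. The helper name `split_locale` is hypothetical, and the sketch assumes the underscore is the only separator. It does not verify either part against ISO 639 or ISO 3166.

```python
from typing import Optional, Tuple


def split_locale(locale: str) -> Tuple[str, Optional[str]]:
    """Split 'language_COUNTRY' into (language, country).

    The country is None when the locale has no underscore suffix.
    Neither part is checked against ISO 639 or ISO 3166.
    """
    language, separator, country = locale.partition("_")
    return language, (country if separator else None)


if __name__ == "__main__":
    print(split_locale("de_DE"))
    print(split_locale("de_CH"))
    print(split_locale("haz"))
```

With this split, de_DE and de_CH share the language part and differ only in the suffix, which is the distinction the comment relies on.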

@franz-josef-kaiser I understand what you're saying, but it goes against what we do in the WordPress project, and using ISO 639-2 wouldn't cover all the languages that WordPress is translated into.

Just a quick glance reveals that Southern Balochi (translation started), Hazaragi (fully translated), Kurdish Sorani (mostly translated), Rohingya (half translated), and South Azerbaijani (translation started) are all at various stages in their translation of WordPress and not covered in ISO 639-2. Notice that one of those is fully translated and has shipped WordPress 4.1.1.

We can argue about whether ISO 639-3 should be a standard or not, but it is a standard. ISO 639-5 is also a standard (not just in development), but it doesn't cover the wealth of languages that WordPress is available in; rather, it covers language families. ISO 639-6 may be "in development" again, but it was officially withdrawn as an ISO standard in 2014. WordPress will stick with ISO 639-3. I think the book should as well.

Have no strong feeling about this. Whatever fits :)

Leaving this open until I've updated the readme. Otherwise I'll forget.