EthicalSource/contributor_covenant

Language Tags

TimidRobot opened this issue · 1 comments

1) Use IETF BCP 47 language tags instead of ISO 639-2 language codes

The documentation recommends ISO 639-2 language codes:

1. If it's a new language, add it to `config.toml`,
with a localized name and language code/optional region (e.g. `pt` or `pt-br`). See the [list of ISO 639-2 language codes here](https://www.loc.gov/standards/iso639-2/php/code_list.php)

However, I believe those are technically insufficient. Instead, I recommend using an IETF BCP 47 language tag. Thankfully, it is based on ISO 639-2 (no changes necessary). It also provides additional information when needed. For example, if there is a translation into Serbian (ISO 639-2 language code sr), you need to specify whether the Latin or Cyrillic script is used: sr-latn or sr-cyrl.

IETF language tag - Wikipedia:

To distinguish language variants for countries, regions, or writing systems (scripts), IETF language tags combine subtags from other standards such as ISO 639, ISO 15924, ISO 3166-1 and UN M.49.

RFC 5646 - Tags for Identifying Languages provides a public specification.
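
To illustrate how the subtags combine, here is a minimal Python sketch that splits a tag into its language, script, and region parts. It covers only the `language[-script][-region]` shape; the variant, extension, private-use, and grandfathered forms defined in RFC 5646 are deliberately left out, and the name `split_tag` is hypothetical, not part of this project.

```python
import re

# Simplified shape only: language[-script][-region].
# Assumption: variants, extensions, private-use and grandfathered tags
# (all allowed by RFC 5646) are out of scope for this sketch.
_TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"               # ISO 639 language code
    r"(?:-(?P<script>[a-z]{4}))?"              # ISO 15924 script code
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?$",   # ISO 3166-1 or UN M.49 region
    re.IGNORECASE,
)


def split_tag(tag):
    """Return the lowercase language, script and region subtags of a tag."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    return {
        name: (value.lower() if value else None)
        for name, value in match.groupdict().items()
    }
```

With this sketch, `split_tag("sr-latn")` reports `latn` as the script and no region, while `split_tag("pt-br")` reports `br` as the region and no script.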

2) Documentation leaves case ambiguous

The configuration file (config.toml (permalink)) currently contains only lowercase language codes, with one exception: fa-IR فارسی (ایران) [Persian (Iran)]. To prevent confusion and unnecessary redirects, I recommend explicitly stating that lowercase language tags should be used.
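
A check along these lines could flag tags that break the lowercase rule. This is only a sketch: the helper name `non_lowercase_tags` is hypothetical, and it assumes the tags have already been read out of `config.toml` into a list of strings.

```python
def non_lowercase_tags(tags):
    """Return the tags that are not written entirely in lowercase."""
    # Assumption: `tags` is a list of strings already extracted from config.toml.
    return [tag for tag in tags if tag != tag.lower()]
```

For the tags mentioned in this issue, `non_lowercase_tags(["pt", "pt-br", "fa-IR"])` returns `["fa-IR"]`.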

3) Region vs Script

(I have the least confidence in this last recommendation.) It is my understanding that script codes better serve the global community than region codes (e.g. zh-cn ➡️ zh-hans and zh-tw ➡️ zh-hant).

Additional context for 3) Region vs Script: #18419 (Language code is not correct for Chinese) – Django
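
If the project did adopt script codes, the migration could be a simple lookup. The sketch below covers only the two Chinese examples given above; the names `REGION_TO_SCRIPT` and `prefer_script_tag` are hypothetical, and any further mappings would need to be decided by the translators.

```python
# Only the two mappings named in this issue; anything else passes through.
REGION_TO_SCRIPT = {
    "zh-cn": "zh-hans",  # Simplified Chinese
    "zh-tw": "zh-hant",  # Traditional Chinese
}


def prefer_script_tag(tag):
    """Lowercase a tag and swap a known region tag for its script tag."""
    normalized = tag.lower()
    return REGION_TO_SCRIPT.get(normalized, normalized)
```

Tags without an entry in the table, such as `pt-br`, are returned unchanged apart from lowercasing.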