w3c/manifest

Description of language tags incorrect

lang member
https://w3c.github.io/manifest/#lang-member

A language tag is a string that matches the production of a Language-Tag defined in the [BCP47] specifications (see the IANA Language Subtag Registry for an authoritative list of possible values). That is, a language range is composed of one or more subtags that are delimited by a U+002D HYPHEN-MINUS ("-"). For example, the 'en-AU' language range represents English as spoken in Australia, and 'fr-CA' represents French as spoken in Canada. Language tags that meet the validity criteria of [RFC5646] section 2.2.9 that can be verified without reference to the IANA Language Subtag Registry are considered structurally valid.

The above description has multiple issues. Rather than dissect them individually, I would suggest replacing the above with this paragraph:

A language tag is a well-formed language tag consisting of a string that matches the production Language-Tag defined in BCP47. Note that language tags are case-insensitive. Examples of language tags include fr (French), en-AU (English as spoken in Australia), and zh-Hans-CN (Chinese as written in the Simplified Han script as spoken in China).

Additional specification guidance on this topic can be found here: https://www.w3.org/TR/international-specs/#lang_values

There is a bit of complexity here. It is a good idea to require implementations to check only that a language tag is "well-formed" (i.e., it matches the ABNF and a few other requirements in BCP47), while requiring users to use "valid" tags (i.e., tags whose subtags appear in the subtag registry and that obey a few other requirements).
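
To make the well-formed versus valid distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a complete BCP47 implementation: the names `is_well_formed`, `is_valid`, and `SAMPLE_REGISTRY` are hypothetical, and `SAMPLE_REGISTRY` is a tiny hand-picked stand-in for the IANA Language Subtag Registry. The pattern follows the `langtag` and `privateuse` productions of the RFC 5646 ABNF but leaves out grandfathered tags, and `is_valid` only looks up the primary language, script, and region subtags.

```python
import re

# Simplified form of the RFC 5646 Language-Tag ABNF (grandfathered tags omitted).
# re.IGNORECASE reflects that language tags are case-insensitive.
_LANGTAG = re.compile(
    r"""
    (?:
        (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})  # language, optional extlang
        (?:-(?P<script>[a-z]{4}))?                             # script
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                    # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*               # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                   # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                             # trailing private use
      |
        x(?:-[a-z0-9]{1,8})+                                   # private-use-only tag
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

# ASSUMPTION: illustrative subset only; a real check would load the IANA registry.
SAMPLE_REGISTRY = {
    "language": {"en", "fr", "zh"},
    "script": {"hans"},
    "region": {"au", "ca", "cn"},
}


def is_well_formed(tag: str) -> bool:
    """Return True if the whole string matches the simplified ABNF above."""
    return _LANGTAG.fullmatch(tag) is not None


def is_valid(tag: str) -> bool:
    """Return True if the tag is well-formed and its language, script, and
    region subtags (when present) appear in SAMPLE_REGISTRY.

    Extlang and variant lookups and duplicate-subtag checks are omitted.
    """
    match = _LANGTAG.fullmatch(tag)
    if match is None:
        return False
    language = match.group("language")
    found = {
        # Keep only the primary language subtag, dropping any extlang part.
        "language": language.split("-")[0] if language else None,
        "script": match.group("script"),
        "region": match.group("region"),
    }
    return all(
        value is None or value.lower() in SAMPLE_REGISTRY[kind]
        for kind, value in found.items()
    )


if __name__ == "__main__":
    for tag in ("fr", "en-AU", "zh-Hans-CN", "en-XX", "en--AU"):
        print(tag, is_well_formed(tag), is_valid(tag))
```

In this sketch `en-XX` passes the well-formedness check but fails the validity check, because `XX` is not in the sample registry, while `en--AU` fails both. That is the split suggested above: an implementation can stop at `is_well_formed`, while authors are expected to supply tags that would also pass a registry-backed check.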

Finally, note that BCP47 can be referenced from SpecRef (that is, in ReSpec you can just use [[BCP47]]).