Inconsistent normalized values for some tags
vvanpo opened this issue · 4 comments
Example with Taiwanese:
$ node -e "console.log(require('bcp-47-normalize')('zh-Hans-TW'))"
zh-TW
$ node -e "console.log(require('bcp-47-normalize')('zh-TW'))"
zh-Hant
So if I'm understanding correctly what this program is supposed to do, it's telling me that zh-TW is both the normal form of the tag that includes the 'Hans' script, and is "further normalized" down to the 'Hant' script?
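One way to state the inconsistency is that normalizing an already normalized tag should not change it again. The sketch below checks that property. It does not call the library: `reportedNormalize` is a stub that only replays the two outputs quoted above, and `findUnstable` is a hypothetical helper name, not part of bcp-47-normalize.

```javascript
// Stub that reproduces only the outputs reported in this issue; it is not
// the library's implementation. Tags it does not know are returned unchanged.
const reported = new Map([
  ['zh-Hans-TW', 'zh-TW'],
  ['zh-TW', 'zh-Hant'],
])
const reportedNormalize = (tag) => reported.get(tag) ?? tag

// Hypothetical helper: list the tags for which normalizing twice gives a
// different result than normalizing once.
function findUnstable(normalize, tags) {
  return tags
    .map((tag) => {
      const once = normalize(tag)
      return {tag, once, twice: normalize(once)}
    })
    .filter(({once, twice}) => once !== twice)
}

const unstable = findUnstable(reportedNormalize, ['zh-Hans-TW', 'zh-TW'])
console.log(unstable)
```

With the reported outputs, zh-Hans-TW is flagged: one pass gives zh-TW and a second pass gives zh-Hant. Passing the real normalize function in place of the stub would run the same check against the library.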
@wooorm This fix will result in zh-CN becoming zh, and lots of other normalization changes. Is there a reason for this? Should it be marked as a BREAKING CHANGE instead?
https://npm.runkit.com/bcp-47-normalize
var bcp47Normalize = require("bcp-47-normalize")
console.log(bcp47Normalize('zh-CN'));
console.log(bcp47Normalize('zh-TW'));
console.log(bcp47Normalize('zh-MO'));
console.log(bcp47Normalize('zh-HK'));
"zh"
"zh-Hant"
"zh-Hant-MO"
"zh-Hant-HK"
Yup, that’s the goal of normalizing. For Chinese as spoken in China, the “as spoken in China” part is implied, so zh-CN becomes zh.
These four all go through here: https://github.com/unicode-org/cldr/blob/4b1225ead2ca9bc7a969a271b9931f137040d2bf/common/supplemental/supplementalMetadata.xml#L177
And then a couple of them are defaults: https://github.com/unicode-org/cldr/blob/4b1225ead2ca9bc7a969a271b9931f137040d2bf/common/supplemental/supplementalMetadata.xml#L1539
I’d normally consider it breaking, but the previous behavior was broken.
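As a rough illustration of what "implied" means here, the built-in `Intl.Locale` API can add and remove likely subtags. This is a different mechanism from the alias and default-content data linked above, it is not how bcp-47-normalize is implemented, and its results for other tags (zh-TW, for example) may not match the library's. The output also depends on the CLDR data bundled with the runtime.

```javascript
// Uses the ECMAScript Intl.Locale API, not bcp-47-normalize.
// maximize() adds the likely script and region; minimize() removes
// subtags that the likely-subtags data would add back.
const implied = new Intl.Locale('zh').maximize().toString()
const shortest = new Intl.Locale('zh-CN').minimize().toString()

console.log(implied)  // zh-Hans-CN with typical CLDR data
console.log(shortest) // zh with typical CLDR data
```

Because plain zh already expands to Simplified script and China, spelling out the CN region adds no information, which is the sense in which zh-CN can shorten to zh.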