eisoch/irg

RS issues

eisoch opened this issue · 10 comments

| UCS | Block | Char | Ref. | Current RS | Correct RS | Variant | Source |
|---|---|---|---|---|---|---|---|
| U+3B3A | ExtA | | K3-275E | 74.10 | 74.9 | | IRGN2239 |
| U+2575E | ExtB | 𥝞 | TF-2842 | 115.3 | 59.5 | | IRGN2269EisoReview,異體字字典A01271-004 |
| U+2A741 | ExtC | 𪝁 | K5-005E | 9.7 | 30.6 | | |
| U+2A80F | ExtC | 𪠏 | GZFY-00829 | 27.9 | 107.6 | 𥀬 | 广州话词典_P55 |
| U+2B0D7 | ExtC | 𫃗 | GZFY-01010 | 119.16 | 178'.18 | 𩏷 | |
| U+2B180 | ExtC | 𫆀 | TC-3558 | 128.2 | 26.6 | | IRGN1232P1 |
| U+2B385 | ExtC | 𫎅 | TC-3623 | 152.1 | 1.7,7.6 | 亟,焏 | IRGN1232P1,佛教難字字典_P7 |
| U+2C1AE | ExtE | 𬆮 | UTC-00068 | 79.11 | 196'.10 | | |
| U+2D495 | ExtF | 𭒕 | USAT-60078 | 38.11 | 30.11 | | |
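The RS values in the table use the Unihan radical-stroke notation: Kangxi radical number, a period, then the residual stroke count. An apostrophe after the radical number (as in 178'.18) marks the simplified form of the radical, and where there is more than one candidate the table separates them with commas (Unihan's own kRSUnicode field separates multiple values with spaces). Below is a minimal Python sketch for parsing these cells and telling a radical change apart from a residual-count fix. The names `RS`, `parse_rs` and `radical_changed` are hypothetical, and the parser only assumes the notation as it appears in this thread.

```python
import re
from typing import List, NamedTuple

class RS(NamedTuple):
    radical: int     # Kangxi radical number (1-214)
    simplified: int  # number of apostrophes after the radical (0 = traditional form)
    residual: int    # strokes outside the radical

# Assumed notation: digits, optional apostrophes, a period, digits (e.g. "178'.18").
RS_PATTERN = re.compile(r"(\d+)('*)\.(\d+)")

def parse_rs(cell: str) -> List[RS]:
    """Parse a cell such as "1.7,7.6" or "64.9; 64.11" into RS values."""
    values = []
    for part in re.split(r"[,;\s]+", cell.strip()):
        if not part:
            continue
        m = RS_PATTERN.fullmatch(part)
        if m is None:
            raise ValueError(f"not an RS value: {part!r}")
        values.append(RS(int(m.group(1)), len(m.group(2)), int(m.group(3))))
    return values

def radical_changed(current: str, correct: str) -> bool:
    """True if no proposed value keeps the current radical (including its form)."""
    current_radicals = {(v.radical, v.simplified) for v in parse_rs(current)}
    return all((v.radical, v.simplified) not in current_radicals
               for v in parse_rs(correct))

# A few rows copied from the table: (code point, current RS, correct RS).
ROWS = [
    ("U+3B3A", "74.10", "74.9"),
    ("U+2B0D7", "119.16", "178'.18"),
    ("U+2B385", "152.1", "1.7,7.6"),
    ("U+2D495", "38.11", "30.11"),
]

for cp, cur, new in ROWS:
    kind = "radical changed" if radical_changed(cur, new) else "residual only"
    print(cp, parse_rs(new), kind)
```

Keeping the apostrophe count as its own field means 178.x and 178'.x are treated as different radicals, which matches how the simplified-radical values are listed above.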

(To be continued...)

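| UCS | Block | Char | Ref. | Current RS | Correct RS | Variant | Source |
|---|---|---|---|---|---|---|---|
| U+2A741 | ExtC | 𪝁 | K5-005E | 9.7 | 30.6 | | |
| U+2B180 | ExtC | 𫆀 | TC-3558 | 128.2 | 26.6 | | IRGN1232P1 |
| U+2B385 | ExtC | 𫎅 | TC-3623 | 152.1 | 1.7,7.6 | 亟,焏 | IRGN1232P1,佛教難字字典_P7 |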

To investigate: (two images, not reproduced here)

3977 and 22283: radical assignment follows Kangxi.

Bump...

| UCS | Block | Char | Ref. | Current RS | Correct RS | Variant | Comment |
|---|---|---|---|---|---|---|---|
| U+5954 | BMP | | ALL | 37.6 | 37.5 | FA7F | Current RS refers to the FA7F Kangxi glyph, not the reference glyphs. |
| U+5ED9 | BMP | | ALL | 53.12 | 53.11 | FA83 | Current RS refers to the FA83 Kangxi glyph, not the reference glyphs. |
| U+6452 | BMP | | J1-405C, K2-365A | 64.9 | 64.9; 64.11 | FA8F | J1-405C and K2-365A represent the 'normative' Kangxi glyph, which is 11 strokes. However, the G, H, T source glyphs are 9 strokes. |

Note: stroke counts for old characters follow the Kangxi count, and stroke counts for new characters follow the IRG stroke count rules. The treatment of existing characters is currently undefined (but a solution should be sought).

By 'old' characters, do you mean the Unicode 1.1 BMP set?
What do the 'IRG stroke count rules' refer to? (Sorry, I am new to this documentation.)

As far as I know, 5954 and 5ED9 have never had earlier reference glyphs (in any region) that correspond to the Kangxi stroke count.

Note that 摒 6452 is listed as 9 strokes (which is correct for G, H, T), but the Kangxi stroke count is actually 11 (J, K). So if RS is supposed to list the Kangxi stroke count for old characters, Unihan is inconsistent on this issue.
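The inconsistency can be made concrete by comparing each listed residual stroke count with the Kangxi-glyph count and the reference-glyph count. Below is a minimal Python sketch using only the three characters discussed in this thread. For 5954 and 5ED9 it assumes that the Kangxi count equals the current RS, based on the comment that the current RS refers to the FA7F/FA83 Kangxi glyphs. The `Entry` class and the list names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    codepoint: str
    listed: int     # residual strokes in the current RS value
    kangxi: int     # residual strokes of the Kangxi-style glyph
    reference: int  # residual strokes of the G/H/T reference glyphs

# Residual stroke counts as given in the table and comments in this thread.
ENTRIES = [
    Entry("U+5954", listed=6, kangxi=6, reference=5),
    Entry("U+5ED9", listed=12, kangxi=12, reference=11),
    Entry("U+6452", listed=9, kangxi=11, reference=9),
]

# Entries whose listed count matches only the Kangxi glyph, or only the
# reference glyphs (entries where both counts agree would be uninformative).
kangxi_only = [e.codepoint for e in ENTRIES
               if e.listed == e.kangxi and e.listed != e.reference]
reference_only = [e.codepoint for e in ENTRIES
                  if e.listed == e.reference and e.listed != e.kangxi]

# A single policy would leave at least one of the two lists empty.
consistent = not (kangxi_only and reference_only)
print("Kangxi count:", kangxi_only)
print("Reference-glyph count:", reference_only)
print("consistent:", consistent)
```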

New Extensions to the CJK Unified Ideographs are handled by IRG, a subgroup of ISO/IEC JTC1/SC2/WG2. You may refer to the IRG PnP which specifies the documents to refer to for stroke count.

For old characters (URO to Extension F), the stroke counts are supposed to follow the rule in The Unicode Standard: use the stroke count from the Kangxi Dictionary first, then from the Morohashi dictionary, then from Hanyu Dazidian, then from a specific Korean dictionary, even if none of the representative glyphs uses the Kangxi shape.
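The rule can be sketched as a two-step lookup. First decide whether a code point is 'old' (URO through Extension F). If it is, take the stroke count from the first dictionary in the priority order that lists the character. Below is a minimal Python sketch. The block ranges are taken from Blocks.txt as I understand it for recent Unicode versions; Extension C's upper bound in particular has changed over time. The dictionary keys, including `korean` as a stand-in for the unnamed Korean dictionary, and the function names `old_block` and `stroke_source` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Block ranges for the "old" characters (URO to Extension F), assumed from
# Blocks.txt in recent Unicode versions; Extensions G, H and I fall outside them.
OLD_BLOCKS = {
    "URO": (0x4E00, 0x9FFF),
    "ExtA": (0x3400, 0x4DBF),
    "ExtB": (0x20000, 0x2A6DF),
    "ExtC": (0x2A700, 0x2B73F),
    "ExtD": (0x2B740, 0x2B81F),
    "ExtE": (0x2B820, 0x2CEAF),
    "ExtF": (0x2CEB0, 0x2EBEF),
}

# Priority order as described above; "korean" is a placeholder key.
PRIORITY = ["kangxi", "morohashi", "hanyu_dazidian", "korean"]

def old_block(cp: int) -> Optional[str]:
    """Return the block name if cp is in URO..Extension F, else None."""
    for name, (lo, hi) in OLD_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None

def stroke_source(counts: Dict[str, int]) -> Tuple[str, int]:
    """Pick the stroke count from the highest-priority dictionary that lists
    the character; `counts` maps dictionary keys to stroke counts."""
    for source in PRIORITY:
        if source in counts:
            return source, counts[source]
    raise LookupError("none of the listed dictionaries covers this character")

print(old_block(0x2B385))  # ExtC
print(old_block(0x30000))  # None (Extension G: IRG rules apply instead)
print(stroke_source({"morohashi": 9, "hanyu_dazidian": 10}))  # ('morohashi', 9)
```

This sketch only encodes the fallback order. It does not resolve cases where Kangxi itself counts inconsistently.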

Unfortunately Kangxi is not consistent in its stroke count methodology.

I think Unihan has some work in progress that would change the field to reflect the actual stroke count, but I'm not sure of its status.

Thank you for the pointers. The only IRG document I've looked at in detail is IRG N2107R2, which is about the UK glyphs for ExtG.