Test Character Cases
Special cases of characters in the Etymology table
Case 𪛖 Extension B character not in Etymology table
Input works OK
!Crashes on search
Case 𠀀 Extension B character in Etymology table
Found and displayed correct data on search
Case 㐦 Extension A character not in Etymology table
!Crashes on search
Case 䜩 Simplified character in Etymology table - OK
Case 讌 Traditional character in Etymology table - OK
Case 孃 Old Traditional OK
Case ㇏ Some “Chinese characters” come from Unicode code ranges other than Unihan, Extension A and Extension B, for example “strokes”
I have included a list of Unicode ranges which should be covered
CJK Strokes: (U+31C0 to U+31EF)
Not detected as a legitimate input character.
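A minimal sketch of the range check this case is asking for. Only the CJK Strokes range comes from this issue; the other three ranges are the standard Unicode block boundaries, and the names `CJK_RANGES`, `cjk_block`, and `is_legitimate_input` are hypothetical, not taken from the app.

```python
# Hypothetical range table. CJK Strokes is the range listed in this issue;
# the other entries are the standard Unicode block boundaries.
CJK_RANGES = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "Extension A"),
    (0x20000, 0x2A6DF, "Extension B"),
    (0x31C0, 0x31EF, "CJK Strokes"),
]


def cjk_block(ch):
    """Return the name of the covered block containing ch, or None."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code_point = ord(ch)
    for low, high, name in CJK_RANGES:
        if low <= code_point <= high:
            return name
    return None


def is_legitimate_input(ch):
    """True if ch falls in any of the covered ranges."""
    return cjk_block(ch) is not None
```

With a table like this, adding another range from the list is a one-line change rather than a new code path.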
Case Simplified column starting with “p”
I see you took care of the references to characters in the Simplified column starting with “p”
“p” means “part of a character”
I have other special characters such as Cantonese starting with “c” but I can take care of that on the data side.
飤 starts with “p” simplified shows blank, OK
I will fix all of these problems on the data side; don’t worry about them.
踫 r problem
宁 z
軚 c problem
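For reference, the flag handling described in this case could look roughly like the sketch below. It assumes the flag is a single leading ASCII letter in the Simplified column value, which is my reading of the data and not something the app confirms; `KNOWN_FLAGS`, `split_flag`, and `simplified_display` are hypothetical names.

```python
# Flag meanings taken from this issue; other letters (r, z, ...) are
# left unexplained here and are not interpreted.
KNOWN_FLAGS = {"p": "part of a character", "c": "Cantonese"}


def split_flag(value):
    """Split a Simplified column value into (flag, rest).

    Assumes a flag is one leading ASCII letter; returns (None, value)
    when the value does not start with one.
    """
    if value and value[0].isascii() and value[0].isalpha():
        return value[0], value[1:]
    return None, value


def simplified_display(value):
    """Show blank for "p" entries, as in the 飤 case above.

    Any other value is returned unchanged, since the remaining flags
    are to be fixed on the data side.
    """
    flag, _rest = split_flag(value)
    if flag == "p":
        return ""
    return value
```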
Case 㙜 old traditional with more than 1 character bombs
Under Old Traditional, I sometimes have more than one character.
I think you should just show all the Old Traditional characters I have, maybe 0 to 5 Old Traditional forms.
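A small sketch of what I mean, assuming the Old Traditional field is a plain string of characters, possibly separated by spaces or commas (the storage format is an assumption, and `old_traditional_forms` is a hypothetical name). Iterating a Python string yields whole code points, so Extension B characters are not split.

```python
# Separators that might appear between forms; this set is an assumption.
SEPARATORS = set(",;\u3001\uff0c")


def old_traditional_forms(field):
    """Return every Old Traditional form in the field, in order.

    An empty field gives an empty list, so 0 forms is a valid result.
    """
    return [ch for ch in field if not ch.isspace() and ch not in SEPARATORS]


def old_traditional_display(field):
    """Join all forms with a space for display."""
    return " ".join(old_traditional_forms(field))
```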
Case 臺 1-n simplified OK
The simplified form of 臺 is 台. Because the input is 臺, the result is specified uniquely even though 台 has a 1-n relation. 臺 is OK.
Case 台 臺 1-n simplified
Only gets the first one.
台 has a 1-4 relation to traditional characters. When it is derived from 台, I just have 台 in the simplified column and I show the etymology of 台. If it is derived from something else, such as 臺, I have 台1, 台2, or 台3 in the simplified column. I will open a separate bug for this.
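To illustrate the 台1, 台2, 台3 convention, here is a sketch that separates the character from its optional index. It assumes the value is exactly one character followed by at most one digit; `parse_simplified` is a hypothetical name, not the app's.

```python
import re

# One non-digit character, optionally followed by a single digit index.
_INDEXED = re.compile(r"(?P<char>\D)(?P<index>\d)?")


def parse_simplified(value):
    """Return (character, index) for values like 台 or 台2.

    The index is None when there is no digit suffix. Values that do
    not fit the pattern are returned unchanged with index None.
    """
    match = _INDEXED.fullmatch(value)
    if match is None:
        return value, None
    index = match.group("index")
    return match.group("char"), (int(index) if index else None)
```

Grouping rows by the parsed character would then give all the traditional characters that share 台, not just the first one.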
Case 綎 /⿰纟廷 problem
My simplified characters come from the 2013 government standard with 8105 characters.
For some reason they have selected some bizarre characters for which Unicode currently has no simplified character or code point. These characters are specified like “⿰纟廷”
I currently preface these characters with a “/” so I can easily search for them.
I think the Simplified display should show “⿰纟廷” when I input the traditional character 綎, which does exist.
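A sketch of the display rule I have in mind: drop my “/” search marker and show the description sequence as is. The function names are hypothetical. The check for a sequence uses the Ideographic Description Characters block (U+2FF0 to U+2FFF), which is where ⿰ lives.

```python
def strip_search_marker(value):
    """Remove the leading "/" marker from a Simplified column value."""
    if value.startswith("/"):
        return value[1:]
    return value


def is_description_sequence(text):
    """True if text starts with an Ideographic Description Character."""
    return bool(text) and 0x2FF0 <= ord(text[0]) <= 0x2FFF


def simplified_for_display(value):
    """Text to show in the Simplified display for a column value."""
    return strip_search_marker(value)
```

For the 綎 row, `simplified_for_display("/⿰纟廷")` would give the three-character sequence without the slash.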
Case 臤 /t S Special problem, I will think about this.
Added automated test cases. Will also add these to wiki documentation.