Dixin/Etymology

Test Character Cases

Closed this issue · 1 comments

Special cases of characters in the Etymology table

Case 𪛖 Extension B character not in Etymology table
Input works OK
!Crashes on search

Case 𠀀 Extension B character in Etymology table
Found and displayed correct data on search

Case 㐦 Extension A character not in Etymology table
!Crashes on search

Case 䜩 Simplified character in Etymology table - OK

Case 讌 Traditional character in Etymology table - OK

Case 孃 Old Traditional - OK

Case ㇏ Some “Chinese characters” come from Unicode ranges other than Unihan, Extension A and Extension B, for example “strokes”.
I have included a list of Unicode ranges which should be covered.
CJK Strokes: U+31C0 to U+31EF
Not detected as a legitimate input character.
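A range check along these lines would accept the stroke character. This is only a sketch: the names `ACCEPTED_RANGES`, `block_of` and `is_accepted_input` are hypothetical, and the four ranges are the standard Unicode blocks named in this issue, not the full list of ranges mentioned above.

```python
# Illustrative ranges only; the full list attached to the issue is not reproduced here.
ACCEPTED_RANGES = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x31C0, 0x31EF, "CJK Strokes"),
]


def block_of(char):
    """Return the name of the accepted block containing char, or None."""
    code_point = ord(char)
    for start, end, name in ACCEPTED_RANGES:
        if start <= code_point <= end:
            return name
    return None


def is_accepted_input(char):
    """True when char falls inside any accepted range."""
    return block_of(char) is not None
```

With this table, `is_accepted_input("㇏")` is true because U+31CF lies inside the CJK Strokes block, while a Latin letter is rejected.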

Case Simplified column starting with “p”
I see you took care of the references to characters in the Simplified column starting with “p”.
“p” means “part of a character”.
I have other special characters, such as Cantonese starting with “c”, but I can take care of that on the data side.
飤 starts with “p”; Simplified shows blank, OK
I will fix all these problems on the data side, so don’t worry about them.
踫 r problem
宁 z
軚 c problem

Case 㙜 Old Traditional with more than 1 character crashes
Under Old Traditional, I sometimes have more than one character.
I think you should just show all the Old Traditional characters I have, maybe 0 to 5 of them.
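Showing every Old Traditional character could start from splitting the field into a list, as in the sketch below. It assumes the field is a single string holding zero or more characters, possibly separated by spaces or commas; that storage format and the name `split_old_traditional` are assumptions, not taken from the Etymology data.

```python
# Separators that might appear between characters (an assumption about the data).
SEPARATORS = set(" \t,;、，")


def split_old_traditional(field):
    """Return every character in the field, in order, skipping separators.

    Python iterates strings by code point, so Extension B characters
    are kept whole rather than split into surrogate halves.
    """
    if not field:
        return []
    return [char for char in field if char not in SEPARATORS]
```

An empty or missing field yields an empty list, so the display can loop over the result without special-casing the 0-character case.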

Case 臺 1-n simplified OK
The simplified form of 臺 is 台. Since the input is 臺, the simplified form is specified uniquely even though 台 has a 1-n relation to traditional characters. 臺 is OK.

Case 台 臺 1-n simplified
Only gets the first one.
台 has a 1-4 relation to traditional characters. When it is derived from 台, I just have 台 in the Simplified column and I show the etymology of 台. If it is derived from something else, such as 臺, I have 台1, 台2, or 台3 in the Simplified column. I will open a separate bug for this.
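The numbered entries could be separated into a character and a variant number before lookup. The sketch below assumes the number is always a run of ASCII digits at the end of the value; the function name `split_variant_suffix` is hypothetical.

```python
def split_variant_suffix(value):
    """Split a Simplified column value into (character, variant number).

    The variant number is None when the value has no trailing ASCII digits.
    """
    base = value.rstrip("0123456789")
    suffix = value[len(base):]
    return base, (int(suffix) if suffix else None)
```

Here `split_variant_suffix("台2")` gives `("台", 2)` and `split_variant_suffix("台")` gives `("台", None)`, so all rows sharing the same base character can be grouped for a 1-n search.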

Case 綎 /⿰纟廷 problem
My simplified characters come from the 2013 government standard with 8105 characters.
For some reason they have selected some bizarre characters for which Unicode currently has no simplified character or code point. These characters are specified like “⿰纟廷”.
I currently prefix these characters with a “/” so I can easily search for them.
I think the Simplified display should show “⿰纟廷” when I input the traditional character 綎, which does exist.
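One way to produce that display is to drop the “/” only when what follows starts with a Unicode Ideographic Description Character (U+2FF0 to U+2FFF, the block containing ⿰). This is a sketch under that assumption; `simplified_display` is a hypothetical name, and other markers such as “/t” are deliberately left untouched.

```python
IDS_MARKER = "/"
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFF  # Ideographic Description Characters block


def simplified_display(value):
    """Return the text to show for a Simplified column value.

    Strips the leading "/" only when it prefixes an ideographic
    description sequence; every other value is returned unchanged.
    """
    rest = value[len(IDS_MARKER):]
    if value.startswith(IDS_MARKER) and rest and IDC_FIRST <= ord(rest[0]) <= IDC_LAST:
        return rest
    return value
```

For the row of 綎, a stored value of “/⿰纟廷” would display as “⿰纟廷”, while an ordinary value such as 台 passes through as-is.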

Case 臤 /t S Special problem, I will think about this.

Dixin commented

Added automated test cases. Will also add these to wiki documentation.