Dixin/Etymology

Main Character Etymology Table


The following is a description of my main etymology table.
I have been working on this etymology table every day for the past year, and it now contains much new and more accurate information. I modify it with Access because I can modify anything at any time, and I will continue to modify it this way for the foreseeable future.

I have analyzed about 12,000 different characters, including:

  1. All common simplified characters
  2. All common traditional characters
  3. All character components
  4. Ancient characters of etymological interest.

I need either:

  1. The ability to modify it in SQL as easily as I modify it in Access, or
  2. The ability to easily replace the SQL with the latest Access table.

Main Etymology Table
Appropriate information from the following 20 columns should be displayed if it is relevant to the input character.

S_Char - Simplified Character, from the 通用规范汉字表 (Table of General Standard Chinese Characters), 2013
Should be unique
Simplified character from the 2013 list of 8105 common simplified characters
If it has multiple associated traditional characters, the character may be followed by a 1, 2, or 3
Example: 台, 台1, 台2, 台3
For rare component parts, there will be an indicator instead
Example: p001, p002, p003
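
The S_Char conventions above can be checked mechanically. The following is a minimal sketch, not part of the original table tooling: the function name `parse_s_char`, the pattern, and the assumptions that the disambiguating suffix is a single digit and that component indicators are always `p` plus three digits are all hypothetical, inferred only from the examples given.

```python
import re

# Hypothetical pattern inferred from the examples above (an assumption):
#   "p001" -> rare component indicator, no character
#   "台2"  -> one character followed by a single disambiguating digit
#   "台"   -> one plain character
S_CHAR_PATTERN = re.compile(
    r"^(?:(?P<part>p\d{3})|(?P<char>[^\d\s])(?P<index>\d)?)$"
)


def parse_s_char(value):
    """Split an S_Char value into (character, variant_index, part_id)."""
    match = S_CHAR_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognized S_Char value: {value!r}")
    index = match.group("index")
    return (
        match.group("char"),
        int(index) if index is not None else None,
        match.group("part"),
    )
```

Running every S_Char value through a check like this before import would flag entries that follow neither convention.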

S_# - Simplified character number from the 2013 list
(the character's number in the 通用规范汉字表)

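S_Rule – Simplification rules
Table 1 (第一表): 350 simplified characters that are not used as simplified components
Table 2 (第二表): 132 simplified characters that can be used as simplified components, and 14 simplified components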

S_Info – Explanation of the logic behind the S_Rule.

T_Char – Equivalent
Common associated traditional character
Should be unique

T_Old – Equivalent
Older traditional forms and variants

Pinyin_1 – Main Pinyin pronunciation

Pinyin_n – Alternate Pinyin pronunciations

V_Rule – Variant Simplification Rules, from the 第一批异体字整理表 (First List of Collated Variant Characters)
810 rules covering 1865 characters

V_Info – Explanation of the simplification rules in the 第一批异体字整理表

C_Rule – Combination rule
Table 3 (第三表): 1,753 simplified characters derived by applying the simplified characters and simplified components listed in Table 2

F_Rule – Font rule (new character forms versus old character forms, 新字體-舊字體)

Ety_Std – Standard Decomposition
My standard decomposition of the components of each character
Standard decomposition of apparent components and etymological components

Ety_Exp – Explanation of the components in Ety_Std

M_Original – Original meaning

M_Modern – Modern meanings

Jap_Cant – hiragana, katakana, Bopomofo

Word – Compound word (辭) example

Videos – Uncle Hanzi Videos (videos where I discuss information related to the input character)

Picture – Pictures related to the character (pictures which are related to the input character).

These columns contain information that probably will not be used
S_Count -
S_New – Is the simplified character a previously existing traditional character in Kangxi, or is it a new invention such as 車?
S_Diff - Is the simplified character different from the traditional character?

These columns contain deprecated information that I will not use for now, but I wish to keep.
xJ_Simple -
zR-Rule -
zR_1753 -
x_Field1 -
xRetro -
xName-type -
xBook-Img-Num -
xClass -
zEnglish -
zLearnOrder -
zRadical -
zUnicode -
zSTD -
zID -
zfreq -
z8005Char –

Dixin commented

Please follow these steps:

  1. Rename the columns following the naming guidelines.
    DO NOT use abbreviations or contractions as part of identifier names.
    https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/naming-guidelines
    https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/general-naming-conventions
    Remove abbreviations. Only use abbreviations which are "industry standard". For example, "HTML" for "Hyper Text Markup Language" is ok, but T for "traditional" is not ok. Replace above "T", "S", "z", "STD", "x", "xJ", ... abbreviations with the full word.

  2. In Access, add a primary key to your table if you haven't yet. Fix duplication if any.

  3. Share or send your Access database to me.

  4. Reply to this thread and specify which columns should be displayed on the UI. For each column, give its new name (with the abbreviations removed as in step 1) and its display name on the UI.
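
For step 2, duplicates have to be found before a primary key or unique constraint can be added. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for the real one. The table name `Etymology`, the helper `find_duplicates`, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real database; table name and rows are
# hypothetical sample data, not taken from the actual etymology table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Etymology (Simplified TEXT, Traditional TEXT)")
connection.executemany(
    "INSERT INTO Etymology VALUES (?, ?)",
    [("台1", "臺"), ("台2", "檯"), ("台2", "颱"), ("车", "車")],
)


def find_duplicates(connection, table, column):
    """Return (value, count) pairs for values that occur more than once."""
    query = (
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'GROUP BY "{column}" HAVING COUNT(*) > 1 ORDER BY "{column}"'
    )
    return connection.execute(query).fetchall()


duplicates = find_duplicates(connection, "Etymology", "Simplified")
print(duplicates)
```

Any value this reports must be fixed or disambiguated before the column can serve as a key.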

Dixin commented

As I mentioned above, a list of display names for the Etymology columns is needed.

e.g. for the previous Etymology table, the column names/display names are:

  • [OldKai] -- older character
  • [Variants796_810] -- variant rule
  • [VariantMeaning] -- meanings of variants
  • [SimplificationRule] -- simplification rule
  • [SimplificationNewOld] -- modern invention
  • [RuleBase1753] -- compound rule
  • [RuleBaseObserved] -- applied rules
  • [Etymology] -- decomposition
  • [Meaning] -- original meaning
  • [CompoundExample] -- example in use
  • [8105xID] -- simplified character number
  • [EnglishMeanings] -- modern meanings
  • [LiuShuStandardName] -- standard name of character
  • [MeaningClass] -- class of character
  • [PictographName] -- original
  • [Pinyin1] -- main pronunciation
  • [PinyinN] -- other pronunciations

[ID] --
[Simplified] – Simplified character
[Traditional] – Traditional character
[OldTraditional] – Older traditional characters
[Pinyin] – Main pronunciation
[Index8105] -- Simplified character index number
[SimpRule] – Simplification rule
[SimpClarified] -- Simplification rule explained
[VariantRule] – Variant rule
[VariantClarified] – Variant rule clarification
[AppliedRule] – Applied rules
[FontRule] – New font rule
[Decomposition] – Character decomposition
[DecompositionClarified] – Decomposition notes
[OriginalMeaning] – Original meaning
[EnglishSenses] -- English senses
[WordExample] – Usage example
[PinyinOther] – Other pronunciations
[Videos] – Related videos
[Pictures] – Related pictures
[FrequencyOrder] – Importance by frequency
[LearnOrder] – Importance in learning
[IdealForms] – Ideal ideographs
[Classification] – Classification
[Unicode] – Traditional Character Unicode

[Pictures] -- related pictures
For instance, for 車 I may have a picture of an ancient two-wheeled Chinese cart.

Dixin commented

Your new table in Access is missing the Unicode column.

Dixin commented

Download new SQL database: https://drive.google.com/file/d/1ncFH8GjLhFs9LQ3OHGof9jsccKUci4bq/view?usp=sharing
The database has all new data from Access. The schema is carefully designed and optimized. Please do not delete the tables and columns.

Simplified, Traditional, Old Traditional, Simplified Rule, Variant rule, Applied Rule, Font Rule should now be normalized.

It would have taken me 3 days just to change all the z's using the Access connect.

It is OK if you do not have a lot of stuff to do.

I hope to get the Decomposition normalized, but it will take me a few weeks and I want to use Access.

After that I should not have huge amounts of data to update.

Here is the latest.

Ety04272018.zip

Dixin commented

Do not spend time on "z"s. These can be replaced with NULL by a single SQL statement.

Do fix "w", "x", "y".
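
As a rough illustration of the "single SQL statement" mentioned above, the sketch below runs one UPDATE against an in-memory SQLite stand-in. It assumes the placeholder is a cell whose entire value is the letter `z`. The table name `Etymology`, the column `FontRule`, and the sample rows are hypothetical, and the real database may need the same statement repeated per column.

```python
import sqlite3

# In-memory stand-in; table, column, and rows are hypothetical sample data.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Etymology (Simplified TEXT, FontRule TEXT)")
connection.executemany(
    "INSERT INTO Etymology VALUES (?, ?)",
    [("车", "z"), ("台", "some rule"), ("杯", "z")],
)

# One statement replaces every cell that holds only the placeholder "z".
cursor = connection.execute(
    "UPDATE Etymology SET FontRule = NULL WHERE FontRule = 'z'"
)
replaced = cursor.rowcount

remaining = connection.execute(
    "SELECT Simplified, FontRule FROM Etymology ORDER BY rowid"
).fetchall()
print(replaced, remaining)
```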

Dixin commented

There are still a few issues in the Traditional column.

  • I removed the white spaces in the column.
  • There is also a traditional character 杯盃, I have updated it as .
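
The white space cleanup in the first bullet can be sketched the same way. This is an assumption about how such a cleanup might be done, not the statement that was actually run. It uses an in-memory SQLite stand-in with a hypothetical `Etymology` table and made-up rows, and it removes ASCII spaces only.

```python
import sqlite3

# In-memory stand-in; table name and rows are hypothetical sample data.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Etymology (Traditional TEXT)")
connection.executemany(
    "INSERT INTO Etymology VALUES (?)",
    [(" 車",), ("臺 ",), ("檯",)],
)

# Remove ASCII spaces anywhere in the value; rows without a space are skipped.
connection.execute(
    "UPDATE Etymology SET Traditional = REPLACE(Traditional, ' ', '') "
    "WHERE Traditional LIKE '% %'"
)

cleaned = [
    row[0]
    for row in connection.execute(
        "SELECT Traditional FROM Etymology ORDER BY rowid"
    )
]
print(cleaned)
```

Tabs or full-width spaces (U+3000), if present, would need their own REPLACE calls.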

Now, this column looks perfect. Please use the attached database.
Ety04272018.zip

The oracle bone script form shown for the character 「之」 is actually that of 「有」, not 「之」.