
Chinese Mandarin Dictionaries
中文词典 / 中文詞典

Readme: Work in progress

About 說明 / 说明

A collection of free Chinese Mandarin dictionaries for use with dictionary software such as GoldenDict.

Todo 接下來要做

No. | Name | Notes | Status
1 | Chinese Word Frequencies | based on word corpora and with HSK levels |
2 | Make Me a Hanzi | Animations and Descriptions | ✓
3 | Idioms? | from academia / BCC idiom dictionary? |
4 | CC-CEDICT | with enlarged characters | ✓
5 | HanDeDict | with English machine translations from German (for use with words not found in CC-CEDICT) | ✓
6 | Pinyin-to-Chinese | dictionary with Zhuyin, Pinyin and IPA, English "sounds like" (use FSI/wiki?), add GPL audio | To complete
7 | Unihan character dictionary (字典) | | ✓
8 | Phrase dictionaries | Tatoeba / CUV Bible | ✓
9 | Idioms - W Scarborough | | To convert
10 | Hanziyuan image library | | To download
11 | Taiwan Ministry of Education Dictionary (moedict) 教育部國語辭典 - 重編國語辭典 修訂本 | ? Upload other formats |
12 | BCC 英汉词典 - BCC English-Chinese Wordlist | with spelling corrections | ✓
13 | XDICT 英汉词典 English-Chinese dictionary | | ✓
14 | Unihan Radical Dictionary | | ✓
15 | Guoxuedashi (国学大师) Character Dictionary | | ✓
16 | Chinese Lexicon | | ✓
17 | CJKVI Decomposition dictionary | |
18 | Adso Chinese English | | ✓
19 | Starling etymology | |
20 | Sinica etymology? | |
21 | Kanjinetworks - Etymological Dictionary of Han/Chinese Characters | | ✓
22 | LDC Chinese-English Wordlist | | ✓
23 | Guoxuedashi (国学大师) Idiom Dictionaries and ?others | |
24 | WFG dictionaries | | ✓
25 | Taiwan Ministry of Education Dictionary of Idioms 教育部《成語典》 | | ✓
26 | 數字輸入法 Chinese Input Methods | | ✓
27 | WFG fonts | |
28 | Tidy files | |
29 | Update this readme | |


Chinese Word Frequencies 词频分析 / 詞頻分析

About 說明 / 说明
...

Details

Heading | No. of entries* | Years
1a Character freq. (Books) | 9,932 | 1911-2003
1b Word freq. (Books) | 76,002 | 1911-2003
2a Character freq. (Movies) | 3,360 | < 2010
2b Word freq. (Movies) | 69,004 | < 2010
3 Word freq. (Mixed Print) | 24,669 | ~1991
4 Char freq. (Usenet) | 5,083 |
5 Word freq. (Internet) | 50,000 | 2006
6 Word freq. (Newswire) | 4,945 | 1990-2002
7 HSK Levels | 5,000 | 2010
8 Pinyin ratios | 5,000 | 2010


Notes:
  • *English, Russian, numeral and punctuation characters were removed from references [3] and [5].
  • Corpora [3], [4] and [5] have been re-ranked in order of frequency, taking joint rankings into account. Where two entries have the same prevalence, they are both ranked e.g. “≥2,124”, with the next entry ranked as “2,126”.
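
The joint-ranking rule in these notes corresponds to what is usually called standard competition ("1224") ranking. The following is a minimal Python sketch of it, assuming the input is a mapping from word to raw count. The function name `competition_ranks` and the alphabetical tie-break are illustrative choices, not the method actually used to build this dictionary.

```python
from itertools import groupby


def competition_ranks(counts):
    """Rank words by frequency, giving tied entries a shared "≥" rank.

    counts: dict mapping word -> raw frequency count (assumed input shape).
    Returns a list of (word, count, rank_label) tuples, most frequent first.
    """
    # Sort by descending count; ties are broken alphabetically (an arbitrary
    # choice so that the output is deterministic).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    position = 1  # 1-based position of the first entry in the current group
    for count, group in groupby(ordered, key=lambda item: item[1]):
        words = [word for word, _ in group]
        # Tied entries all share the first position of their group, marked "≥".
        label = f"≥{position:,}" if len(words) > 1 else f"{position:,}"
        rows.extend((word, count, label) for word in words)
        # The next entry skips past every tied position ("1224" ranking).
        position += len(words)
    return rows


if __name__ == "__main__":
    for row in competition_ranks({"的": 100, "是": 50, "了": 50, "在": 10}):
        print(*row)
```

With these toy counts, 了 and 是 are both labelled "≥2" and 在 is labelled "4", in the same way that two tied entries at "≥2,124" are followed by "2,126".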


    Licence 許可證 / 许可证
    HTML licensed under the CC BY-NC 4.0 Licence; data according to the licences below.

    Sources of word frequency data and their licences:

    1. "Chinese Word Frequencies" (this dictionary)
       Licence: CC BY-NC 4.0 Licence
       Source: https://github.com/lxs602/Chinese-Mandarin-Dictionaries

    2. Character/Word freq. (Books)
       Corpus: Da, Jun. 2004. Chinese text computing. http://lingua.mtsu.edu/chinese-computing
       Corpus licence: https://lingua.mtsu.edu/chinese-computing/copyright.html
       Word list: Chinese Lexicon, by Peter Olson. https://github.com/peterolson/chinese-lexicon/tree/master/statistics
       Word list licence: as for corpus

    3. Character/Word freq. (Movies)
       Corpus: Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles. PLoS ONE, 5(6), e10729. https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexch/overview.htm
       Corpus licence: Creative Commons Attribution Licence
       Word list: Chinese Lexicon, by Peter Olson (see above)
       Word list licence: as for corpus

    4. Word freq. (Mixed Print)
       Corpus: Graff, David, and Ke Chen. Chinese Gigaword LDC2003T09. Web Download. Philadelphia: Linguistic Data Consortium, 2003. https://catalog.ldc.upenn.edu/LDC2003T09
       Corpus licence: LDC User Agreement for Non-Members. https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
       Word list: http://corpus.leeds.ac.uk/frqc/giga-zh.num
       Word list licence: corpus.leeds.ac.uk/list.html, “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode

    5. Character freq. (Usenet)
       Corpus: kFrequency field in the Unihan database, Unicode version 11.0.0
       Corpus licence: https://www.unicode.org/license.html
       Word list: ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip
       Word list licence: as for corpus

    6. Word freq. (Internet)
       Corpus: Sharoff, S. (2006) Creating general-purpose corpora using automated search engine queries. In Marco Baroni and Silvia Bernardini (eds), WaCky! Working papers on the Web as Corpus. Gedit, Bologna. http://wackybook.sslmit.unibo.it
       Corpus licence: Creative Commons Attribution-NoDerivs 2.5 License
       Word list: http://corpus.leeds.ac.uk/internet/i-zh.num
       Word list licence: corpus.leeds.ac.uk/list.html, “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode

    7. Word freq. (News)
       Corpus: McEnery, A. M. and Xiao, R. Z. (2003) The Lancaster Corpus of Mandarin Chinese. European Language Resources Association / Oxford Text Archive, Paris, France / Oxford, UK.
       Corpus licence: The Lancaster Corpus of Mandarin Chinese End User License. https://www.lancaster.ac.uk/fass/projects/corpus/LCMC/lcmc/lcmc_license.htm
       Word list: http://corpus.leeds.ac.uk/frqc/lcmc.num
       Word list licence: corpus.leeds.ac.uk/list.html, “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode

    8. HSK Levels
       Word list: http://www.chinesetest.cn/userfiles/file/HSK/HSK-2012.xls

    9. Pinyin ratios
       Corpus: kHanyuPinlu field in the Unihan database, Unicode version 11.0.0
       Corpus licence: https://www.unicode.org/license.html
       Word list: ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip, via Chinese Lexicon, by Peter Olson (see above)
       Word list licence: as for corpus


    Notes: All references accessed 26 Dec 2020.
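
The Pinyin ratios data above come from the kHanyuPinlu field of the Unihan database, whose values list each Mandarin reading followed by a frequency count in parentheses, in the form reading(count). The following is a minimal Python sketch, assuming that a "pinyin ratio" means each reading's share of the character's total count. It reads Unihan_Readings.txt-style lines directly rather than going through Chinese Lexicon, and the function names `load_khanyupinlu` and `pinyin_ratios` are hypothetical.

```python
import re

# Each reading in a kHanyuPinlu value is written as pinyin(count).
READING = re.compile(r"([^\s()]+)\((\d+)\)")


def load_khanyupinlu(lines):
    """Map each character to its raw kHanyuPinlu value.

    Expects Unihan-style lines: code point, field name and value separated by
    tabs. Comment lines starting with '#' and blank lines are skipped.
    """
    table = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        codepoint, field, value = line.rstrip("\n").split("\t", 2)
        if field == "kHanyuPinlu":
            table[chr(int(codepoint[2:], 16))] = value  # "U+4E0D" -> "不"
    return table


def pinyin_ratios(value):
    """Return {reading: share of the character's total count} (assumed meaning)."""
    counts = {pinyin: int(n) for pinyin, n in READING.findall(value)}
    total = sum(counts.values())
    return {pinyin: n / total for pinyin, n in counts.items()} if total else {}
```

For example, a value with counts of 3 and 1 for two readings gives ratios of 0.75 and 0.25. Characters without a kHanyuPinlu entry are simply absent from the table.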


    CC-CEDICT - with enlarged characters

    About 說明 / 说明
    The CC-CEDICT dictionary, with enlarged Chinese characters for ease of reading, and without a small handful of obscene terms or definitions which shouldn't be there.

    Licence 許可證 / 许可证
    Creative Commons BY-SA 3.0

    Original Files 資料來源
    https://www.mdbg.net/chinese/dictionary?page=cc-cedict

    HanDeDict - English translation

    About 說明 / 说明
    A machine translation of HanDeDict into English (using DeepL Translator). It is intended to accompany CC-CEDICT, so terms already present in CC-CEDICT were omitted, along with many numerical terms (e.g. definitions for 1, 2, 3 ... 10,000, 10,001 ...) and a small amount of profanity.
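
Both CC-CEDICT and HanDeDict use the CEDICT line format, `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, with `#` comment lines. The following minimal Python sketch shows one way the "omit terms already in CC-CEDICT" step could be done. It assumes that a term counts as already present when both its traditional and simplified headwords match a CC-CEDICT entry, ignoring pinyin. The function names are hypothetical, and this is not necessarily how this dictionary was actually filtered.

```python
import re

# One CEDICT-format entry: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict(lines):
    """Yield (traditional, simplified, pinyin, glosses) for each entry line.

    Comment lines starting with '#' and lines that do not match the format
    are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        match = CEDICT_LINE.match(line)
        if match:
            trad, simp, pinyin, glosses = match.groups()
            yield trad, simp, pinyin, glosses.split("/")


def omit_known_terms(handedict_lines, cedict_lines):
    """Keep only HanDeDict entries whose headword is not in CC-CEDICT.

    A headword counts as "already present" when both its traditional and
    simplified forms match a CC-CEDICT entry (an assumption; pinyin is ignored).
    """
    known = {(trad, simp) for trad, simp, _, _ in parse_cedict(cedict_lines)}
    return [entry for entry in parse_cedict(handedict_lines)
            if (entry[0], entry[1]) not in known]
```

In practice, `cedict_lines` and `handedict_lines` would be the lines of the two downloaded files, opened with `encoding="utf-8"`. The remaining German glosses would then go on to machine translation.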

    Licence 許可證 / 许可证
    Creative Commons BY-SA 2.0

    Original Files 資料來源
    https://github.com/gugray/HanDeDict

    Make Me a Hanzi

    Memo:
    Add a note about the overlay problem in GoldenDict when used together with CC-CEDICT.

    Licence 許可證 / 许可证
    GNU LESSER GENERAL PUBLIC LICENSE, Version 3, 29 June 2007


    Original Files 資料來源
    https://github.com/skishore/makemeahanzi

    BCC 英汉词典 - BCC Corpus English-Chinese Wordlist

    Note:
    Assembled from the freely available BCC corpus dictionary, with 42,784 terms.

    The original file contained some spelling errors; although it has been proofread in English, some may remain.

    It contains some uncommon words and national variants, so it makes a good accompaniment to other dictionaries.


    资料来源 Original Files
    http://bcc.blcu.edu.cn/downloads/resources/%E8%8B%B1%E6%B1%89%E8%AF%8D%E5%85%B8.zip

    XDICT 英汉词典 (English-Chinese Dictionary)

    About 說明 / 说明
    XDICT, a free English-Chinese dictionary originally developed by Fu Jianjun, with about 177,000 terms.

    Licence 許可證 / 许可证
    XDICT是一个freeware,大致按照GPL传播. (XDICT is freeware, distributed roughly according to the terms of the GPL.)

    Original Files 資料來源
    http://archive.ubuntu.com/ubuntu/pool/universe/d/dict-xdict/dict-xdict_0.1-4.2_all.deb

    Adso trans

    About 說明 / 说明
    A Chinese-English dictionary from the Adso project by David Lancashire (https://github.com/wtanaka/adso). The source file is derived from the Speaking English Dictionary by Warren S. Goff, which also appears to include entries from the LDC wordlist.

    Using the Adso translation application itself is recommended over this dictionary, as it offers as-you-type translation of phrases, similar to Google or Bing Translate. An online version is hosted at Popup Chinese.

    Licence 許可證 / 许可证
    Adsotrans Attribution-NonCommercial License 1.1

    Original Files 資料來源
    https://github.com/wtanaka/adso

    Chinese Lexicon - Etymology

    About 說明 / 说明
    A character etymology dictionary derived from Chinese Lexicon by Peter Olson (dong-chinese.com). It contains decomposition data, helpful images of iconographs and short definitions from CC-CEDICT. Total: 5,054 terms.

    Licence 許可證 / 许可证
    Freely available

    Original Files 資料來源
    https://github.com/peterolson/chinese-lexicon

    Tatoeba Chinese-English Vocabulary

    About 說明 / 说明
    A Chinese-English vocabulary built from sentences submitted to tatoeba.org. Downloaded December 2020, with 47,969 phrases. “Tatoeba is a collection of sentences and translations. It's collaborative, open, free and even addictive.” (from the Tatoeba website)

    NB. To list all the sentences with audio, search for the term ‘audio’.

    Chinese words were segmented using jieba. Thanks also to "Generating Anki decks with audio from the Tatoeba Project", accessed December 2020.

    Licence 許可證 / 许可证
    CC BY 2.0 FR


    新华字典 Xīnhuá Zìdiǎn character dictionary

    About 說明 / 说明
    ...


    數字輸入法 Chinese Input Methods

    About 說明 / 说明
    A compilation of freely available Chinese input method codes, as listed in this table or here.

    Also:
    吳語臺語字輸入法 Wu and Minnan
    亞洲(日韓越泰)輸入法辞书 East Asian (JKVT)


    資料來源 Original Files:
    https://github.com/chinese-opendesktop/cin-tables
    https://github.com/openvanilla/openvanilla/tree/master/DataTables

    Licences 許可證 / 许可证
    數字輸入法 Chinese Input Methods
    吳語臺語字輸入法 Wu and Minnan
    亞洲(日韓越泰)輸入法辞书 East Asian (JKVT)




    串珠聖經和英文翻譯 / 串珠圣经和英文翻译 Chinese Bible Concordance and English Translation


    Not available (pending corrections)

    說明 / 说明 About

    串珠聖經和合本 (Concordance)
    例如查"企望",就會列出所有這個字原文對應聖經和合本翻譯的字及其經節出處,和英文翻譯 (World English Bible - British English / 國王詹姆斯版本 King James Version) 。欽定版聖經 (KJV) 於 1611 年出版。建議 WEB-BE,因為它更簡單。

    串珠圣经和合本 (Concordance)
    例如查"企望",就会列出所有这个字原文对应圣经和合本翻译的字及其经节出处,和英文翻译 (World English Bible - British English / 国王詹姆斯版本 King James Version) 。钦定版圣经 (KJV) 于 1611 年出版。建议 WEB-BE,因为它更简单。

    A searchable concordance for the Chinese Union Version (CUV) Bible, with an English translation from either the World English Bible - British English (WEB-BE) or the King James Version (KJV).

    For example, searching for "企望" (hope) will show all verses containing this word, together with the matching English translation.

    This dictionary was made as a resource for learning English or Chinese: the Bible is freely available in both languages and provides a very large amount of parallel Chinese-English vocabulary. It may be particularly helpful to those already familiar with parts of the text, as studying a familiar passage in the other language should aid learning.

    Additionally, for anyone primarily interested in studying God's word: while there are many concordances in English, there is perhaps only one in Chinese, and it covers only the New Testament.

    發展 / 发展 Compilation
    中文分詞利用 pywordseg (ELMo) 系統自動完成,未經人工校對。如發現經文錯誤,請回報給開發者。
    中文分词利用 pywordseg (ELMo) 系统自动完成,未经人工校对。如发现经文错误,请回报给开发者。

    Chinese words have been segmented automatically and have not been checked manually. Please report any errors you find.

    Word segmentation was performed with pywordseg (ELMo) (https://github.com/voidism/pywordseg), using the CC-CEDICT dictionary and a Chinese word list of Bible names and places (https://github.com/guoshengkang/Bible-Word-Statistics/tree/master/output_file_tf). The segmented text was then indexed using word_line_concordance_app (https://github.com/lostchristmas0/word_line_concordance_application) by lostchristmas0.

    The simplified and traditional versions of the CUV were segmented separately to avoid errors introduced by converting from traditional to simplified characters, so each version may contain different mistakes.
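
A word-line concordance of this kind is essentially an inverted index from each segmented word to the verses that contain it. The following is a minimal Python sketch under that assumption. It is not the code of word_line_concordance_app, and the function names, verse references and sample text are hypothetical.

```python
from collections import defaultdict


def build_concordance(verses):
    """Build an inverted index: word -> references of verses containing it.

    verses: iterable of (reference, segmented_text) pairs, where the words in
    segmented_text are separated by spaces. Each verse is listed once per
    word, in the order the verses are given.
    """
    index = defaultdict(list)
    for ref, text in verses:
        # dict.fromkeys drops repeated words within a verse, keeping order.
        for word in dict.fromkeys(text.split()):
            index[word].append(ref)
    return dict(index)


def lookup(index, word, english_by_ref):
    """Return (reference, English text) pairs for every verse containing word."""
    return [(ref, english_by_ref.get(ref, "")) for ref in index.get(word, [])]


if __name__ == "__main__":
    # Toy, made-up references and text, for illustration only.
    verses = [("A 1:1", "我們 企望 平安"), ("A 1:2", "企望 企望 的 人")]
    index = build_concordance(verses)
    print(lookup(index, "企望", {"A 1:1": "We hope for peace."}))
```

Keeping the index keyed on segmented words, rather than on raw substrings, is why segmentation errors show up directly as missing or wrong concordance entries. This is also why the traditional and simplified texts, segmented separately, can differ.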


    文本 Choice of Text
    The CUV was chosen because it is widely used, in the public domain, and already available segmented by Strong's numbers. For similar reasons, the WEB-BE was chosen as a free, accurate and readable English version.

    The KJV is also free, but is not recommended for those without a very good command of English.

    Related Texts
    Another project which may be of interest is this Chinese-English comparison Bible by michaelchanwahyan, which has several free English and Chinese versions.

    資料來源 Original Files:
    https://ebible.org/webbe/
    https://www.o-bible.com/


    許可證 / 许可证 Licences
    CUV Bible: Public Domain
    KJV Bible: Crown copyright
    World English Bible: Public Domain. "World English Bible" is a trademark of ebible.org; see https://ebible.org/web/copyright.htm


    教育部國語辭典 - 重編國語辭典 修訂本 Taiwan Ministry of Education Dictionary (moedict)

    About 說明 / 说明
    This dictionary was produced from the free release by the Taiwan Ministry of Education, and was first released in 2015. Total entries: 163,085. Compiled, with HTML design, by WFG. This file is a mirror of the original version, hosted here for those searching the web in English and for the sake of posterity (with thanks to shawkynasr).

    Authorisation / 授權 (from the author's webpage):

  • 「在此遵循「創用CC-姓名標示-禁止改作 臺灣 3.0 版授權條款」將我的製作分享出來,希望這些寶貴的資料能更方便地被大家運用,也請有使用的朋友能將發現的瑕疵、錯誤反應給我知道,以利我後續的修正。」
  • "I am sharing my work here under the terms of the Creative Commons Attribution-NoDerivs 3.0 Taiwan licence, in the hope that this valuable material can be used more conveniently by everyone. I also ask those who use it to let me know of any flaws or errors they find, so that I can make further corrections." (translation)

    Version 版本
    2015, revised 10th Oct 2020


    Licence 許可證 / 许可证
    Creative Commons NonCommercial 3.0 Unported Licence (No derivatives)

    Original Files 資料來源
    https://language.moe.gov.tw/001/Upload/Files/site_content/M0001/respub/index.html
    http://fgwang.blogspot.com/2018/02/blog-post_14.html

    全字庫 Quan Zi Ku

    A Taiwanese character dictionary by WFG, with pinyin and Zhuyin, stroke order, radicals, Cangjie input codes and CNS 11643 codes. Compiled, with HTML design, by WFG. This file is a mirror of the original version, hosted here for those searching the web in English and for the sake of posterity.


    Original Files 資料來源
    http://fgwang.blogspot.com/2020/07/blog-post_3.html
    English translation (Google translate)

    干祿字書 Ganlu Zishu

    A 7th-century Tang dynasty dictionary ('Character Book for Seeking an Official Emolument') of 800 characters, written for imperial examination candidates by 顏元孫 Yan Yuansun. Compiled, with HTML design, by WFG. This file is a mirror of the original version, hosted here for those searching the web in English and for the sake of posterity.


    Original Files 資料來源
    http://fgwang.blogspot.com/2019/04/blog-post.html
    English translation (Google translate)

    康熙字典 Kangxi Radical Dictionary

    A character dictionary commissioned by the Kangxi Emperor of the Qing dynasty in AD 1710, whose 214 radicals form the basis of modern radical dictionaries. Compiled, with HTML design, by WFG. This file is a mirror of the original version, hosted here for those searching the web in English and for the sake of posterity.

    This is a large dictionary. One of the files has been split into three parts (康熙字典.mdd.zip, 康熙字典.mdd.z01, 康熙字典.mdd.z02), which must be opened together in WinZip or another archive tool so that they are recombined.


    Original Files 資料來源
    http://fgwang.blogspot.com/2018/12/blog-post_10.html
    English translation (Google translate)

    Licence 許可證
    CC BY-SA 3.0

    說文解字 Shuowen Jiezi

    The 2nd-century Han dynasty character dictionary by 許慎 Xu Shen. Compiled, with HTML design, by WFG. This file is a mirror of the original version, hosted here for those searching the web in English and for the sake of posterity.

    This is another large dictionary. One of the files has been split into three parts (說文解字.mdd.zip, 說文解字.mdd.z01, 說文解字.mdd.z02), which must be opened together in WinZip or another archive tool so that they are recombined.


    Original Files 資料來源
    http://fgwang.blogspot.com/2019/02/blog-post.html
    English translation (Google translate)

    Acknowledgements 鳴謝 / 鸣谢

    GoldenDict https://github.com/goldendict/goldendict
    Dictionary software for Linux, Windows and Mac.

    WriteMDict by zhansliu https://github.com/zhansliu/writemdict

    Mdict-utils by Liuyug https://github.com/liuyug/mdict-utils