A collection of free Chinese Mandarin dictionaries, for use with dictionary software such as GoldenDict.
No. | Name | Notes | Status |
---|---|---|---|
1. | Chinese Word Frequencies | Based on word corpora, with HSK levels | ✓ |
2. | Make Me a Hanzi | Animations and descriptions | ✓ |
3. | Idioms | ?from academia / BCC idiom dictionary | ? |
4. | CC-CEDICT | With enlarged characters | ✓ |
5. | HanDeDict | With English machine translations from German (for words not found in CC-CEDICT) | ✓ |
6. | Pinyin-to-Chinese / dictionary | With Zhuyin, Pinyin and IPA, English "sounds like" (use FSI/wiki?), add GPL audio | To complete |
7. | Unihan character dictionary (字典) | | ✓ |
8. | Phrase dictionaries | Tatoeba / CUV Bible | ✓ |
9. | Idioms - W Scarborough | | To convert |
10. | Hanziyuan image library | | To download |
11. | Taiwan Ministry of Education Dictionary (moedict) 教育部國語辭典 - 重編國語辭典 修訂本 | | ?upload other formats |
12. | BCC 英汉词典 - BCC English-Chinese Wordlist | With spelling corrections | ✓ |
13. | XDICT 英汉词典 English-Chinese dictionary | | ✓ |
14. | Unihan Radical Dictionary | | ✓ |
15. | Guoxuedashi (国学大师) Character Dictionary | | ✓ |
16. | Chinese Lexicon | | ✓ |
17. | CJKVI Decomposition dictionary | | |
18. | Adso Chinese English | | ✓ |
19. | Starling etymology | | |
20. | Sinica etymology? | | |
21. | Kanjinetworks - Etymological Dictionary of Han/Chinese Characters | | ✓ |
22. | LDC Chinese-English Wordlist | | ✓ |
23. | Guoxuedashi (国学大师) Idiom Dictionaries and ?others | | |
24. | WFG dictionaries | | ✓ |
25. | Taiwan Ministry of Education Dictionary of Idioms 教育部《成語典》 | | ✓ |
26. | 數字輸入法 Chinese Input Methods | | ✓ |
27. | WFG fonts | | |
28. | Tidy files | | |
29. | Update this readme | | |
About 說明 / 说明
...
Details
No. | Heading | No. of entries* | Years |
---|---|---|---|
1a | Character freq. (Books) | 9,932 | 1911-2003 |
1b | Word freq. (Books) | 76,002 | 1911-2003 |
2a | Character freq. (Movies) | 3,360 | < 2010 |
2b | Word freq. (Movies) | 69,004 | < 2010 |
3 | Word freq. (Mixed Print) | 24,669 | ~1991 |
4 | Character freq. (Usenet) | 5,083 | |
5 | Word freq. (Internet) | 50,000 | 2006 |
6 | Word freq. (Newswire) | 4,945 | 1990-2002 |
7 | HSK Levels | 5,000 | 2010 |
8 | Pinyin ratios | 5,000 | 2010 |
Notes:
*English, Russian, numeral and punctuation characters removed from references [3] & [5].
Corpora [3], [4] and [5] have been re-ranked in order of frequency, taking joint rankings into account. Where two entries have the same frequency, both are given the same rank with a "≥" prefix (e.g. "≥2124"), and the next entry is ranked "2126".
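The tie-aware re-ranking described above can be sketched in Python as follows. This is an illustrative reconstruction, not the script used to build the dictionary: it assumes each corpus has already been reduced to (term, frequency) pairs, and the function name `rank_with_ties` is hypothetical.

```python
from collections import Counter


def rank_with_ties(entries):
    """Re-rank (term, frequency) pairs, marking joint rankings with "≥".

    Hypothetical helper: tied entries share the rank of their first position,
    and the next entry skips the positions they occupy (competition ranking),
    giving e.g. ≥2124, ≥2124, 2126.
    """
    # Highest frequency first; sorted() is stable, so ties keep input order.
    ordered = sorted(entries, key=lambda pair: pair[1], reverse=True)
    # How many entries share each frequency value.
    ties = Counter(freq for _, freq in ordered)

    ranked = []
    rank, previous = 0, None
    for position, (term, freq) in enumerate(ordered, start=1):
        if freq != previous:
            # A new frequency value is ranked at its position in the list.
            rank, previous = position, freq
        label = f"≥{rank}" if ties[freq] > 1 else str(rank)
        ranked.append((term, label))
    return ranked
```

With invented frequencies `[("的", 100), ("是", 50), ("了", 50), ("在", 10)]`, the result is `[("的", "1"), ("是", "≥2"), ("了", "≥2"), ("在", "4")]`: the two tied entries share "≥2" and the next entry is ranked 4, mirroring the "≥2124" / "2126" example.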
Licence 許可證 / 许可证
HTML licensed under the CC BY-NC 4.0 Licence; data licensed as set out below.
Sources of word frequency data and their licences:
No. | Heading | Corpus reference | Corpus licence | Word list source | Word list licence |
---|---|---|---|---|---|
1 | "Chinese Word Frequencies" (this dictionary) | https://github.com/lxs602/Chinese-Mandarin-Dictionaries | CC BY-NC 4.0 Licence | | |
2 | Character/Word freq. (Books) | Da, Jun. 2004. Chinese text computing. http://lingua.mtsu.edu/chinese-computing | https://lingua.mtsu.edu/chinese-computing/copyright.html | Chinese Lexicon, by Peter Olson. https://github.com/peterolson/chinese-lexicon/tree/master/statistics | As for corpus |
3 | Character/Word freq. (Movies) | Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles. Plos ONE, 5(6), e10729. https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexch/overview.htm | Creative Commons Attribution Licence | Chinese Lexicon, by Peter Olson. (See above) | As for corpus |
4 | Word freq. (Mixed Print) | Graff, David, and Ke Chen. Chinese Gigaword LDC2003T09. Web Download. Philadelphia: Linguistic Data Consortium, 2003. https://catalog.ldc.upenn.edu/LDC2003T09 | LDC User Agreement for Non-Members https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | http://corpus.leeds.ac.uk/frqc/giga-zh.num | corpus.leeds.ac.uk/list.html “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode |
5 | Character freq. (Usenet) | kFrequency field in UniHan database, Unicode version: 11.0.0 | https://www.unicode.org/license.html | ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip | As for corpus |
6 | Word freq. (Internet) | Sharoff, S. (2006) Creating general-purpose corpora using automated search engine queries. In Marco Baroni and Silvia Bernardini, (eds), WaCky! Working papers on the Web as Corpus. Gedit, Bologna, http://wackybook.sslmit.unibo.it | Creative Commons Attribution-NoDerivs 2.5 License | http://corpus.leeds.ac.uk/internet/i-zh.num | corpus.leeds.ac.uk/list.html “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode |
7 | Word freq. (News) | McEnery, A. M. and Xiao, R. Z. (2003) The Lancaster Corpus of Mandarin Chinese. European Language Resources Association / Oxford Text Archive, Paris, France / Oxford, UK. | The Lancaster Corpus of Mandarin Chinese End User License https://www.lancaster.ac.uk/fass/projects/corpus/LCMC/lcmc/lcmc_license.htm | http://corpus.leeds.ac.uk/frqc/lcmc.num | corpus.leeds.ac.uk/list.html “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode |
8 | HSK Levels | http://www.chinesetest.cn/userfiles/file/HSK/HSK-2012.xls | |||
9 | Pinyin ratios | kHanyuPinlu field in UniHan database, Unicode version: 11.0.0 | https://www.unicode.org/license.html | ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip, from: Chinese Lexicon, by Peter Olson. (See above) | As for corpus |
Notes: All references accessed 26 Dec 2020.
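The "Pinyin ratios" data come from the kHanyuPinlu field of the Unihan database. One plausible way to turn that field into ratios, sketched below, is to divide each reading's count by the character's total count; this is an assumption about how the ratios are derived, not a description of the actual build. The sketch expects tab-separated Unihan lines (`U+XXXX`, field name, value), where kHanyuPinlu values are space-separated `reading(count)` items; the function name `pinyin_ratios` is hypothetical.

```python
import re

# One reading in a kHanyuPinlu value, e.g. "dì(300)": pinyin, then a count.
READING = re.compile(r"([^\s()]+)\((\d+)\)")


def pinyin_ratios(unihan_lines):
    """Map each character to {pinyin: share of its kHanyuPinlu count}."""
    ratios = {}
    for line in unihan_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        codepoint, field, value = line.split("\t", 2)
        if field != "kHanyuPinlu":
            continue  # other Unihan fields are ignored
        char = chr(int(codepoint[2:], 16))  # "U+5730" -> "地"
        counts = {pinyin: int(n) for pinyin, n in READING.findall(value)}
        total = sum(counts.values())
        if total:
            ratios[char] = {pinyin: n / total for pinyin, n in counts.items()}
    return ratios
```

For example, a line `U+5730<TAB>kHanyuPinlu<TAB>dì(300) de(100)` (invented counts) gives `{"地": {"dì": 0.75, "de": 0.25}}`.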
About 說明 / 说明
The CC-CEDICT dictionary, with enlarged Chinese characters for ease of reading, and with a small handful of obscene terms and inappropriate definitions removed.
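Each CC-CEDICT line has the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, and the file header consists of `#` comment lines. A minimal sketch of parsing such a line and rendering it with enlarged characters might look like this; the function names, the HTML markup and the `200%` font size are hypothetical and are not taken from this dictionary's actual files.

```python
import html
import re

# "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict_line(line):
    """Return (traditional, simplified, pinyin, glosses), or None for comments."""
    if line.startswith("#"):
        return None  # header and comment lines
    match = ENTRY.match(line)
    if match is None:
        return None  # not a well-formed entry
    traditional, simplified, pinyin, glosses = match.groups()
    return traditional, simplified, pinyin, glosses.split("/")


def entry_to_html(entry, font_size="200%"):
    """Render an entry with enlarged headword characters (hypothetical markup)."""
    traditional, simplified, pinyin, glosses = entry
    if simplified == traditional:
        headword = simplified
    else:
        headword = f"{simplified} / {traditional}"
    items = "".join(f"<li>{html.escape(g)}</li>" for g in glosses)
    return (
        f'<span style="font-size:{font_size}">{html.escape(headword)}</span> '
        f"[{html.escape(pinyin)}]<ol>{items}</ol>"
    )
```

Here `parse_cedict_line("中國 中国 [Zhong1 guo2] /China/")` returns `("中國", "中国", "Zhong1 guo2", ["China"])`, and `entry_to_html` wraps "中国 / 中國" in a larger font ahead of the pinyin and a numbered list of glosses.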
Licence 許可證 / 许可证
Creative Commons BY-SA 3.0
Original Files 資料來源
https://www.mdbg.net/chinese/dictionary?page=cc-cedict
About 說明 / 说明
A machine translation of HanDeDict into English (using DeepL Translator). It is intended to accompany CC-CEDICT, so terms already present in CC-CEDICT were omitted, along with many numerical terms (e.g. definitions for 1, 2, 3... 10,000... 10,001...) and a small amount of profanity.
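Omitting entries in this way can be sketched as a simple filter. The sketch assumes HanDeDict and CC-CEDICT entries have both been parsed into (traditional, simplified, pinyin, glosses) tuples, and its rule for "numerical terms" (a headword made only of numeral characters) is an assumption for illustration, not necessarily the rule that was used; `filter_handedict` is a hypothetical name.

```python
import re

# Assumed rule: a headword made only of digits or Chinese numerals is "numerical".
NUMERIC = re.compile(r"^[0-9零〇一二三四五六七八九十百千万萬亿億两兩]+$")


def filter_handedict(handedict_entries, cedict_headwords):
    """Keep HanDeDict entries that are neither in CC-CEDICT nor numerical.

    cedict_headwords is a set of (traditional, simplified) pairs.
    """
    kept = []
    for entry in handedict_entries:
        traditional, simplified = entry[0], entry[1]
        if (traditional, simplified) in cedict_headwords:
            continue  # already covered by CC-CEDICT
        if NUMERIC.match(simplified):
            continue  # e.g. 一万零一 (10,001)
        kept.append(entry)
    return kept
```

Matching on the (traditional, simplified) pair rather than on one script alone avoids dropping entries that share a simplified form with a different traditional headword.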
Licence 許可證 / 许可证
Creative Commons BY-SA 2.0
Original Files 資料來源
https://github.com/gugray/HanDeDict
Memo:
Add a note about the overlay problem in GoldenDict when used with CC-CEDICT.
Licence 許可證 / 许可证
GNU LESSER GENERAL PUBLIC LICENSE, Version 3, 29 June 2007
Original Files 資料來源
https://github.com/skishore/makemeahanzi
Note:
Assembled from the freely available BCC corpus dictionary, with 42,784 terms.
The original file contained some spelling errors; although the English has been proofread, some errors may remain.
It contains some uncommon words and national variants, so it makes a good companion to other dictionaries.
资料来源 Original Files
http://bcc.blcu.edu.cn/downloads/resources/%E8%8B%B1%E6%B1%89%E8%AF%8D%E5%85%B8.zip
About 說明 / 说明
XDict, the free English to Chinese dictionary, originally developed by Fu Jianjun, with about 177,000 terms.
Licence 許可證 / 许可证
XDICT is freeware, distributed broadly in accordance with the GPL (XDICT是一个freeware,大致按照GPL传播).
Original Files 資料來源
http://archive.ubuntu.com/ubuntu/pool/universe/d/dict-xdict/dict-xdict_0.1-4.2_all.deb
About 說明 / 说明
A Chinese-English dictionary from the ADSO project by David Lancashire, https://github.com/wtanaka/adso. The source file is derived from the Speaking English Dictionary by Warren S. Goff, which also appears to include entries from the LDC wordlist.
Using the Adso translation application itself is recommended over this particular dictionary, as it offers as-you-type translation of phrases, similar to Google or Bing Translate. An online version is hosted at Popup Chinese.
Licence 許可證 / 许可证
Adsotrans Attribution-NonCommercial License 1.1
Original Files 資料來源
https://github.com/wtanaka/adso
About 說明 / 说明
A character etymology dictionary, derived from Chinese Lexicon by Peter Olson (dong-chinese.com). It contains decomposition data, helpful images of iconographs and short definitions from CC-CEDICT, with 5,054 terms in total.
Licence 許可證 / 许可证
Freely available
Original Files 資料來源
https://github.com/peterolson/chinese-lexicon
About 說明 / 说明
A Chinese-English vocabulary built from sentences submitted to tatoeba.org. Downloaded December 2020, with 47,969 phrases. “Tatoeba is a collection of sentences and translations. It's collaborative, open, free and even addictive.” (from the Tatoeba website)
NB. To list all the sentences with audio, search for the term ‘audio’.
Chinese words segmented using jieba. Thanks also to "Generating Anki decks with audio from the Tatoeba Project", accessed December 2020.
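Pairing Chinese sentences with their English translations can be sketched as follows. It assumes Tatoeba's tab-separated bulk exports, with sentence rows of id, language code and text (Mandarin is `cmn`, English `eng`) and link rows of sentence id and translation id; the function name `cmn_eng_pairs` is hypothetical, and word segmentation (done with jieba for this dictionary) is left out so the sketch needs only the standard library.

```python
import csv


def cmn_eng_pairs(sentence_lines, link_lines):
    """Yield (Mandarin, English) sentence pairs from Tatoeba-style TSV rows."""
    cmn, eng = {}, {}
    for row in csv.reader(sentence_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip malformed rows
        sentence_id, lang, text = row[0], row[1], row[2]
        if lang == "cmn":
            cmn[sentence_id] = text
        elif lang == "eng":
            eng[sentence_id] = text
    for row in csv.reader(link_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # A link row states that sentence row[0] is translated by row[1].
        if len(row) >= 2 and row[0] in cmn and row[1] in eng:
            yield cmn[row[0]], eng[row[1]]
```

Both arguments can be open files or any iterable of lines; links pointing to languages other than English are simply skipped.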
Licence 許可證 / 许可证
CC BY 2.0 FR
About 說明 / 说明
...
About 說明 / 说明
A compilation of freely available Chinese input method codes, as listed in this table or here.
Also:
吳語臺語字輸入法 Wu and Minnan
亞洲(日韓越泰)輸入法辞书 East Asian (JKVT)
資料來源 Original Files:
https://github.com/chinese-opendesktop/cin-tables
https://github.com/openvanilla/openvanilla/tree/master/DataTables
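Input method tables in the `.cin` format list key sequences and their characters between `%chardef begin` and `%chardef end` lines. A dictionary lookup needs the reverse mapping, from character to codes. A minimal sketch, assuming one whitespace-separated `code character` pair per line inside that section (`parse_cin` is a hypothetical name), is:

```python
from collections import defaultdict


def parse_cin(lines):
    """Invert the %chardef section of a .cin table: character -> input codes."""
    codes = defaultdict(list)
    in_chardef = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("%chardef"):
            in_chardef = line.endswith("begin")  # "begin" opens, "end" closes
            continue
        if not in_chardef or not line or line.startswith("#"):
            continue  # outside the section, blank, or a comment
        parts = line.split(None, 1)
        if len(parts) == 2:
            key, char = parts
            codes[char.strip()].append(key)  # a character may have several codes
    return dict(codes)
```

For a toy Cangjie-style table containing `a 日`, `aa 昌`, `b 月` and `ab 明`, the result maps 日 to `["a"]`, 昌 to `["aa"]`, 月 to `["b"]` and 明 to `["ab"]`.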
Licences 許可證 / 许可证
數字輸入法 Chinese Input Methods
吳語臺語字輸入法 Wu and Minnan
亞洲(日韓越泰)輸入法辞书 East Asian (JKVT)
# **Not available - pending corrections**
說明 / 说明 About
串珠聖經和合本 (Concordance)
例如查"企望",就會列出所有這個字原文對應聖經和合本翻譯的字及其經節出處,和英文翻譯 (World English Bible - British English / 國王詹姆斯版本 King James Version) 。欽定版聖經於 (KJV) 1611 年出版。建議 WEB-BE,因為它更簡單。
串珠圣经和合本 (Concordance)
例如查"企望",就会列出所有这个字原文对应圣经和合本翻译的字及其经节出处,和英文翻译 (World English Bible - British English / 国王詹姆斯版本 King James Version) 。钦定版圣经于 (KJV) 1611 年出版。建议 WEB-BE,因为它更简单。
A searchable concordance for the Chinese Union Version (CUV) Bible, with an English translation from the World English Bible - British English or the King James Version.
For example, searching for "企望" (hope) will show all verses with this word, and the matching English translation.
This dictionary was made as a resource for learning English or Chinese: the Bible is freely available in both languages and contains a very large amount of Chinese-English vocabulary. It may be particularly helpful to those already familiar with parts of the text, as studying a passage in the corresponding language should aid learning.
Additionally, for anyone interested primarily in studying God's word: although many concordances exist in English, there is perhaps only one in Chinese, and it covers the New Testament only.
發展 / 发展 Compilation
中文分詞利用 pywordseg (ELMo) 系統自動完成,未經人工校對。如發現經文錯誤,請回報給開發者。
中文分词利用 pywordseg (ELMo) 系统自动完成,未经人工校对。如发现经文错误,请回报给开发者。
Chinese words were segmented automatically and have not been checked by hand. Please report any errors you find.
Word segmentation was performed with pywordseg (ELMo) (https://github.com/voidism/pywordseg), using the CC-CEDICT dictionary and a Chinese word list of Bible names and places (https://github.com/guoshengkang/Bible-Word-Statistics/tree/master/output_file_tf); the text was then indexed using word_line_concordance_app (https://github.com/lostchristmas0/word_line_concordance_application) by lostchristmas0.
The simplified and traditional versions of the CUV were segmented separately, to avoid errors converting from traditional to simplified, so there may be different mistakes in each version.
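The indexing step can be illustrated with a small word-to-verse concordance. This is a sketch of the general technique, not the code of word_line_concordance_app: it assumes each verse is given as a (reference, text) pair whose Chinese text has already been segmented into space-separated words, and that English text is looked up from a separate mapping keyed by the same references. The names `build_concordance` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_concordance(verses):
    """Map each segmented word to the references of the verses containing it."""
    index = defaultdict(list)
    for ref, text in verses:
        # dict.fromkeys removes repeated words within a verse, keeping order.
        for word in dict.fromkeys(text.split()):
            index[word].append(ref)
    return index


def lookup(index, word, english):
    """Return (reference, English text) for each verse containing `word`."""
    return [(ref, english.get(ref, "")) for ref in index.get(word, [])]
```

Searching for a word such as "企望" then returns every verse reference in which the segmenter produced that word, each paired with its English translation; a segmentation mistake in a verse means the word will not be found there, which is why the two CUV versions may differ.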
文本 Choice of Text
The CUV was chosen because it is widely used, in the public domain, and already available segmented by Strong's numbers. For similar reasons, the WEB-BE was chosen as a free, accurate and readable English version.
The KJV is also free, but is not recommended for those without a very good command of English.
Related Texts
Another project which may be of interest is this Chinese-English comparison Bible by michaelchanwahyan, which has several free English and Chinese versions.
資料來源 Original Files:
https://ebible.org/webbe/
https://www.o-bible.com/
許可證 / 许可证 Licences
CUV Bible: Public Domain
KJV Bible: Crown copyright
World English Bible: Public Domain. "World English Bible" is a trademark of ebible.org; see https://ebible.org/web/copyright.htm
About 說明 / 说明
This dictionary was produced from the free release by the Taiwanese Ministry of Education and was first released in 2015. Total entries: 163,085. Compiled, with HTML design, by WFG. This file is a mirror of the original version, hosted here for those searching the web in English, and also for the sake of posterity (with thanks to shawkynasr).
Authorisation / 授權 (from the author's webpage):
Version 版本
2015, revised 10th Oct 2020
Licence 許可證 / 许可证
Creative Commons NonCommercial 3.0 Unported Licence (No derivatives)
Original Files 資料來源
https://language.moe.gov.tw/001/Upload/Files/site_content/M0001/respub/index.html
http://fgwang.blogspot.com/2018/02/blog-post_14.html
A Taiwanese character dictionary by WFG, with pinyin and Zhuyin, stroke order, radicals, Cangjie input and CNS 11643 codes. Compiled and HTML designed by WFG. This file is a mirror of the original version, hosted here for those searching the web in English, and also for the sake of posterity.
Original Files 資料來源
http://fgwang.blogspot.com/2020/07/blog-post_3.html
English translation (Google translate)
A 7th Century Tang dynasty dictionary ('Character Book for Seeking an Official Emolument') of 800 characters, for students of the imperial examination, by 顏元孫 Yan Yuansun. Compiled and HTML designed by WFG. This file is a mirror of the original version, hosted here for those searching the web in English, and also for the sake of posterity.
Original Files 資料來源
http://fgwang.blogspot.com/2019/04/blog-post.html
English translation (Google translate)
A character dictionary compiled by order of the Kangxi emperor of the Qing dynasty in AD 1710, with 214 radicals forming the basis of modern radical dictionaries. Compiled and HTML designed by WFG. This file is a mirror of the original version, hosted here for those searching the web in English, and also for the sake of posterity.
This is a large dictionary. One of the files has been split into three parts (康熙字典.mdd.zip, 康熙字典.mdd.z01, 康熙字典.mdd.z02), which must be opened together in an archive tool such as WinZip so that they can be recombined.
Original Files 資料來源
http://fgwang.blogspot.com/2018/12/blog-post_10.html
English translation (Google translate)
Licence 許可證
CC BY-SA 3.0
The 2nd Century Han character dictionary, by 許慎 Xu Shen. Compiled and HTML designed by WFG. This file is a mirror of the original version, hosted here for those searching the web in English, and also for the sake of posterity.
This is another large dictionary. One of the files has been split into three parts (說文解字.mdd.zip, 說文解字.mdd.z01, 說文解字.mdd.z02), which must be opened together in an archive tool such as WinZip so that they can be recombined.
Original Files 資料來源
http://fgwang.blogspot.com/2019/02/blog-post.html
English translation (Google translate)
GoldenDict https://github.com/goldendict/goldendict
Dictionary software for Linux, Windows and Mac.
WriteMDict by zhansliu https://github.com/zhansliu/writemdict
Mdict-utils by Liuyug https://github.com/liuyug/mdict-utils