yuzicx's Stars
mobvoi/seq-monkey-data
jdl716/A-digital-rebirth-of-classical-calligraphy
IDEA-CCNL/Fengshenbang-LM
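Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.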
nonwill/GoldenDict-OCR
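GoldenDict++: includes fixes for many issues in the official version. Early on it added a simple plugin mechanism and used it to integrate several OCR word-capture and audio-playback engines. Later, building on improved usability, the entire implementation was rewritten to speed up lookups, reduce runtime CPU and memory usage, and make the code easier to maintain. The future goal is to abstract feature extensions and dictionary-format handling into complete plugin implementations, further improving the application's extensibility and maintainability.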
fastnlp/fastHan
fastHan is a Chinese natural language processing toolkit built on fastNLP and PyTorch that is as easy to call as spaCy.
Jihuai-wpy/bert-ancient-chinese
FudanNLP/nlp-beginner
A getting-started tutorial for NLP
CIRCSE/LT4HALA
Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA): https://circse.github.io/LT4HALA/
iflytek/HFL-Anthology
Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)
TraditionalChinese/TW-ABCN
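The Taiwanese standard-character tables available online exist as text only for the commonly used and less commonly used national characters, i.e. Table A (甲表) and Table B (乙表); there is no text version of Table C (丙表), the rarely used national characters. The author (瑞) compiled a text version from the standard-character table in the appendix of the Dictionary of Chinese Character Variants (異體字字典) for reference. The appendix of the official sixth edition states a total of 29,921 characters, but the compilation actually yields 29,923. It is unclear which count is wrong, and readers are invited to find and correct the error.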
inception-project/inception
INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
frederick-wang/tongjiazi-evaluation
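CCL 2023, "Construction and Application of a Corpus of Tongjiazi (通假字, phonetic loan characters) in Ancient Chinese": experiment code and baseline for the tongjiazi evaluation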
frederick-wang/tongjiazi-resources
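CCL 2023, "Construction and Application of a Corpus of Tongjiazi (通假字, phonetic loan characters) in Ancient Chinese": tongjiazi resource repository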
int2str/jssyntaxtree
Dynamic JavaScript version of phpSyntaxTree - a tool to draw syntax trees from labelled bracket notation.
zhlint-project/zhlint
A linting tool for Chinese language.
Belval/TextRecognitionDataGenerator
A synthetic data generator for text recognition
liuyug/mdict-utils
MDict pack/unpack/list/info tool
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
yohasebe/rsyntaxtree
Syntax tree generator for linguistic research
HCIILAB/TKH_MTH_Datasets_Release
The Tripitaka Koreana in Han (TKH) Dataset and the Multiple Tripitaka in Han (MTH) Dataset for the research of Chinese character detection and recognition in historical documents.
HCIILAB/MTHv2_Datasets_Release
zejunwang1/bert_text_classification
A Chinese text classification tool based on the BERT model
yeungchenwa/Recommendations-Diffusion-Text-Image
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.
yeungchenwa/FontDiffuser
[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
KoichiYasuoka/UD-Kanbun
Tokenizer, POS-tagger, and dependency-parser for Classical Chinese
garychowcmu/daizhigev20
Daizhige (殆知阁) collection of ancient texts
Xunzi-LLM-of-Chinese-classics/ancientNER_2024
Ucas-HaoranWei/Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Xunzi-LLM-of-Chinese-classics/XunziALLM