yuzicx

yuzicx's Stars

mobvoi/seq-monkey-data
1123
jdl716/A-digital-rebirth-of-classical-calligraphy
Language:Python1
IDEA-CCNL/Fengshenbang-LM
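Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.
Language:Python 4.1k 379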
nonwill/GoldenDict-OCR
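GoldenDict++: includes many fixes for problems in the official release. Early on it added a simple plugin mechanism and used it to integrate several OCR word-capture and audio playback engines. Later, alongside usability improvements, the entire implementation was rewritten to speed up lookups, reduce runtime CPU and memory usage, and make the code easier to maintain. The future goal is to abstract feature extensions and dictionary-format handling into a complete plugin implementation, further improving extensibility and maintainability.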
1554
fastnlp/fastHan
fastHan is a Chinese natural language processing toolkit built on fastNLP and PyTorch; it is as convenient to call as spaCy.
Language:Python75388
Jihuai-wpy/bert-ancient-chinese
322
FudanNLP/nlp-beginner
A getting-started tutorial for NLP
5.9k 1.3k
CIRCSE/LT4HALA
Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA): https://circse.github.io/LT4HALA/
Language:Python3314
iflytek/HFL-Anthology
Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)
Language:Markdown36841
TraditionalChinese/TW-ABCN
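The Taiwanese standard character lists available online exist in text form only for the commonly used and less commonly used national characters (List A and List B); there is no text version of List C, the rarely used national characters. The author (瑞) compiled a text version for reference from the standard character list in the appendix of the Dictionary of Chinese Character Variants (異體字字典). The appendix of the official sixth edition states a total of 29,921 characters, but the compilation yields 29,923. It is unclear which count is wrong, and readers are invited to find and correct the error.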
6
inception-project/inception
INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
Language:Java605156
frederick-wang/tongjiazi-evaluation
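CCL 2023, "Construction and Application of a Corpus of Tongjiazi (通假字, phonetic loan characters) in Ancient Chinese": evaluation experiment code and baseline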
Language:Python3
frederick-wang/tongjiazi-resources
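CCL 2023, "Construction and Application of a Corpus of Tongjiazi (通假字, phonetic loan characters) in Ancient Chinese": tongjiazi resource repository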
12
int2str/jssyntaxtree
Dynamic JavaScript version of phpSyntaxTree - a tool to draw syntax trees from labelled bracket notation.
Language:JavaScript8417
zhlint-project/zhlint
A linting tool for Chinese language.
Language:TypeScript96823
Belval/TextRecognitionDataGenerator
A synthetic data generator for text recognition
Language:Python 3.4k 990
liuyug/mdict-utils
MDict pack/unpack/list/info tool
Language:Python31860
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Language:Python 25.3k 3.2k
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Language:Python 5k 481
yohasebe/rsyntaxtree
Syntax tree generator for linguistic research
Language:Ruby10321
HCIILAB/TKH_MTH_Datasets_Release
The Tripitaka Koreana in Han (TKH) Dataset and the Multiple Tripitaka in Han (MTH) Dataset for the research of Chinese character detection and recognition in historical documents.
636
HCIILAB/MTHv2_Datasets_Release
562
zejunwang1/bert_text_classification
A Chinese text classification tool based on the BERT model
Language:Python6415
yeungchenwa/Recommendations-Diffusion-Text-Image
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.
2247
yeungchenwa/FontDiffuser
[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
Language:Python32431
KoichiYasuoka/UD-Kanbun
Tokenizer, POS-tagger, and dependency-parser for Classical Chinese
Language:Python647
garychowcmu/daizhigev20
Daizhige (殆知阁) collection of ancient texts
1.3k 451
Xunzi-LLM-of-Chinese-classics/ancientNER_2024
2
Ucas-HaoranWei/Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Language:Python 1.8k 146
Xunzi-LLM-of-Chinese-classics/XunziALLM
Language:Python27321