chengturbo

Pinned Repositories

keyword_extraction
Chinese text keyword extraction in Python, using three methods: TF-IDF, TextRank, and Word2Vec word clustering.
Language: Python | 0 0
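The TF-IDF method named above can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes documents are already word-segmented into token lists, since Chinese text has no spaces and segmentation is outside this sketch. The function name `tfidf_keywords` and the smoothed IDF formula are illustrative choices.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest TF-IDF terms for each pre-segmented document.

    Hypothetical helper: docs is a list of token lists (segmentation assumed done).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {}
        for term, count in tf.items():
            # Smoothed IDF: a term present in every document still gets weight 1.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            # Term frequency normalized by document length, times IDF.
            scores[term] = (count / len(doc)) * idf
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results

docs = [["机器", "学习", "机器", "模型"], ["深度", "学习", "模型"], ["文本", "纠错", "模型"]]
print(tfidf_keywords(docs, top_k=2))
```

Terms that are frequent in one document but rare across the collection rank highest; "模型", which occurs in every document, is pushed down.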
Speech-Technology-Basics
Contains talks and presentations about different aspects of speech technology and research
0 0
zero_nlp
Chinese NLP applications (data, models, training, inference)
Language: Jupyter Notebook | 0 0
transformers_tasks
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
Language: Jupyter Notebook | 2.1k 17 89377
MuCGEC
MuCGEC: open-source Chinese grammatical error correction dataset and state-of-the-art (SOTA) text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"
Language: Python | 487 6 5763
document-level-classification
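Classification of very long texts (over 1,000 characters); document-level/discourse-level text classification, mainly addressing the long-range dependency problem
Language: Python | 116 1 629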
text_based_depression
Source code for the paper "Text-based Depression Detection: What Triggers An Alert"
Language: Python | 44 3 510
pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error correction and works out of the box.
Language: Python | 5.5k 83 4671.1k
NLP-Loss-Pytorch
Implementations of loss functions for imbalanced data, such as focal_loss, dice_loss, DSC Loss, GHM Loss, etc.
Language: Python | 249 4 544
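As an illustration of one loss listed above, here is a minimal binary focal loss in plain Python. It is a sketch, not the repository's PyTorch implementation. It uses the standard form FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function name `binary_focal_loss` and the defaults gamma=2.0 and alpha=0.25 are illustrative assumptions.

```python
import math

def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over predicted positive-class probabilities.

    Hypothetical helper: probs are floats in [0, 1], targets are 0 or 1.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1 - eps)
        # p_t: probability assigned to the true class.
        p_t = p if y == 1 else 1 - p
        # alpha_t: class-balancing weight (alpha for positives, 1 - alpha for negatives).
        alpha_t = alpha if y == 1 else 1 - alpha
        # (1 - p_t) ** gamma down-weights easy, well-classified examples.
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

print(binary_focal_loss([0.9, 0.2], [1, 0]))
```

With gamma=0 and alpha=0.5 this reduces to half the ordinary binary cross-entropy. Raising gamma shrinks the contribution of confident, correct predictions, so training focuses on hard examples.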
prompt_text_classification
Prompt-based Chinese text classification.
Language: Python | 53 1 75

chengturbo's Repositories

chengturbo/keyword_extraction
Chinese text keyword extraction in Python, using three methods: TF-IDF, TextRank, and Word2Vec word clustering.
Language: Python | 0 0
chengturbo/Speech-Technology-Basics
Contains talks and presentations about different aspects of speech technology and research
0 0
chengturbo/zero_nlp
Chinese NLP applications (data, models, training, inference)
Language: Jupyter Notebook | 0 0