Pinned Repositories
keyword_extraction
Chinese text keyword extraction in Python, using three methods: TF-IDF, TextRank, and Word2Vec word clustering.
Speech-Technology-Basics
Contains talks and presentations on different aspects of speech technology and research
zero_nlp
Chinese NLP applications (data, models, training, inference)
transformers_tasks
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
MuCGEC
Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"
document-level-classification
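Classification of very long texts (over 1,000 characters); document-level/discourse-level text classification; mainly addresses the long-range dependency problem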
text_based_depression
Source code for the paper "Text-based Depression Detection: What Triggers An Alert"
pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error correction scenarios and works out of the box.
NLP-Loss-Pytorch
Implementations of several loss functions for imbalanced data, such as focal_loss, dice_loss, DSC Loss, and GHM Loss
prompt_text_classification
Prompt-based Chinese text classification.
chengturbo's Repositories
chengturbo/keyword_extraction
Chinese text keyword extraction in Python, using three methods: TF-IDF, TextRank, and Word2Vec word clustering.
chengturbo/Speech-Technology-Basics
Contains talks and presentations on different aspects of speech technology and research
chengturbo/zero_nlp
Chinese NLP applications (data, models, training, inference)