ZXXSG's Stars
SmirkCao/Lihang
Statistical Learning Methods (2nd edition) by Li Hang [notes, code, notebook, references, errata, lihang]
hktxt/Learn-Statistical-Learning-Method
Algorithm implementations for Statistical Learning Methods, Second Edition.
RUC-GSAI/Yulan-GARDEN
Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models"
LLMBook-zh/LLMBook-zh.github.io
Large Language Models (《大语言模型》). Authors: Zhao Xin, Li Junyi, Zhou Kun, Tang Tianyi, Wen Jirong
npubird/KnowledgeGraphCourse
Southeast University graduate course on Knowledge Graphs
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
chaoswork/sft_datasets
A curated collection of open-source SFT datasets, updated on an ongoing basis
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
chakki-works/seqeval
A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
ChatGPTNextWeb/ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.
MuQiuJun-AI/bert4pytorch
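An ultra-lightweight PyTorch implementation of BERT, with extensive Chinese comments and an easily modified structure; continuously updated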
apple/corenet
CoreNet: A library for training deep neural networks
airaria/TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
datawhalechina/hugging-llm
HuggingLLM, Hugging Future.
AimeeLee77/keyword_extraction
Chinese text keyword extraction in Python, using three methods: TF-IDF, TextRank, and Word2Vec word clustering.
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
km1994/AwesomeNLP
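This project completes all the tasks in NLP-Beginner, a set of introductory natural language processing exercises (text classification, information extraction, knowledge graphs, machine translation, question answering, text generation, Text-to-SQL, text error correction, text mining, knowledge distillation, model acceleration, OCR, TTS, Prompt, embedding, etc.). All code has been tested and runs correctly.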
wmathor/nlp-tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
crownpku/Awesome-Chinese-NLP
A curated list of resources for Chinese NLP
X-PLUG/ChatPLUG
A Chinese Open-Domain Dialogue System
nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
double22a/speech_dataset
The dataset of Speech Recognition
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
ayaka14732/bert-tokenizer-cantonese
BERT Tokenizer with vocabulary tailored for Cantonese
esbatmop/MNBVC
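MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.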
thu-coai/COLDataset
The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection
HIT-SCIR/ltp
Language Technology Platform
iflytek/HFL-Anthology
Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)