Sunburst0614's Stars
aplmikex/deduplication_mnbvc
Text deduplication
Mythos-Rudy/mnbvc-fasttext-classification
MNBVC text quality classification using fastText
facebookresearch/cc_net
Tools to download and clean up Common Crawl data
facebookresearch/fastText
Library for fast text representation and classification.
6/stopwords-json
Stopwords for 50 languages in JSON format
DLR-RM/rl-baselines3-zoo
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
esbatmop/MNBVC
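MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.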
GanjinZero/awesome_Chinese_medical_NLP
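A collection of public resources for Chinese medical NLP: terminologies / corpora / word embeddings / pretrained models / knowledge graphs / named entity recognition / QA / information extraction / models / papers / etc.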
wgwang/awesome-LLMs-In-China
Large models in China
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
StevenJokess/d2rl
A non-interactive deep reinforcement learning book with framework-free code, copied math, and no discussions. Adopted at only -1 universities (Shanhe University, SHU). BTW, I like this virtual university, whose English abbreviation happens to be the pinyin of one part of my Chinese name (Cai "Shu"qi).
mymusise/ChatGLM-Tuning
A fine-tuning approach based on ChatGLM-6B + LoRA
chatchat-space/Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and language models such as ChatGLM, Qwen, and Llama
zyds/transformers-code
A hands-on, step-by-step Huggingface Transformers course; the course videos are updated in sync on Bilibili and YouTube
Hannibal046/Awesome-LLM
Awesome-LLM: a curated list of Large Language Models
liguodongiot/llm-action
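This project shares the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications)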
HqWu-HITCS/Awesome-Chinese-LLM
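A collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, including base models, vertical-domain fine-tuning and applications, datasets, and tutorials.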
onlyphantom/llm-python
Large Language Models (LLMs) tutorials & sample scripts, ft. langchain, openai, llamaindex, gpt, chromadb & pinecone
yangjianxin1/Firefly
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
pfoser/mapconstruction
Map Construction Algorithms
sjruan/DeepMG
Learning to Generate Maps from Trajectories (AAAI'20)
apolcyn/traclus_impl
Python implementation of Traclus algorithm, for 2-D trajectories
recommenders-team/recommenders
Best Practices on Recommendation Systems
yj8023xx/recsys-tutorial
An introductory recommender systems tutorial, covering the basics with accompanying runnable examples
scutan90/DeepLearning-500-questions
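Deep Learning 500 Questions: explains frequently raised topics in probability, linear algebra, machine learning, deep learning, computer vision, and related areas in question-and-answer form, to help the author and any readers who need it. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any shortcomings. To be continued... For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be pursued. Tan 2018.06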
qingsongedu/time-series-transformers-review
A professionally curated list of awesome resources (paper, code, data, etc.) on transformers in time series.
wangyuGithub01/Machine_Learning_Resources
:fish::fish::fish: Machine learning interview review resources
datawhalechina/fun-rec
An introductory recommender systems tutorial; read online at https://datawhalechina.github.io/fun-rec/