suolyer's Stars

xai-org/grok-1
Grok open release
Language:Python49.7k 591 2148.3k
HqWu-HITCS/Awesome-Chinese-LLM
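A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost. It covers base models, vertical-domain fine-tuning and applications, datasets, and tutorials.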
17k 210 261.6k
eosphoros-ai/DB-GPT
AI-native data app development framework with AWEL (Agentic Workflow Expression Language) and agents
Language:Python14.1k 116 1.2k1.9k
wgwang/awesome-LLMs-In-China
Large models
5.7k 107 27473
modelscope/data-juicer
Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Language:Python3.2k 19 210191
Zjh-819/LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
2.7k 50 3175
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Language:Python2.7k 22 189217
thunlp/UltraChat
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Language:Python2.3k 40 30117
sysuexam/SYSU-Exam
A collection of SYSU final exam papers and study materials
1.8k 41 18364
google-research/deduplicate-text-datasets
Language:Rust1.1k 13 42112
Xnhyacinth/Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
1.1k 47 841
hankinghu/literature-books
Books in .txt format
991 7 0352
facebookresearch/cc_net
Tools to download and clean up Common Crawl data
Language:Python977 23 44143
SciPhi-AI/synthesizer
A multi-purpose LLM framework for RAG and data creation.
Language:Python619 13 1153
jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese
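聚宝盆 (Cornucopia): A series of open-source, commercially usable Chinese financial LLMs, plus an efficient, lightweight training framework for vertical-domain LLMs (Pretraining, SFT, RLHF, Quantize, etc.)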
Language:Python600 5 2063
opendatalab/WanJuan1.0
WanJuan 1.0 multimodal corpus
549 9 2828
VikParuchuri/textbook_quality
Generate textbook-quality synthetic LLM pretraining data
Language:Python493 9 649
chaoswork/sft_datasets
A curated list of open-source SFT datasets, updated continuously
467 1 238
chaoyi-wu/Finetune_LLAMA
A simple, easy-to-follow guide to fine-tuning LLaMA.
Language:Python381 1 1034
LLaMafia/llamafia.github
Language:Python317 21 216
Strivin0311/long-llms-learning
A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks
Language:Jupyter Notebook254 8 214
tjunlp-lab/M3KE
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
99 4 97
OpenJarvisAI/TianMu
TianMu: A modern AI tool with multi-platform support, Markdown support, multimodal support, continuous conversation, and customizable commands. A single app that supports ERNIE Bot (文心一言), Tongyi Qianwen (通义千问), LLaMA, ChatGPT, and more: an open-source client for large models!
84 4 46
beichao1314/Open-Llama
The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
Language:Python63 1 0112
kyegomez/FlashAttention20Triton
Triton implementation of FlashAttention 2.0
Language:Python22 2 03
JackHCC/Arxiv-NLP-Reporter
Automatically fetches the latest NLP papers from arXiv every day (Arxiv Natural Language Processing Paper Automatic Crawl Daily)
Language:Python17 1 03
UnstoppableCurry/High-quality-Chinese-Q-A-dataset
The largest open-source Chinese Q&A dataset, supporting Chinese LLMs
Language:Python8 2 01
lovit/text-dedup
Python package for memory-friendly text de-duplication
Language:Python6 2 0
robotcator/flash-attention
Fast and memory-efficient exact attention
Language:C++6 1 01
lessw2020/triton_flashv2_alibi
Working repo for a Triton-based FlashAttention-2 supporting ALiBi positional embeddings
Language:Python1 1 0