jiejie1993

jiejie1993's Stars

unslothai/unsloth
Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Language:Python20.9k 136 1.2k1.5k
liguodongiot/llm-action
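This project shares the technical principles behind large models and hands-on experience with them (large-model engineering and deploying large-model applications).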
Language:HTML13.2k 112 241.5k
nlpxucan/WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Language:Python9.3k 111 191725
arcee-ai/mergekit
Tools for merging pretrained large language models.
Language:Python5.1k 55 335478
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Language:Python5.1k 23 1.6k444
modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Language:Python3.5k 19 220197
LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
3k 74 38664
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Language:Python2.2k 47 140163
315386775/DeepLearing-Interview-Awesome-2024
A collection of AIGC, CV, and LLM interview questions and answers, along with new ideas, problems, resources, and projects encountered in work and research.
1.9k 29 1186
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Language:Python1.4k 43 95140
react-financial/react-financial-charts
Charts dedicated to finance.
Language:TypeScript1.3k 41 143227
WangRongsheng/CareGPT
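🌞 CareGPT is a medical large language model. It also brings together dozens of publicly available medical fine-tuning datasets and openly available medical LLMs, and covers LLM training, evaluation, and deployment to accelerate the development of medical LLMs. Medical LLM, Open Source Driven for a Healthy Future.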
Language:Python823 10 20110
IEIT-Yuan/Yuan-2.0
Yuan 2.0 Large Language Model
Language:Python683 5 9386
HIT-SCIR/Chinese-Mixtral-8x7B
Chinese Mixtral-8x7B (Chinese-Mixtral-8x7B)
Language:Python645 15 3032
SciPhi-AI/synthesizer
A multi-purpose LLM framework for RAG and data creation.
Language:Python620 13 1152
chaoswork/sft_datasets
A curated collection of open-source SFT datasets, updated on an ongoing basis.
474 1 239
databonsai/databonsai
clean & curate your data with LLMs.
Language:Python466 2 223
bigcode-project/bigcode-dataset
Language:Jupyter Notebook372 9 3962
liucongg/ChatGPTBook
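《ChatGPT原理与实战：大型语言模型的算法、技术和私有化》 (ChatGPT Principles and Practice: Algorithms, Techniques, and Private Deployment of Large Language Models)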
Language:Python345 12 1166
sangmichaelxie/doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
Language:HTML313 5 3033
FlagOpen/FlagData
Language:Python291 4 1635
qiaoliangxiang/cfa
FRM & CFA study notes
249 10 172
p-lambda/dsir
DSIR large-scale data selection framework for language model training
Language:Python242 21 819
adlnlp/FinLLMs
This repository contains related work, benchmarks and datasets for the paper "Large Language Models in Finance (FinLLMs)", currently under review.
180 5 133
SciPhi-AI/library-of-phi
177 5 121
SUFE-AIFLM-Lab/FinEval
FinEval is a collection of high-quality multiple-choice and free-text question-answering items for the Chinese financial domain.
Language:Python167 3 511
zhenlohuang/awesome-chinese-llm
Awesome Chinese LLM: a curated list of datasets, models, and related resources for Chinese large language models
138 3 012
yanqiangmiffy/how-to-train-tokenizer
How to train an LLM tokenizer
Language:Python137 6 329
xv44586/Chinese-instruction-datasets
Chinese instruction-tuning datasets
124 2 06
qianniucity/llm_notebooks
A collection of AI application examples
Language:Jupyter Notebook81 3 113