trl

There are 13 repositories under the trl topic.

jasonvanf/llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
Language: Python · 228 · 2 · 623
argilla-io/notus
Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach
Language: Python · 170 · 6 · 514
sugarandgugu/Simple-Trl-Training
Fine-tunes large language models with the DPO algorithm; simple and easy to get started with.
Language: Python · 45 · 1 · 12
RobinSmits/Dutch-LLMs
Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.
Language: Jupyter Notebook · 33 · 3 · 10
ssbuild/llm_rlhf
Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM.
Language: Python · 26 · 1 · 72
LegendLeoChen/llm-finetune
Uses libraries such as trl, peft, and transformers to fine-tune models hosted on Hugging Face.
Language: Python · 7 · 1 · 01
rasyosef/phi-2-sft-and-dpo
Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
Language: Jupyter Notebook · 2 · 1 · 00
SharathHebbar/sft_mathgpt2
Supervised fine-tuning using the TRL library
Language: Jupyter Notebook · 2 · 1 · 0
pberlandier/irl-to-bal
ODM: automated translation of TRL rules to BAL rules
Language: Java · 1 · 1 · 00
rasyosef/phi-1_5-instruct
Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
1 · 1 · 00
SharathHebbar/dpo_chatgpt2
Direct Preference Optimization of ChatGPT2 using the TRL library
Language: Jupyter Notebook · 1 · 1 · 0
WCoetser/Trl.TermDataRepresentation
The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are basically constants, variables, and function symbols.
Language: C# · 1 · 1 · 00
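The description above defines a term as a constant, a variable, or a function symbol applied to other terms. As a rough illustration only, here is a minimal Python sketch of that data representation plus one rewrite step (match a rule's left-hand side, then substitute into its right-hand side). It is not the repository's C# API: the names `Const`, `Var`, `Func`, `match`, `substitute`, and `rewrite_once` are hypothetical, and rules are applied only at the root of a term.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


# Hypothetical term representation: constants, variables, and function symbols.
@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Func:
    symbol: str
    args: Tuple["Term", ...]


Term = Union[Const, Var, Func]


def match(pattern: Term, term: Term,
          bindings: Optional[Dict[str, Term]] = None) -> Optional[Dict[str, Term]]:
    """Syntactic matching: return variable bindings that make pattern equal term, or None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, Var):
        bound = bindings.get(pattern.name)
        if bound is None:
            return {**bindings, pattern.name: term}
        # A variable that appears twice must bind to the same term both times.
        return bindings if bound == term else None
    if isinstance(pattern, Const):
        return bindings if pattern == term else None
    # pattern is a Func: symbol and arity must agree, then match arguments left to right.
    if (not isinstance(term, Func) or term.symbol != pattern.symbol
            or len(term.args) != len(pattern.args)):
        return None
    for p_arg, t_arg in zip(pattern.args, term.args):
        bindings = match(p_arg, t_arg, bindings)
        if bindings is None:
            return None
    return bindings


def substitute(term: Term, bindings: Dict[str, Term]) -> Term:
    """Replace bound variables in term with their bindings."""
    if isinstance(term, Var):
        return bindings.get(term.name, term)
    if isinstance(term, Func):
        return Func(term.symbol, tuple(substitute(a, bindings) for a in term.args))
    return term


def rewrite_once(term: Term, lhs: Term, rhs: Term) -> Term:
    """Apply the rule lhs -> rhs at the root of term if it matches; otherwise return term."""
    bindings = match(lhs, term)
    return substitute(rhs, bindings) if bindings is not None else term


if __name__ == "__main__":
    zero, X = Const("zero"), Var("X")
    # Rule: add(zero, X) -> X
    t = Func("add", (zero, Func("s", (zero,))))
    print(rewrite_once(t, Func("add", (zero, X)), X))
```

Frozen dataclasses give structural equality for free, which is what the repeated-variable check in `match` relies on.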
SofiaKhutsieva/LLM_experiments
Experiments with LLMs (inference, RAG, fine-tuning)
Language: Jupyter Notebook · 0 · 1 · 00
