trl
There are 13 repositories under the trl topic.
jasonvanf/llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
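As a rough sketch of the technique this repository's name refers to (PPO fine-tuning with LoRA adapters via TRL and PEFT), the snippet below shows a minimal setup. It is not taken from the repository itself; the model id, hyperparameters, and the classic pre-0.12 PPOTrainer API are assumptions.

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# LoRA: train small low-rank adapters while the base LLaMA weights stay frozen
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "huggyllama/llama-7b", peft_config=lora_config  # model id is a placeholder
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
tokenizer.pad_token = tokenizer.eos_token

ppo_config = PPOConfig(batch_size=8, mini_batch_size=2, learning_rate=1.4e-5)
ppo_trainer = PPOTrainer(config=ppo_config, model=model, tokenizer=tokenizer)

# A PPO update then takes query tensors, generated response tensors, and
# scalar rewards (e.g. from a reward model):
#   stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```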
argilla-io/notus
Notus is a collection of LLMs fine-tuned using SFT, DPO, SFT+DPO, and/or other RLHF techniques, always keeping a data-first approach
sugarandgugu/Simple-Trl-Training
Fine-tuning large language models with the DPO algorithm; simple and easy to get started with.
RobinSmits/Dutch-LLMs
Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.
ssbuild/llm_rlhf
Reinforcement learning training for LLMs such as GPT-2, LLaMA, BLOOM, and others
LegendLeoChen/llm-finetune
Fine-tuning models from Hugging Face using libraries such as trl, peft, and transformers.
rasyosef/phi-2-sft-and-dpo
Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
SharathHebbar/sft_mathgpt2
Supervised fine-tuning using the TRL library
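For context, a minimal supervised fine-tuning run with TRL's SFTTrainer might look like the sketch below. The model and dataset names are placeholders rather than those used by this repository, and passing a model id string plus SFTConfig assumes a recent TRL release.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any dataset with a text or messages column works; this one is a placeholder.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="gpt2",  # placeholder base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```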
pberlandier/irl-to-bal
ODM: automated translation of TRL rules to BAL
rasyosef/phi-1_5-instruct
Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
SharathHebbar/dpo_chatgpt2
Direct Preference Optimization of ChatGPT2 using the TRL library
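For reference, a DPO run with TRL's DPOTrainer typically looks like the sketch below. The model and preference dataset are placeholders rather than the ones used in this repository, and the processing_class argument assumes a recent TRL release (older versions take tokenizer= instead).

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Preference data with "prompt", "chosen", and "rejected" columns (placeholder set)
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-output", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```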
WCoetser/Trl.TermDataRepresentation
The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are basically constants, variables, and function symbols.
SofiaKhutsieva/LLM_experiments
Experiments with LLMs (inference, RAG, fine-tuning)