trl

There are 13 repositories under the trl topic.

  • jasonvanf/llama-trl

    LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

    Language:Python2282623
  • argilla-io/notus

    Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

    Language:Python1706514
  • sugarandgugu/Simple-Trl-Training

    Fine-tune large language models with the DPO algorithm; simple and easy to get started.

    Language:Python45112
  • RobinSmits/Dutch-LLMs

    Various training, inference and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.

    Language:Jupyter Notebook33310
  • ssbuild/llm_rlhf

    Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM.

    Language:Python26172
  • LegendLeoChen/llm-finetune

    Fine-tunes models from Hugging Face using libraries such as trl, peft, and transformers.

    Language:Python7101
  • rasyosef/phi-2-sft-and-dpo

    Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)

    Language:Jupyter Notebook2100
  • SharathHebbar/sft_mathgpt2

    Supervised fine-tuning using the TRL library

    Language:Jupyter Notebook210
  • pberlandier/irl-to-bal

    ODM: TRL to BAL rules automated translation

    Language:Java1100
  • rasyosef/phi-1_5-instruct

    Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)

  • SharathHebbar/dpo_chatgpt2

    Direct Preference Optimization of ChatGPT2 using the TRL library

    Language:Jupyter Notebook110
  • WCoetser/Trl.TermDataRepresentation

    The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are basically constants, variables, and function symbols.

    Language:C#1100
  • SofiaKhutsieva/LLM_experiments

    Experiments with LLMs (inference, RAG, fine-tuning)

    Language:Jupyter Notebook0100