human-feedback

There are 15 repositories under the human-feedback topic; a minimal sketch of the reward-modeling step that most of the RLHF repositories below implement appears after the list.

  • lucidrains/PaLM-rlhf-pytorch

    Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.

    Language: Python
  • opendilab/awesome-RLHF

    A curated list of reinforcement learning with human feedback resources (continually updated)

  • conceptofmind/LaMDA-rlhf-pytorch

    Open-source pre-training implementation of Google's LaMDA in PyTorch, with RLHF added, similar to ChatGPT.

    Language: Python
  • wxjiao/ParroT

    The ParroT framework to enhance and regulate translation abilities during chat, built on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

    Language: Python
  • xrsrke/instructGOOSE

    Implementation of Reinforcement Learning from Human Feedback (RLHF)

    Language: Jupyter Notebook
  • huggingface/data-is-better-together

    Let's build better datasets, together!

    Language: Jupyter Notebook
  • trubrics/trubrics-sdk

    Product analytics for AI Assistants

    Language: Python
  • yk7333/d3po

    [CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

    Language: Python
  • PKU-Alignment/beavertails

    BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

    Language: Makefile
  • HannahKirk/prism-alignment

    The Prism Alignment Project

    Language: Jupyter Notebook
  • gao-g/prelude

    Aligning LLM Agents by Learning Latent Preference from User Edits

    Language: Python
  • AlaaLab/pathologist-in-the-loop

    [ NeurIPS 2023 ] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

    Language: Python
  • victor-iyi/rlhf-trl

    Reinforcement Learning from Human Feedback with 🤗 TRL

    Language: Python
  • ZiyiZhang27/tdpo

    [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

    Language: Python
  • 01Kevin01/awesome-RLHF-Turkish

    A curated list of reinforcement learning with human feedback resources [awesome-RLHF-Turkish] (continually updated)
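
Most of the repositories above implement some variant of the same pipeline: collect human preference data, fit a reward model on it, then fine-tune a generator against that reward. As a point of reference, the sketch below shows only the reward-modeling step, in plain PyTorch with a Bradley-Terry pairwise loss. The TinyRewardModel class, its dimensions, and the random preference batch are illustrative placeholders and are not taken from any repository listed here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyRewardModel(nn.Module):
        """Toy reward model: token embeddings, a GRU encoder, and a scalar reward head."""

        def __init__(self, vocab_size: int = 1000, hidden: int = 64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.encoder = nn.GRU(hidden, hidden, batch_first=True)
            self.reward_head = nn.Linear(hidden, 1)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) -> one scalar reward per sequence
            emb = self.embed(token_ids)
            _, last_hidden = self.encoder(emb)          # last_hidden: (1, batch, hidden)
            return self.reward_head(last_hidden[-1]).squeeze(-1)

    model = TinyRewardModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Fake preference batch: each "chosen" sequence was preferred by a human
    # annotator over its paired "rejected" sequence (placeholder random data).
    chosen = torch.randint(0, 1000, (8, 32))
    rejected = torch.randint(0, 1000, (8, 32))

    r_chosen = model(chosen)
    r_rejected = model(rejected)

    # Bradley-Terry pairwise loss: push r(chosen) above r(rejected).
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"pairwise reward loss: {loss.item():.4f}")

In the full RLHF recipe, a reward model trained this way becomes the optimization target for a policy-update stage such as PPO (as in the TRL-based repositories above); some listed projects, such as yk7333/d3po, instead optimize directly on preference data without an explicit reward model.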