human-feedback
There are 15 repositories under the human-feedback topic.
lucidrains/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
conceptofmind/LaMDA-rlhf-pytorch
Open-source pre-training implementation of Google's LaMDA in PyTorch, with RLHF added similar to ChatGPT.
wxjiao/ParroT
The ParroT framework enhances and regulates translation abilities during chat, built on open-source LLMs (e.g., LLaMA-7b, BLOOMZ-7b1-mt) and human-written translation and evaluation data.
xrsrke/instructGOOSE
Implementation of Reinforcement Learning from Human Feedback (RLHF)
huggingface/data-is-better-together
Let's build better datasets, together!
trubrics/trubrics-sdk
Product analytics for AI Assistants
yk7333/d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
PKU-Alignment/beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
HannahKirk/prism-alignment
The PRISM Alignment Project
gao-g/prelude
Aligning LLM Agents by Learning Latent Preference from User Edits
AlaaLab/pathologist-in-the-loop
[ NeurIPS 2023 ] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"
victor-iyi/rlhf-trl
Reinforcement Learning from Human Feedback with 🤗 TRL
ZiyiZhang27/tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
01Kevin01/awesome-RLHF-Turkish
A curated Turkish-language list of reinforcement learning with human feedback resources (continually updated)
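A common thread across the RLHF repositories above is training a reward model on pairwise human preferences, typically with the Bradley-Terry loss -log σ(r_chosen − r_rejected). As a minimal, dependency-free sketch (the scores below are illustrative and not taken from any listed repo):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the reward model scores the human-preferred
    response higher than the rejected one, and grows when the ranking
    is inverted.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Illustrative reward-model scores for one preference pair.
good_margin = preference_loss(r_chosen=2.0, r_rejected=-1.0)  # correctly ranked: small loss
bad_margin = preference_loss(r_chosen=-1.0, r_rejected=2.0)   # inverted ranking: large loss
print(round(good_margin, 4), round(bad_margin, 4))  # → 0.0486 3.0486
```

In full RLHF pipelines (e.g., with Hugging Face TRL), this loss is averaged over batches of preference pairs to train the reward model, which then scores rollouts during the policy-optimization (PPO) stage.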