preference-learning

There are 37 repositories under the preference-learning topic.

allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
Language: Python · 473 5 6956
tournesol-app/tournesol
Free and open source code of the https://tournesol.app platform. Meet the community on Discord https://discord.gg/WvcSG55Bf3
Language: Python · 338 11 71048
IAAR-Shanghai/ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
Language: Jupyter Notebook · 174 3 204
qxcv/magical
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
Language: Python · 75 7 211
SMARTlab-Purdue/SAN-NaviSTAR
This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refer to our project website at https://sites.google.com/view/san-navistar.
Language: Python · 57 4 15
JanoschMenke/metis
Python-based GUI for collecting chemists' feedback on molecules
Language: Python · 46 2 312
sail-sg/dice
Official implementation of the paper "Bootstrapping Language Models via DPO Implicit Rewards"
Language: Python · 41 3 02
gao-g/prelude
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
Language: Python · 31 3 20
CJReinforce/RIME_ICML2024
Official code for the ICML 2024 Spotlight paper "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences"
Language: Python · 26 3 02
typoverflow/WiseRL
PyTorch implementations of offline preference-based RL (PbRL) algorithms
Language: Python · 19 3 01
vicgalle/configurable-safety-tuning
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"
Language: Python · 14 3 02
julilien/PLDepth
Code for "Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model" as published at CVPR 2021.
Language: Python · 13 5 42
aaronpmishkin/gaussian_processes
Preference Learning with Gaussian Processes and Bayesian Optimization
Language: Python · 7 4 00
SMARTlab-Purdue/SAN-FAPL
This repository contains the source code for our paper: "Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation", accepted to IROS-2022. For more details, please refer to our project website at https://sites.google.com/view/san-fapl.
Language: Python · 7 2 14
98k-bot/GAN-Assisted-Preference-Based-Learning
A paper under review at AAAI-20
Language: Python · 6 3 01
Intelligent-Systems-Group/jpl-framework
Java framework for Preference Learning
Language: Java · 6 3 02
albiboni/User-RecSys
Code for the project: "Analysis of Recommendation-systems based on User Preferences".
Language: Python · 5 0 02
makgyver/PRL
[P]reference and [R]ule [L]earning algorithm implementation for Python 3 (https://arxiv.org/abs/1812.07895)
Language: Python · 5 2 01
aleksa-sukovic/iclr2024-reward-design-for-justifiable-rl
Code for the paper "Reward Design for Justifiable Sequential Decision-Making"; ICLR 2024
Language: Jupyter Notebook · 3 1 20
LemurPwned/bradley-terry-ui
UI for a straightforward Bradley-Terry feedback loop
Language: Python · 2 1 00
ma921/CoExBO
(AISTATS 2024) "Looping in the Human: Collaborative and Explainable Bayesian Optimization"
Language: Jupyter Notebook · 2 1 00
mahmadif/able2rank
Language: R · 2 1 00
Rahgooy/MDFT
In this project, we design a recurrent neural network to simulate a cognitive model of decision-making called Multi Alternative Decision Field Theory (MDFT). We train this RNN to learn the parameters of MDFT.
Language: Python · 2 1 01
Bekyilma/Master_thesis
Constructive Preference Elicitation for Social Choice With Setwise max-margin Learning.
Language: Python · 1 1 00
benki-finance/finbench-arena
An open platform for benchmarking LLMs for financial services use cases. Forked from Vicuna and Chatbot Arena.
Language: Python · 1
FareedKhan-dev/APReL-Mountain-Car-Reinforcement-Learning
APReL: Active preference-based reward learning for human-robot interaction. Uses the "Mountain Car" environment to learn, from human preferences, how to reach the goal state; the approach has applications in robotics and can be adapted to other learning methods.
1 2 00
jimparr19/pypbl
Python library for preference-based learning
Language: Python · 1 2 03
lasgroup/MaxMinLCB
Code for our paper "Bandits with Preference Feedback: A Stackelberg Game Perspective"
Language: Python · 1 3 01
BARUDA-AI/Awesome-Preference-Optimization
Survey of preference alignment algorithms
0 0 00
DanieleF198/ILASP-as-post-hoc-method-in-a-preference-system
Experiments on using ILASP as a post-hoc method over black-box models, in which we also study and address technical issues such as exponential execution time.
Language: Lasso · 0 4 00
mahmadif/able2rank_
learning-to-rank
Language: Python · 0 1 00
rowlandseymour/BSBT
Bayesian Spatial Bradley-Terry
Language: R · 0 3 01
TristanFauvel/Bayesian_test_for_preference
An analysis of preference comparisons based on the Bayes factor
Language: Jupyter Notebook · 0 1 00
afiliot/Preference-Learning-And-Movie-Reviews
Project on preference learning - ENSAE ParisTech
Language: Python · 1 01
Dev1nW/Simplified-Rating-and-Preference-RL
Simplified, modern implementation of Rating and Preference-based Reinforcement Learning.
Language: Python
shimamohammadi/LBPS-EIC
This repository quantifies human preferences between pairs of images, along with the associated uncertainties. Leveraging these measured uncertainties, a sampling algorithm is proposed to select a subset of the dataset for efficient pairwise comparison in subjective testing.
Language: Python · 1 0
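Several entries above (LemurPwned/bradley-terry-ui, rowlandseymour/BSBT) build on the Bradley-Terry model of pairwise preferences, where item i is preferred over item j with probability p_i / (p_i + p_j). As a rough illustration only, here is a minimal pure-Python sketch that fits the strengths p with Hunter's MM (minorize-maximize) algorithm. It is not taken from any listed repository; the function names `fit_bradley_terry` and `prob_beats` are hypothetical, and it assumes every item wins and loses at least once so that the maximum-likelihood estimate exists.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_items, max_iters=1000, tol=1e-10):
    """Fit Bradley-Terry strengths with Hunter's MM algorithm (sketch).

    comparisons: list of (winner, loser) item-index pairs.
    Assumption: every item wins and loses at least once, so the MLE exists.
    Returns strengths p normalised to sum to 1, with
    P(i beats j) = p[i] / (p[i] + p[j]).
    """
    wins = [0] * n_items
    pair_counts = defaultdict(int)  # (i, j) with i < j -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[(min(winner, loser), max(winner, loser))] += 1

    p = [1.0 / n_items] * n_items
    for _ in range(max_iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), all items at once
        denom = [0.0] * n_items
        for (i, j), n_ij in pair_counts.items():
            share = n_ij / (p[i] + p[j])
            denom[i] += share
            denom[j] += share
        new_p = [wins[i] / denom[i] if denom[i] > 0 else p[i] for i in range(n_items)]
        # Strengths are only identified up to scale, so renormalise to sum 1
        total = sum(new_p)
        new_p = [x / total for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


def prob_beats(p, i, j):
    """Model probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])


if __name__ == "__main__":
    # Item 0 preferred three times, item 1 once
    games = [(0, 1), (0, 1), (0, 1), (1, 0)]
    p = fit_bradley_terry(games, 2)
    print(p)  # [0.75, 0.25]
```

With only two items the fitted win probability simply matches the observed win rate (3 of 4); with more items the MM iterations pool evidence across all pairs, which is what makes the model useful for ranking.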

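Entries such as sail-sg/dice refer to the implicit reward of Direct Preference Optimization (DPO), r(x, y) = β · log(π_θ(y|x) / π_ref(y|x)), which is defined up to a prompt-dependent constant that cancels when two responses to the same prompt are compared. The following is a minimal pure-Python sketch of the standard per-example DPO loss, -log σ(r_chosen - r_rejected), assuming the summed sequence log-probabilities under the policy and reference model are already computed. The function names are hypothetical, and the exact objective used in any listed repository may differ.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def implicit_reward(logp_policy, logp_ref, beta):
    """DPO implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def dpo_loss(logp_pol_chosen, logp_ref_chosen,
             logp_pol_rejected, logp_ref_rejected, beta=0.1):
    """Per-example DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each full response.
    """
    margin = (implicit_reward(logp_pol_chosen, logp_ref_chosen, beta)
              - implicit_reward(logp_pol_rejected, logp_ref_rejected, beta))
    # -log(sigmoid(margin)) rewritten as softplus(-margin) to avoid overflow
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin 0, loss = log 2
    print(dpo_loss(-5.0, -5.0, -7.0, -7.0))
    # Policy shifted toward the chosen response: loss drops below log 2
    print(dpo_loss(-4.0, -5.0, -8.0, -7.0, beta=0.1))
```

Writing the loss as a softplus of the negative margin keeps it finite even for very large margins, where computing `sigmoid` first and then taking its log would underflow.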