holarissun's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
tldr-pages/tldr
📚 Collaborative cheatsheets for console commands
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
pre-commit/pre-commit
A framework for managing and maintaining multi-language pre-commit hooks.
pre-commit/pre-commit-hooks
Some out-of-the-box hooks for pre-commit
datamllab/rlcard
Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
hyp1231/awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
KhoomeiK/LlamaGym
Fine-tune LLM agents with online reinforcement learning
openai/summarize-from-feedback
Code for "Learning to summarize from human feedback"
zjunlp/Prompt4ReasoningPapers
[ACL 2023] Reasoning with Language Model Prompting: A Survey
bupticybee/TexasHoldemSolverJava
A Texas Hold'em and short-deck solver implemented in Java
tatsu-lab/alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
facebookresearch/contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
Joyce94/LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
kaixindelele/ChatOpenReview
A crowdfunded open-source project: using OpenReview's high-quality review data to fine-tune a professional LLM for paper reviews and review responses.
jackaduma/ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
kaiwenzha/Rank-N-Contrast
[NeurIPS 2023, Spotlight] Rank-N-Contrast: Learning Continuous Representations for Regression
jackaduma/Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
AlgTUDelft/WCSAC
Code for the paper "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning"
YangRui2015/RiC
Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"
michaelnny/InstructLLaMA
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT, but on a much smaller scale.
cassidylaidlaw/hidden-context
Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"
karush17/Deep-Eligibility-Traces
Implementation of Eligibility Traces with Neural Networks in PyTorch and TensorFlow 2.0
SimengSun/alpaca_farm_lora
XanderJC/attention-based-credit
Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt, and Mihaela van der Schaar
WeijieyingRen/TabLog
Code for TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules, ICML 2024