holarissun's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
tldr-pages/tldr
📚 Collaborative cheatsheets for console commands
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
pre-commit/pre-commit
A framework for managing and maintaining multi-language pre-commit hooks.
pre-commit/pre-commit-hooks
Some out-of-the-box hooks for pre-commit
datamllab/rlcard
Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
hyp1231/awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
KhoomeiK/LlamaGym
Fine-tune LLM agents with online reinforcement learning
openai/summarize-from-feedback
Code for "Learning to summarize from human feedback"
zjunlp/Prompt4ReasoningPapers
[ACL 2023] Reasoning with Language Model Prompting: A Survey
bupticybee/TexasHoldemSolverJava
A Texas Hold'em and short-deck solver implemented in Java
tatsu-lab/alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
facebookresearch/contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
Joyce94/LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
kaixindelele/ChatOpenReview
A crowdfunded open-source project: using OpenReview's high-quality review data to fine-tune a professional LLM for paper reviews and review responses.
jackaduma/ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
kaiwenzha/Rank-N-Contrast
[NeurIPS 2023, Spotlight] Rank-N-Contrast: Learning Continuous Representations for Regression
jackaduma/Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
AlgTUDelft/WCSAC
Code for the paper "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning"
YangRui2015/RiC
Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"
michaelnny/InstructLLaMA
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT, but on a much smaller scale.
cassidylaidlaw/hidden-context
Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"
karush17/Deep-Eligibility-Traces
Implementation of Eligibility Traces with Neural Networks in PyTorch and TensorFlow 2.0
SimengSun/alpaca_farm_lora
XanderJC/attention-based-credit
Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt, and Mihaela van der Schaar
WeijieyingRen/TabLog
Code for TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules, ICML 2024