Pinned Repositories
trl
Train transformer language models with reinforcement learning.
cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
cleanrl
High-quality single-file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
gym-microrts-paper
The source code for the gym-microrts paper.
invalid-action-masking
Source code for "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms"
lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
ppo-implementation-details
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
summarize_from_feedback_details
vwxyzjn's Repositories
vwxyzjn/cleanrl
High-quality single-file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
vwxyzjn/ppo-implementation-details
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
vwxyzjn/portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
vwxyzjn/invalid-action-masking
Source code for "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms"
vwxyzjn/lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
vwxyzjn/cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
vwxyzjn/summarize_from_feedback_details
vwxyzjn/benchmark-ci
vwxyzjn/free-mujoco-py
MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.
vwxyzjn/lm-human-preferences
Code for the paper "Fine-Tuning Language Models from Human Preferences"
vwxyzjn/minimal-adam-difference
vwxyzjn/ppo-atari-metrics
vwxyzjn/microrts
vwxyzjn/trl
Train transformer language models with reinforcement learning.
vwxyzjn/alignment-handbook
Robust recipes to align language models with human and AI preferences
vwxyzjn/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
vwxyzjn/cleanba-test
vwxyzjn/envpool_bug
vwxyzjn/hfblog
Public repo for HF blog posts
vwxyzjn/optax
Optax is a gradient processing and optimization library for JAX.
vwxyzjn/quickchat
vwxyzjn/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
vwxyzjn/zero3_min_repro
vwxyzjn/2024
vwxyzjn/MOSS-RLHF
MOSS-RLHF
vwxyzjn/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
vwxyzjn/PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
vwxyzjn/summarize-from-feedback
Code for "Learning to summarize from human feedback"
vwxyzjn/torchbeast
A PyTorch Platform for Distributed RL
vwxyzjn/tyro
Strongly typed, zero-effort CLI interfaces & config objects