Pinned Repositories
trl
Train transformer language models with reinforcement learning.
cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
gym-microrts-paper
The source code for the gym-microrts paper.
invalid-action-masking
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
summarize_from_feedback_details
vwxyzjn's Repositories
vwxyzjn/cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
vwxyzjn/ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
vwxyzjn/portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
vwxyzjn/lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
vwxyzjn/summarize_from_feedback_details
vwxyzjn/cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
vwxyzjn/costa-utils
vwxyzjn/benchmark-ci
vwxyzjn/LeanRL
LeanRL is a fork of CleanRL, where selected PyTorch scripts are optimized for performance using compile and cudagraphs.
vwxyzjn/lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
vwxyzjn/minimal-adam-difference
vwxyzjn/trl
Train transformer language models with reinforcement learning.
vwxyzjn/alignment-handbook
Robust recipes to align language models with human and AI preferences
vwxyzjn/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
vwxyzjn/quickchat
vwxyzjn/cleanba-test
vwxyzjn/envpool_bug
vwxyzjn/hfblog
Public repo for HF blog posts
vwxyzjn/optax
Optax is a gradient processing and optimization library for JAX.
vwxyzjn/PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
vwxyzjn/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
vwxyzjn/zero3_min_repro
vwxyzjn/2024
vwxyzjn/huggingface_hub
The official Python client for the Huggingface Hub.
vwxyzjn/MOSS-RLHF
MOSS-RLHF
vwxyzjn/open-instruct
vwxyzjn/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
vwxyzjn/summarize-from-feedback
Code for "Learning to summarize from human feedback"
vwxyzjn/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
vwxyzjn/vwxyzjn.github.io