vwxyzjn

RLHF @allenai, CS Ph.D. from Drexel University in RL.

@huggingface

Philadelphia, PA

Pinned Repositories

trl
Train transformer language models with reinforcement learning.
Language:Python10.4k 77 1.3k1.3k
cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
Language:Python106 4 511
cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Language:Python5.9k 38 186673
gym-microrts-paper
The source code for the gym-microrts paper.
Language:Python42 4 63
invalid-action-masking
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
Language:Python146 2 322
lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
Language:Python160 4 78
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
Language:Go599 11 3035
PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
Language:Python45 2 13
ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
Language:Python662 3 699
summarize_from_feedback_details
Language:Python120 4 216

vwxyzjn's Repositories

vwxyzjn/cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Language:Python5.9k 38 186673
vwxyzjn/ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
Language:Python662 3 699
vwxyzjn/portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
Language:Go599 11 3035
vwxyzjn/lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
Language:Python160 4 78
vwxyzjn/summarize_from_feedback_details
Language:Python120 4 216
vwxyzjn/cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
Language:Python106 4 511
vwxyzjn/costa-utils
Language:Python10 2 0
vwxyzjn/benchmark-ci
Language:Python7 2 11
vwxyzjn/LeanRL
LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using torch.compile and CUDA graphs.
Language:Python6 0 0
vwxyzjn/lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
Language:Python4 1 0
vwxyzjn/minimal-adam-difference
Language:Python4 3 0
vwxyzjn/trl
Train transformer language models with reinforcement learning.
Language:Python4 1 0
vwxyzjn/alignment-handbook
Robust recipes to align language models with human and AI preferences
Language:Python2 1 0
vwxyzjn/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Language:Python2 1 0
vwxyzjn/quickchat
Language:Python2 2 0
vwxyzjn/cleanba-test
Language:Python1 2 0
vwxyzjn/envpool_bug
Language:Python1 2 0
vwxyzjn/hfblog
Public repo for HF blog posts
Language:Jupyter Notebook1 1 0
vwxyzjn/optax
Optax is a gradient processing and optimization library for JAX.
Language:Python1 1 0
vwxyzjn/PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
Language:Jupyter Notebook1 1 01
vwxyzjn/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Language:Python1 1 0
vwxyzjn/zero3_min_repro
Language:Python1 2 0
vwxyzjn/2024
Language:HTML1 0
vwxyzjn/huggingface_hub
The official Python client for the Huggingface Hub.
Language:Python0 0
vwxyzjn/MOSS-RLHF
MOSS-RLHF
Language:Python1 0
vwxyzjn/open-instruct
Language:Python0 0
vwxyzjn/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Language:Python1 0
vwxyzjn/summarize-from-feedback
Code for "Learning to summarize from human feedback"
Language:Python1 0
vwxyzjn/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language:Python0 0
vwxyzjn/vwxyzjn.github.io
Language:HTML2 1