vwxyzjn

RLHF @huggingface, CS Ph.D. from Drexel University in RL.

@huggingface, Philadelphia, PA

Pinned Repositories

trl
Train transformer language models with reinforcement learning.
Language: Python · 8.5k 78 9401k
cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
Language: Python · 96 4 49
cleanrl
High-quality single-file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Language: Python · 4.7k 35 170548
gym-microrts-paper
The source code for the gym-microrts paper.
Language: Python · 38 4 63
invalid-action-masking
Source code for "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms"
Language: Python · 124 2 319
lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
Language: Python · 130 4 77
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
Language: Go · 557 9 2831
PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
Language: Python · 41 2 13
ppo-implementation-details
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
Language: Python · 573 3 687
summarize_from_feedback_details
Language: Python · 86 4 09

vwxyzjn's Repositories

vwxyzjn/PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
Language: Python · 41 2 13
vwxyzjn/a2c_is_a_special_case_of_ppo
A2C is a special case of PPO!
Language: Python · 18 4 22
vwxyzjn/vectorized-value-methods
[WIP] Vectorized architecture for value-based methods such as DQN and DDPG
Language: Python · 3 2 22
vwxyzjn/launcha
Launcha is a simple Docker-based cloud job launcher.
Language: Python · 1 2 0
vwxyzjn/validate-new-gym-mujoco-envs
Language: Python · 1 3 0
vwxyzjn/Arcade-Learning-Environment
The Arcade Learning Environment (ALE) -- a platform for AI research.
Language: C++ · 1 0
vwxyzjn/birthday
A Happy Birthday animation design in CSS3, HTML5
Language: CSS · 1 0
vwxyzjn/brax
Massively parallel rigidbody physics simulation on accelerator hardware.
Language: Jupyter Notebook · 1 0
vwxyzjn/composer
Library of algorithms to speed up neural network training
Language: Python · 1 0
vwxyzjn/container-apps-store-api-microservice
Sample microservices solution using Azure Container Apps, Dapr, Cosmos DB, and Azure API Management
Language: Shell · 1 0
vwxyzjn/draw.io
2 0
vwxyzjn/environment
Neural MMO - A Massively Multiagent Environment for Artificial Intelligence Research
Language: Python · 1 0
vwxyzjn/gym
A toolkit for developing and comparing reinforcement learning algorithms.
Language: Python · 1 0
vwxyzjn/gym-docs
Code for Gym documentation website
1 0
vwxyzjn/gym-microrts-paper-sb3
RL agent to play μRTS with Stable-Baselines3
Language: Python · 1 0
vwxyzjn/gym-microrts-static-files
2 0
vwxyzjn/gym-robotics
Language: Python · 1 0
vwxyzjn/iclr-blog-track.github.io
Language: HTML · 1 01
vwxyzjn/incubator
Collection of in-progress libraries for entity neural networks.
Language: Python · 1 0
vwxyzjn/isort
A Python utility / library to sort imports.
Language: Python · 1 0
vwxyzjn/jaxrl
JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.
Language: Jupyter Notebook · 1 0
vwxyzjn/launcha-sb3-example
Language: Python · 2 0
vwxyzjn/MA-ALE2
Language: Python · 1 0
vwxyzjn/minihack
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Language: Python · 1 0
vwxyzjn/MultiAgentObjectCollectorEnv
Language: Python · 1 0
vwxyzjn/nmmo-cleanrl-incubator
2 0
vwxyzjn/PPO-Procgen-Reproduction
Language: Python · 3 0
vwxyzjn/stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Language: Python · 1 0
vwxyzjn/stable-baselines3-contrib
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
Language: Python · 1 0
vwxyzjn/tianshou
An elegant PyTorch deep reinforcement learning library.
Language: Python · 1 0