vwxyzjn

RLHF @huggingface, CS Ph.D. from Drexel University in RL.

@huggingface · Philadelphia, PA

Pinned Repositories

trl
Train transformer language models with reinforcement learning.
Language: Python · 8.5k 78 9391k
cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
Language: Python · 96 4 49
cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Language: Python · 4.7k 35 170548
gym-microrts-paper
The source code for the gym-microrts paper.
Language: Python · 38 4 63
invalid-action-masking
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
Language: Python · 124 2 319
lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
Language: Python · 130 4 77
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
Language: Go · 557 9 2831
PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
Language: Python · 41 2 13
ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
Language: Python · 573 3 687
summarize_from_feedback_details
Language: Python · 85 4 09

vwxyzjn's Repositories

vwxyzjn/action-guidance
Language: Python · 6 3 0
vwxyzjn/notablog-starter
The official starter project for Notablog.
Language: CSS · 5 1 03
vwxyzjn/aws-sagemaker-example
Language: Jupyter Notebook · 1 2 0
vwxyzjn/embedding_projector
Language: Python · 1 2 0
vwxyzjn/gym_minigrid
Language: Python · 1 2 0
vwxyzjn/accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
Language: Python · 1 0
vwxyzjn/awesome-shields
The list of styled dynamic informational shields, given the ability to exist by the truly amazing work of shields.io 😍
Language: Makefile · 1 0
vwxyzjn/client
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
Language: Python · 1 0
vwxyzjn/cloudbase-examples
TCB examples
Language: JavaScript · 1 0
vwxyzjn/cloudbase-python-app
Language: Dockerfile · 2 0
vwxyzjn/consistent_depth
We estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.
Language: Python · 1 0
vwxyzjn/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy. ICAIF 2020.
Language: Python · 1 0
vwxyzjn/docker_queue
Language: Vue · 2 0
vwxyzjn/dvc-registry
2 0
vwxyzjn/fastapi
FastAPI framework, high performance, easy to learn, fast to code, ready for production
Language: Python · 1 0
vwxyzjn/gym3
Vectorized interface for reinforcement learning environments
Language: Python · 1 0
vwxyzjn/local
W&B Local is the self hosted version of Weights & Biases
Language: HCL · 1 0
vwxyzjn/microrts-ppo-comparison
Compare PPO implementation performance on microrts gym env
Language: Python · 1 0
vwxyzjn/microrts-sb3
Language: Python · 2 0
vwxyzjn/MineRL2021-Intro-baselines
MineRL 2021 Intro track baselines
Language: Python · 1 0
vwxyzjn/phasic-policy-gradient
Code for the paper "Phasic Policy Gradient"
Language: Python · 1 0
vwxyzjn/pytorch-a2c-ppo-acktr-gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Language: Python · 1 0
vwxyzjn/rl-colab-notebooks
Colab notebooks part of the documentation of Stable Baselines reinforcement learning library
Language: Jupyter Notebook · 1 0
vwxyzjn/rts-generalization
Language: Python · 2 0
vwxyzjn/stack
Language: HCL · 2 01
vwxyzjn/SuperSuit
Easy-to-use micro-wrappers for Gym and PettingZoo based RL Environments
Language: Python · 1 0
vwxyzjn/test-launch
Language: Python · 2 0
vwxyzjn/typer
Typer, build great CLIs. Easy to code. Based on Python type hints.
Language: Python · 1 0
vwxyzjn/wandb-notebook-testing
Language: Dockerfile · 2 0
vwxyzjn/wandb-stack
Language: HCL · 2 0