Jax-Baseline is a collection of Reinforcement Learning implementations built on JAX with the Flax/Haiku libraries, mirroring the functionality of Stable-Baselines.
- 2-3x faster than previous PyTorch and TensorFlow implementations
- Optimized with JAX's Just-In-Time (JIT) compilation (see the sketch below)
- Flexible support for both Gym and Unity ML environments
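As a rough illustration of the JIT claim, the pattern is to compile the whole parameter update into a single XLA call. This is a minimal generic sketch, not Jax-Baseline's actual training code; the linear Q-function and `update` helper are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def td_loss(params, obs, target):
    # Stand-in linear Q-function; real agents use Flax/Haiku networks.
    q = obs @ params
    return jnp.mean((q - target) ** 2)

@jax.jit  # compiled to a fused XLA kernel once, then reused every step
def update(params, obs, target, lr=1e-3):
    grads = jax.grad(td_loss)(params, obs, target)
    return params - lr * grads
```

Compiling the full update this way avoids per-op Python overhead, which is where most of the speed-up over eager PyTorch/TensorFlow training loops comes from.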
```bash
pip install -r requirement.txt
pip install .
```
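After installing, an optional sanity check (plain JAX, not part of Jax-Baseline) confirms that an accelerator is visible:

```bash
python -c "import jax; print(jax.default_backend(), jax.devices())"
```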
- ✔️ : Implemented as an option
- ✅ : Implemented by default, as in the paper
- ❌ : Not implemented yet, or cannot be implemented
- 💤 : Implemented but not updated for a while (not guaranteed to work correctly now)
| Name | Q-Net based | Actor-Critic based | DPG based |
| --- | --- | --- | --- |
| Gymnasium | ✔️ | ✔️ | ✔️ |
| MultiworkerGym with Ray | ✔️ | ✔️ | ✔️ |
| Unity-ML Environments | 💤 | 💤 | 💤 |
| Name | Double [1] | Dueling [2] | PER [3] | N-step [4][5] | NoisyNet [6] | Munchausen [7] | Ape-X [8] | HL-Gauss [9] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DQN [10] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
| C51 [11] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| QRDQN [12] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
| IQN [13] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| FQF [14] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| SPR [15] | ✅ | ✅ | ✅ | ✅ | ✅ | ✔️ | ❌ | ✔️ |
| BBF [16] | ✅ | ✅ | ✅ | ✅ | ✔️ | ✔️ | ❌ | ✔️ |
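To make the variant columns concrete, here is a hedged sketch of the Double [1] target: the online network selects the next action and the target network evaluates it. This is a generic JAX illustration with assumed argument shapes, not Jax-Baseline's internals:

```python
import jax.numpy as jnp

def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    # q_online_next / q_target_next: (batch, n_actions) Q-values at s'
    best_action = jnp.argmax(q_online_next, axis=1)          # selection: online net
    next_q = jnp.take_along_axis(
        q_target_next, best_action[:, None], axis=1
    )[:, 0]                                                  # evaluation: target net
    return reward + gamma * (1.0 - done) * next_q
```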
| Name | Box | Discrete | IMPALA [17] |
| --- | --- | --- | --- |
| A2C [18] | ✔️ | ✔️ | ✔️ |
| PPO [19] | ✔️ | ✔️ | ✔️ [20] |
| Truly PPO (TPPO) [21] | ✔️ | ✔️ | ❌ |
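For reference, PPO's [19] clipped surrogate objective looks roughly like the following. This is a generic sketch under assumed inputs (per-sample log-probabilities and advantages), not Jax-Baseline's loss code:

```python
import jax.numpy as jnp

def ppo_clip_loss(log_prob, old_log_prob, advantage, clip_eps=0.2):
    ratio = jnp.exp(log_prob - old_log_prob)                  # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -jnp.mean(jnp.minimum(unclipped, clipped))         # negate to maximize
```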
| Name | PER [3] | N-step [4][5] | Ape-X [8] |
| --- | --- | --- | --- |
| DDPG [22] | ✔️ | ✔️ | ✔️ |
| TD3 [23] | ✔️ | ✔️ | ✔️ |
| SAC [24] | ✔️ | ✔️ | ❌ |
| TQC [25] | ✔️ | ✔️ | ❌ |
| TD7 [26] | ✅ (LAP [27]) | ❌ | ❌ |
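As one concrete example from this family, TD3 [23] stabilizes its critic with a clipped double-Q target. A minimal generic sketch (assumed inputs, not Jax-Baseline's code):

```python
import jax.numpy as jnp

def td3_target(q1_next, q2_next, reward, done, gamma=0.99):
    # q1_next / q2_next: target-critic values at (s', smoothed target action)
    next_q = jnp.minimum(q1_next, q2_next)   # pessimistic min over twin critics
    return reward + gamma * (1.0 - done) * next_q
```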
To test Atari with DQN (or C51, QRDQN, IQN, FQF):

```bash
python test/run_qnet.py --algo DQN --env BreakoutNoFrameskip-v4 --learning_rate 0.0002 \
    --steps 5e5 --batch 32 --train_freq 1 --target_update 1000 --node 512 \
    --hidden_n 1 --final_eps 0.01 --learning_starts 20000 --gamma 0.995 --clip_rewards
```
On Atari Breakout, 500K steps complete in about 15 minutes (≈540 steps/sec; 500,000 steps / 924 s ≈ 541). Performance was measured on an NVIDIA RTX 3080 and an AMD Ryzen 9 5950X in a single process.

```
score : 9.600, epsilon : 0.010, loss : 0.181 |: 100%|███████| 500000/500000 [15:24<00:00, 540.88it/s]
```