GRU_AC

GRU-PPO for stable-baselines3.

How to train

python3 train.py
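
For reference, here is a minimal sketch of the kind of comparison `train.py` runs, using the PPO and LSTM-based RecurrentPPO implementations that ship with stable_baselines3 and sb3_contrib. The GRU PPO variant is provided by this repository itself, so it is not reproduced here; the environment id and timestep count below are illustrative assumptions, not this repo's exact settings.

```python
# Minimal sketch (assumptions: gymnasium's BipedalWalker-v3, illustrative timestep budget).
# The GRU PPO variant comes from this repository's own code and is omitted here.
import gymnasium as gym
from stable_baselines3 import PPO
from sb3_contrib import RecurrentPPO

env_id = "BipedalWalker-v3"

# Plain PPO baseline (no recurrent network).
ppo = PPO("MlpPolicy", gym.make(env_id), verbose=1)
ppo.learn(total_timesteps=100_000)

# LSTM PPO from sb3_contrib; the GRU variant would be trained the same way,
# with this repository's GRU policy plugged in instead.
lstm_ppo = RecurrentPPO("MlpLstmPolicy", gym.make(env_id), verbose=1)
lstm_ppo.learn(total_timesteps=100_000)
```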

After training finishes, the LSTM PPO, GRU PPO, and PPO results are saved as videos.

Results in the BipedalWalker-v3 environment

[Note]

All hyperparameters are identical; the only difference is whether the policy uses a recurrent neural network.

The LSTM and GRU policies use the same hidden_state shape.
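To illustrate the note above, here is a short sketch of the hidden-state shapes of torch.nn.LSTM versus torch.nn.GRU. This shows standard PyTorch behavior, not code from this repository; the sizes are arbitrary examples.

```python
# Hidden-state shape comparison (standard PyTorch behavior, not this repo's code).
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 4, 10, 24, 64
x = torch.randn(seq_len, batch, input_size)

lstm = nn.LSTM(input_size, hidden_size)
gru = nn.GRU(input_size, hidden_size)

# LSTM returns a tuple of states (h, c); GRU returns only h.
_, (h_lstm, c_lstm) = lstm(x)
_, h_gru = gru(x)

print(h_lstm.shape, c_lstm.shape)  # torch.Size([1, 4, 64]) torch.Size([1, 4, 64])
print(h_gru.shape)                 # torch.Size([1, 4, 64]) -- same shape as each LSTM state
```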

  1. LSTM PPO: LSTM-episode-0.mp4
  2. GRU PPO: GRU-episode-0.mp4
  3. PPO: PPO-episode-0.mp4

Library compatibility

torch: 1.13.1+cu116

stable_baselines3: 2.3.0

sb3_contrib: 2.3.0
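
A possible way to pin these versions with pip (a sketch; the extra index URL is PyTorch's standard CUDA 11.6 wheel index):

```
pip install "stable-baselines3==2.3.0" "sb3-contrib==2.3.0"
pip install "torch==1.13.1+cu116" --extra-index-url https://download.pytorch.org/whl/cu116
```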

My sister project (related to GRU)

Preference-based RL with a GRU reward model for the imitation library:

https://github.com/CAI23sbP/RecurrentRLHF