GRU_AC

GRU-PPO for stable-baselines3.

How to train

python3 train.py
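
For reference, here is a minimal sketch of the kind of comparison `train.py` runs, using the PPO and LSTM-based RecurrentPPO implementations that ship with stable_baselines3 and sb3_contrib. The GRU PPO variant is provided by this repository itself, so it is not reproduced here; the environment id and timestep count below are illustrative assumptions, not this repo's exact settings.

```python
# Minimal sketch (assumptions: gymnasium's BipedalWalker-v3, illustrative timestep budget).
# The GRU PPO variant comes from this repository's own code and is omitted here.
import gymnasium as gym
from stable_baselines3 import PPO
from sb3_contrib import RecurrentPPO

env_id = "BipedalWalker-v3"

# Plain PPO baseline (no recurrent network).
ppo = PPO("MlpPolicy", gym.make(env_id), verbose=1)
ppo.learn(total_timesteps=100_000)

# LSTM PPO from sb3_contrib; the GRU variant would be trained the same way,
# with this repository's GRU policy plugged in instead.
lstm_ppo = RecurrentPPO("MlpLstmPolicy", gym.make(env_id), verbose=1)
lstm_ppo.learn(total_timesteps=100_000)
```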

After training finishes, the LSTM PPO, GRU PPO, and PPO results are saved as videos.

Results in the BipedalWalker-v3 environment

[Note]

All hyperparameters are identical; the only difference is whether the policy uses a recurrent neural network.

The LSTM and GRU policies use the same hidden_state shape.
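To illustrate the note above, here is a short sketch of the hidden-state shapes of torch.nn.LSTM versus torch.nn.GRU. This shows standard PyTorch behavior, not code from this repository; the sizes are arbitrary examples.

```python
# Hidden-state shape comparison (standard PyTorch behavior, not this repo's code).
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 4, 10, 24, 64
x = torch.randn(seq_len, batch, input_size)

lstm = nn.LSTM(input_size, hidden_size)
gru = nn.GRU(input_size, hidden_size)

# LSTM returns a tuple of states (h, c); GRU returns only h.
_, (h_lstm, c_lstm) = lstm(x)
_, h_gru = gru(x)

print(h_lstm.shape, c_lstm.shape)  # torch.Size([1, 4, 64]) torch.Size([1, 4, 64])
print(h_gru.shape)                 # torch.Size([1, 4, 64]) -- same shape as each LSTM state
```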

  1. LSTM PPO: LSTM-episode-0.mp4
  2. GRU PPO: GRU-episode-0.mp4
  3. PPO: PPO-episode-0.mp4

Library compatibility

torch: 1.13.1+cu116

stable_baselines3: 2.3.0

sb3_contrib: 2.3.0
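
A possible way to pin these versions with pip (a sketch; the extra index URL is PyTorch's standard CUDA 11.6 wheel index):

```
pip install "stable-baselines3==2.3.0" "sb3-contrib==2.3.0"
pip install "torch==1.13.1+cu116" --extra-index-url https://download.pytorch.org/whl/cu116
```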

My sister project (related to GRU)

Preference-based RL with a GRU reward model for the imitation library:

https://github.com/CAI23sbP/RecurrentRLHF