Bipedal Walker Hardcore (and Classic) with SAC and TD3

Bipedal Walker environments of GYM are difficult problems to solve by reinforcement learning. work In this repository, my thesis is available. Various neural network architectures and RL methods implementations for solving BipedalWalker-v3 and BipedalWalkerHardcore-v3 of GYM on PyTorch using Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3).

Only Hardcore environment is solved by SAC and TD3 algorithm. Reward is manipulated and frame rate is halved.

Neural Nets

Feed Forward Neural Network with Residual connection
Transformer (6 or 12 observation history as input)
Long Short Term Memory (6 or 12 observation history as input)

Only Hardcore environment is solved by TD3 and SAC algorithms. Reward is manipulated and frame rate is halved.

How to

Create new python environment and First install requirements. (python 3.6)

pip install -r requirements.txt

Training

Train your model via following commands.

Train R. Feed Forward NN with SAC

python main_script.py -f train -r sac -m ff

Train Transformer (6 obs hist) with SAC

python main_script.py -f train -r sac -m trsf -hl 6

Train Transformer (12 obs hist) with SAC

python main_script.py -f train -r sac -m trsf -hl 12

Train LSTM (6 obs hist) with SAC

python main_script.py -f train -r sac -m lstm -hl 6

Train LSTM (12 obs hist) with SAC

python main_script.py -f train -r sac -m lstm -hl 12

Train R. Feed Forward NN with TD3

python main_script.py -f train -r td3 -m ff

Train Transformer (6 obs hist) with TD3

python main_script.py -f train -r td3 -m trsf -hl 6

Train LSTM (6 obs hist) with TD3

python main_script.py -f train -r td3 -m trsf -hl 12

Simulating pretrained models

Download pretrained models from the following link and place onto models folder https://drive.google.com/drive/folders/1BtqZXrJyuoBiyeE9IduWj7IkFN-urw6y?usp=sharing

Then run one of the following commands for best checkpoints,

python main_script.py -f test -r sac -m ff -c ep7600

python main_script.py -f test -r sac -m trsf -hl 6 -c ep6800

python main_script.py -f test -r sac -m lstm -hl 6 -c ep7600

python main_script.py -f test -r sac -m trsf -hl 12 -c ep6000

python main_script.py -f test -r sac -m lstm -hl 12 -c ep7200

python main_script.py -f test -r td3 -m ff -c ep6600

python main_script.py -f test -r td3 -m trsf -hl 6 -c ep6400

python main_script.py -f test -r td3 -m lstm -hl 6 -c ep7000

Evaluation

To run a 100 episode gym evaluation with trained model, run following command

python main_script.py -f test-100 -r sac -m lstm -hl 12 -c ep7200