FilteredPolicyGradient


PyTorch implementation of the Filtered Policy Gradient (FPG) algorithm

This is a PyTorch implementation of "Filtered Policy Gradient (FPG)". Please make sure to install the necessary dependencies, particularly PyTorch and MuJoCo.

The current version of FPG uses Gaussian policies suited to continuous control problems. Minimal changes are required to support discrete action spaces (log-probability functions, etc.).
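For illustration, a diagonal-Gaussian policy head of the kind described above might look like the following. This is a hedged sketch, not the repo's actual code; the class and method names (`GaussianPolicy`, `log_prob`) are assumptions. Switching to a discrete action space would amount to replacing the `Normal` distribution with a `Categorical` over logits and dropping the sum over action dimensions.

```python
import torch
from torch.distributions import Normal


class GaussianPolicy(torch.nn.Module):
    """Illustrative diagonal-Gaussian policy for continuous control.

    The mean is produced by a small MLP; the log standard deviation is a
    state-independent learnable parameter, a common choice in TRPO-style code.
    """

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, act_dim),
        )
        self.log_std = torch.nn.Parameter(torch.zeros(act_dim))

    def log_prob(self, obs, act):
        mean = self.net(obs)
        dist = Normal(mean, self.log_std.exp())
        # Sum per-dimension log-probabilities: actions are treated as
        # independent Gaussians along each action dimension.
        return dist.log_prob(act).sum(-1)


# Small smoke test: a batch of 4 observations (dim 8) and actions (dim 2)
pi = GaussianPolicy(obs_dim=8, act_dim=2)
obs = torch.randn(4, 8)
act = torch.randn(4, 2)
lp = pi.log_prob(obs, act)  # one log-probability per batch element
```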

Usage

python main.py --env-name "Swimmer-v3" --sever 0 --attack_norm 10 --max_iter_num 200 --eps 0.01 # Vanilla TRPO
python main.py --env-name "Swimmer-v3" --sever 1 --attack_norm 10 --max_iter_num 200 --eps 0.01 # FPG
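The flags above suggest a corruption model parameterized by a corruption fraction (`--eps`) and a magnitude bound (`--attack_norm`). One way such a reward attack is often sketched (this is an illustrative assumption, not necessarily the repo's attack; the function `corrupt_rewards` is hypothetical) is to perturb an epsilon-fraction of reward samples, with the perturbation magnitude controlled by the attack norm:

```python
import numpy as np


def corrupt_rewards(rewards, eps=0.01, attack_norm=10.0, rng=None):
    """Hypothetical attack sketch: corrupt an eps-fraction of rewards.

    Each selected sample is replaced by a large negative value bounded by
    attack_norm, which can bias a policy-gradient learner toward the
    opposite behavior (e.g. running backward).
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.asarray(rewards, dtype=float).copy()
    mask = rng.random(r.shape) < eps  # eps-fraction of samples corrupted
    r[mask] = -attack_norm  # bounded-magnitude adversarial value
    return r


clean = [1.0, 2.0, 3.0]
untouched = corrupt_rewards(clean, eps=0.0)  # eps=0 leaves rewards unchanged
```

A filtering defense like FPG would then try to detect and discard such outlier samples before the gradient update.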

Fun Results

TRPO is fooled into learning a backward-running policy on HalfCheetah with epsilon = 0.01 and a sufficiently large delta.

cheetah_backward

FPG remains unaffected.

cheetah_forward