Federated Policy Gradient with Byzantine Resilience (FedPG-BR)

This is the code for the FedPG-BR framework presented in the paper:

Flint Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan and Kian Hsiang Low. "Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee." In 35th Conference on Neural Information Processing Systems (NeurIPS-21), Dec 6-14, 2021.

The experiments in the paper were conducted on Ubuntu 18.04 with a 14-core (28-thread) Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz and 64 GB of memory.

We also tested on Ubuntu 20.04 with an 8-core (16-thread) Intel(R) Core(TM) i9-11900K CPU @ 3.50GHz and 128 GB of memory. Some extra steps may be necessary for this CPU, which we detail in the Installation section below.

Dependencies

  • Python 3.7
  • PyTorch 1.5.0
  • numpy
  • tensorboard
  • tqdm
  • sklearn
  • matplotlib
  • OpenAI Gym
  • Box2D [for running experiments on the LunarLander environment]
  • mujoco150
  • mujoco-py 1.50.1.68 [for running experiments on the HalfCheetah environment]
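
For reference, a requirements.txt consistent with the list above might look like the sketch below; the unpinned entries and exact package names (e.g., gym, box2d-py) are assumptions on our part, since only the Python, PyTorch, and mujoco-py versions are stated here.

  torch==1.5.0
  numpy
  tensorboard
  tqdm
  scikit-learn
  matplotlib
  gym
  box2d-py
  mujoco-py==1.50.1.68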

Installation

$ conda create -n FedPG-BR pytorch=1.5.0
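
Note that conda may pick a default Python version for the new environment; if it does not resolve to Python 3.7 (the version listed under Dependencies), you can pin it explicitly. This variant is an assumption rather than part of the original instructions:

$ conda create -n FedPG-BR python=3.7 pytorch=1.5.0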

$ conda activate FedPG-BR

Please then follow the instructions here to set up MuJoCo and install mujoco-py. Please download the legacy version of MuJoCo (mujoco150), as we did not test with the latest version.
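
For reference, mujoco-py 1.50.x expects the mjpro150 binaries under ~/.mujoco/mjpro150 and the license key at ~/.mujoco/mjkey.txt. Below is a minimal sketch of that layout on Linux, assuming the standard mjpro150_linux.zip distribution and a valid mjkey.txt of your own:

$ wget https://www.roboti.us/download/mjpro150_linux.zip

$ unzip mjpro150_linux.zip -d ~/.mujoco

$ cp /path/to/your/mjkey.txt ~/.mujoco/mjkey.txt

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin

$ pip install mujoco-py==1.50.1.68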

Once you have ensured that mujoco-py has been successfully installed, proceed with


$ pip install -r requirements.txt


For certain CPUs, if you run into the error Intel MKL FATAL ERROR: Cannot load libmkl_avx512.so or libmkl_def.so, then follow this solution. The fix reinstalls certain packages for certain Intel chips and also removes PyTorch and mujoco-py, so you will need to reinstall PyTorch (1.5.0) and mujoco-py (1.50.1.68) afterwards.
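
As an additional sanity check (our suggestion, not part of the original instructions), you can verify mujoco-py and Gym on their own with a one-liner; it should exit without errors:

$ python -c "import mujoco_py, gym; gym.make('HalfCheetah-v2')"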


To check your installation, run

$ python run.py --env_name HalfCheetah-v2 --FedPG_BR --num_worker 10 --num_Byzantine 0 --log_dir ./logs_HalfCheetah --multiple_run 10 --run_name HalfCheetah_FedPGBR_W10B0

If the terminal prints messages similar to those shown below, your installation is all good.

[screenshot of the expected log output]

Usage

To reproduce the results of FedPG-BR (K = 10) in Figure 1 for the HalfCheetah task, run the following command:

$ python run.py --env_name HalfCheetah-v2 --FedPG_BR --num_worker 10 --num_Byzantine 0 --log_dir ./logs_HalfCheetah --multiple_run 10 --run_name HalfCheetah_FedPGBR_W10B0

To reproduce the results of FedPG-BR (K = 10, B = 3) in Figure 2, where 3 Byzantine agents perform the Random Noise attack in the CartPole task environment, run the following command:

$ python run.py --env_name CartPole-v1 --FedPG_BR --num_worker 10 --num_Byzantine 3 --attack_type random-noise --log_dir ./logs_Cartpole --multiple_run 10 --run_name Cartpole_FedPGBR_W10B3

Replace --FedPG_BR with --SVRPG to obtain the results of SVRPG in the same experiment.
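
For example, the SVRPG counterpart of the command above would be as follows (the --run_name value is an illustrative choice of ours):

$ python run.py --env_name CartPole-v1 --SVRPG --num_worker 10 --num_Byzantine 3 --attack_type random-noise --log_dir ./logs_Cartpole --multiple_run 10 --run_name Cartpole_SVRPG_W10B3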

Visualization

Training stats

All results, including all statistics, will be logged to the directory specified by --log_dir and can be visualized in TensorBoard by running tensorboard accordingly, e.g.,

$ tensorboard --logdir logs_Cartpole/ --host localhost --port 8008

You will be able to visualize the different stats of each run there, e.g.:

[screenshot of TensorBoard training visualizations]

Rendering

To visualize the behavior of the learnt policy, run the experiment in evaluation mode with the rendering option on. For example:

$ python run.py --env_name CartPole-v1 --FedPG_BR --eval_only --render --load_path PATH_TO_THE_SAVED_POLICY_MODEL

Demo

See the video file ./FedPG-BR_demo_edited.mp4

Acknowledgements

This research/project is supported by A*STAR under its RIE2020 Advanced Manufacturing and Engineering (AME) Industry Alignment Fund – Pre-Positioning (IAF-PP) (Award A19E4a0101) and its A*STAR Computing and Information Science Scholarship (ACIS) awarded to Flint Xiaofeng Fan. Wei Jing is supported by the Alibaba Innovative Research (AIR) Program.