MFPO

This work, "A Framework for Federated Reinforcement Learning with Interaction and Communication Efficiency", has been submitted to INFOCOM 2024.

📄 Description

Momentum-assisted Federated Policy Optimization (MFPO) is a federated reinforcement learning (FRL) framework that jointly optimizes interaction and communication complexity. Specifically, it uses momentum, importance sampling, and extra server-side updates to control the variance of stochastic policy gradients and improve the efficiency of data utilization.
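To give a feel for how momentum and importance sampling can be combined, here is a minimal sketch of one common momentum-based variance-reduced gradient estimator from the policy-gradient literature. It is an illustration only and is not taken from the MFPO code: the function names `importance_weight` and `momentum_gradient` are hypothetical, gradients are plain Python lists, and the mapping of the momentum weight to the `--c` option is an assumption.

```python
import math
from typing import List


def importance_weight(logp_old: float, logp_new: float) -> float:
    """Trajectory probability ratio p_old / p_new, computed from log-probabilities."""
    return math.exp(logp_old - logp_new)


def momentum_gradient(
    grad_new: List[float],       # gradient at the current parameters, current sample
    grad_old: List[float],       # gradient at the previous parameters, same sample
    prev_estimate: List[float],  # previous gradient estimate d_{t-1}
    weight: float,               # importance weight correcting grad_old
    c: float,                    # momentum parameter in [0, 1] (assumed to play the role of --c)
) -> List[float]:
    """Return d_t = c * g_t + (1 - c) * (d_{t-1} + g_t - weight * g_old)."""
    return [
        c * g + (1.0 - c) * (d + g - weight * g_old)
        for g, g_old, d in zip(grad_new, grad_old, prev_estimate)
    ]
```

With `c = 1` the estimator reduces to the plain stochastic gradient; smaller values reuse more of the previous estimate, and the importance weight accounts for the sample having been drawn under the current policy rather than the previous one.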

🔧 Dependencies

Python == 3.7 (Anaconda or Miniconda recommended)
PyTorch == 1.8.1
MuJoCo == 2.3.6
NVIDIA GPU (RTX A6000) + CUDA 11.1

Installation

Clone repo

git clone https://github.com/HansenHua/MFPO-Online-Federated-Reinforcement-Learning.git
cd MFPO-Online-Federated-Reinforcement-Learning

Install dependent packages
```
pip install -r requirements.txt
```

⚡ Quick Inference

Get the usage information of the project

cd code
python main.py -h

The usage information is then shown as follows:

usage: main.py [-h] [--env_name ENV_NAME] [--method METHOD] [--gamma GAMMA] [--batch_size BATCH_SIZE]
               [--local_update LOCAL_UPDATE] [--num_worker NUM_WORKER] [--average_type AVERAGE_TYPE] [--c C]
               [--seed SEED] [--lr_a LR_A] [--lr_c LR_C]
               mode max_iteration

positional arguments:
  mode                  train or test
  max_iteration         maximum training iteration

optional arguments:
  -h, --help            show this help message and exit
  --env_name ENV_NAME   the name of environment
  --method METHOD       method name
  --gamma GAMMA         gamma
  --batch_size BATCH_SIZE
                        batch_size
  --local_update LOCAL_UPDATE
                        frequency of local update
  --num_worker NUM_WORKER
                        number of federated agents
  --average_type AVERAGE_TYPE
                        average type (target/network/critic)
  --c C                 momentum parameter
  --seed SEED           random seed
  --lr_a LR_A           learning rate of actor
  --lr_c LR_C           learning rate of critic
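For readers who want to see how a command line of this shape is parsed, the following is a minimal `argparse` sketch that would produce a similar usage message. It is not the repository's `main.py`: the `build_parser` helper is hypothetical, the argument types are assumptions, and defaults are deliberately left unset because the usage text does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage message (types are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Positional arguments, in the order shown by the usage message.
    parser.add_argument("mode", type=str, help="train or test")
    parser.add_argument("max_iteration", type=int, help="maximum training iteration")
    # Optional arguments; no defaults are set here because none are documented.
    parser.add_argument("--env_name", type=str, help="the name of environment")
    parser.add_argument("--method", type=str, help="method name")
    parser.add_argument("--gamma", type=float, help="gamma")
    parser.add_argument("--batch_size", type=int, help="batch_size")
    parser.add_argument("--local_update", type=int, help="frequency of local update")
    parser.add_argument("--num_worker", type=int, help="number of federated agents")
    parser.add_argument("--average_type", type=str,
                        help="average type (target/network/critic)")
    parser.add_argument("--c", type=float, help="momentum parameter")
    parser.add_argument("--seed", type=int, help="random seed")
    parser.add_argument("--lr_a", type=float, help="learning rate of actor")
    parser.add_argument("--lr_c", type=float, help="learning rate of critic")
    return parser
```

Under this sketch, `build_parser().parse_args(["train", "1000", "--env_name", "CartPole-v1", "--method", "MFPO"])` yields `mode="train"` and `max_iteration=1000`; the value `1000` is only a placeholder.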

Test the trained models provided with MFPO (Momentum-assisted Federated Policy Optimization):

python main.py CartPole-v1 MFPO test

💻 Training

We provide the complete training code for MFPO.
You can adapt it to your own needs.

```
python main.py CartPole-v1 MFPO train
```
The log files will be stored in [MFPO-Online-Federated-Reinforcement-Learning/code/log](https://github.com/HansenHua/MFPO-INFOCOM24/tree/main/code/log).

🏁 Testing

Testing
```
python main.py CartPole-v1 MFPO test
```
Illustration

We also provide the performance of our model. The illustration videos are stored in MFPO-Online-Federated-Reinforcement-Learning/performance.

📧 Contact

If you have any questions, please email xingyuanhua@bit.edu.cn.