An online federated reinforcement learning algorithm published at INFOCOM 2024

MFPO

The paper "A Framework for Federated Reinforcement Learning with Interaction and Communication Efficiency" has been submitted to INFOCOM 2024.

📄 Description

Momentum-assisted Federated Policy Optimization (MFPO) is a federated reinforcement learning (FRL) algorithm that jointly optimizes interaction and communication complexity. Specifically, we introduce a new FRL framework that uses momentum, importance sampling, and an extra server-side update to control the variance of stochastic policy gradients and improve the efficiency of data utilization.
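
To make the roles of momentum and importance sampling more concrete, here is a minimal sketch of a generic momentum-based policy gradient estimate followed by a server-side average. It is not taken from this repository: the function names `momentum_estimate` and `server_average`, the STORM-style update form, and the use of plain averaging as the server step are all assumptions made for illustration, and the actual MFPO update may differ.

```python
from typing import List

Vector = List[float]


def momentum_estimate(g_new: Vector, g_old: Vector, d_prev: Vector,
                      is_weight: float, c: float) -> Vector:
    """Generic momentum-based gradient estimate (assumed form, not the repo's code).

    g_new     -- policy gradient of a fresh batch at the current parameters
    g_old     -- policy gradient of the same batch at the previous parameters
    d_prev    -- estimate carried over from the previous iteration
    is_weight -- importance weight correcting g_old for the change of policy
    c         -- momentum parameter in [0, 1]; c = 1 recovers the plain gradient
    """
    return [
        gn + (1.0 - c) * (dp - is_weight * go)
        for gn, go, dp in zip(g_new, g_old, d_prev)
    ]


def server_average(worker_estimates: List[Vector]) -> Vector:
    """Average the workers' estimates coordinate by coordinate (stand-in server step)."""
    n = len(worker_estimates)
    return [sum(coords) / n for coords in zip(*worker_estimates)]


# Toy example: two workers, a two-dimensional parameter vector.
d_prev = [0.5, -0.5]
estimates = [
    momentum_estimate([1.0, 0.0], [0.8, 0.2], d_prev, is_weight=1.0, c=0.5),
    momentum_estimate([0.0, 1.0], [0.2, 0.8], d_prev, is_weight=1.0, c=0.5),
]
aggregated = server_average(estimates)
```

With `c = 1` the previous estimate is discarded and each worker falls back to the ordinary stochastic gradient. Smaller values of `c` reuse more of the past estimate, which is where the importance weight is needed to correct the old gradient term.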

🔧 Dependencies

Installation

  1. Clone repo
    git clone https://github.com/HansenHua/MFPO-INFOCOM24.git
    cd MFPO-INFOCOM24
  2. Install dependent packages
    pip install -r requirements.txt
    

⚡ Quick Inference

Get the usage information of the project

cd code
python main.py -h

The usage information is then shown as follows:

usage: main.py [-h] [--env_name ENV_NAME] [--method METHOD] [--gamma GAMMA] [--batch_size BATCH_SIZE]
               [--local_update LOCAL_UPDATE] [--num_worker NUM_WORKER] [--average_type AVERAGE_TYPE] [--c C]
               [--seed SEED] [--lr_a LR_A] [--lr_c LR_C]
               mode max_iteration

positional arguments:
  mode                  train or test
  max_iteration         maximum training iteration

optional arguments:
  -h, --help            show this help message and exit
  --env_name ENV_NAME   the name of environment
  --method METHOD       method name
  --gamma GAMMA         gamma
  --batch_size BATCH_SIZE
                        batch_size
  --local_update LOCAL_UPDATE
                        frequency of local update
  --num_worker NUM_WORKER
                        number of federated agents
  --average_type AVERAGE_TYPE
                        average type (target/network/critic)
  --c C                 momentum parameter
  --seed SEED           random seed
  --lr_a LR_A           learning rate of actor
  --lr_c LR_C           learning rate of critic
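
For readers who want to script against this interface, the following is a rough reconstruction of a parser that would accept the arguments listed above. It is a sketch built only from the help text, not the repository's `main.py`: the argument types, the absence of defaults, and the iteration count `100` in the example call are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface described by the help text (types are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("mode", help="train or test")
    parser.add_argument("max_iteration", type=int, help="maximum training iteration")
    # Optional arguments.
    parser.add_argument("--env_name", help="the name of environment")
    parser.add_argument("--method", help="method name")
    parser.add_argument("--gamma", type=float, help="gamma")
    parser.add_argument("--batch_size", type=int, help="batch_size")
    parser.add_argument("--local_update", type=int, help="frequency of local update")
    parser.add_argument("--num_worker", type=int, help="number of federated agents")
    parser.add_argument("--average_type", help="average type (target/network/critic)")
    parser.add_argument("--c", type=float, help="momentum parameter")
    parser.add_argument("--seed", type=int, help="random seed")
    parser.add_argument("--lr_a", type=float, help="learning rate of actor")
    parser.add_argument("--lr_c", type=float, help="learning rate of critic")
    return parser


# Example call; the value 100 is a hypothetical iteration count.
args = build_parser().parse_args(
    ["--env_name", "CartPole-v1", "--method", "MFPO", "train", "100"]
)
```

Note that in this layout `mode` and `max_iteration` are positional, while the environment and method names are passed through the `--env_name` and `--method` flags.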

Test the trained models provided with MFPO:

python main.py --env_name CartPole-v1 --method MFPO test <max_iteration>

💻 Training

We provide the complete training code for MFPO, which you can adapt to your own needs.

```
python main.py --env_name CartPole-v1 --method MFPO train <max_iteration>
```
The log files are stored in [MFPO-INFOCOM24/code/log](https://github.com/HansenHua/MFPO-INFOCOM24/tree/main/code/log).

🏁 Testing

  1. Testing
    python main.py --env_name CartPole-v1 --method MFPO test <max_iteration>
    
  2. Illustration

We also provide the performance of our model. The illustration videos are stored in MFPO-INFOCOM24/performance.

📧 Contact

If you have any questions, please email xingyuanhua@bit.edu.cn.