Official Implementation of Efficient and Scalable Reinforcement Learning for Large-scale Network Control
- Model-based Decentralized Policy Optimization (DMPO), our method
- DPPO (Decentralized PPO)
- CPPO (Centralized PPO)
- IMPO (Independent Model-based Policy Optimization)
- IC3Net (Individualized Controlled Continuous Communication Model)
- Model-free baselines (CommNet, NeurComm, DIAL, ConseNet, ...; for more details and code, please refer to https://arxiv.org/abs/2004.01339)
- Model-based baselines (MAG; for more details and code, please refer to https://ojs.aaai.org/index.php/AAAI/article/view/26241)
Key parameters for our decentralized algorithms (see the sketch after this list):
- radius_v: communication radius for the value function, e.g., 1, 2, 3, ...
- radius_pi: communication radius for the policy, default 1
- radius_p: communication radius for the environment model, default 1
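Where exactly these radii are set depends on the configuration files in the repository; as a purely hypothetical sketch (the config class below is a placeholder, not the repo's actual API — check the configuration files under algorithms/ for the real names and locations), setting them might look like this:
# Hypothetical sketch only: the config class is a placeholder.
class DecentralizedConfig:
    radius_v = 2   # communication radius for the value function (1, 2, 3, ...)
    radius_pi = 1  # communication radius for the policy (default 1)
    radius_p = 1   # communication radius for the environment model (default 1)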
- CACC Catchup
- CACC Slowdown
- Ring Attenuation
- Figure Eight
- ATSC Grid
- ATSC Monaco
- ATSC New York
- Power-Grid
- Real Power Net
- Pandemic Net
Linux: Ubuntu 20.04
Driver Version: 535.154.05
CUDA Version: 12.2
Python 3.8+
For other dependent packages, please refer to environment.yml
The CACC, Flow, and ATSC environments are built on SUMO, so you need to install the corresponding version of SUMO as follows:
- SUMO installation. Version 1.11.0
The commit of SUMO (available at https://github.com/eclipse/sumo) used to produce our results is 2147d155b1. To install SUMO, we recommend following https://sumo.dlr.de/docs/Installing/Linux_Build.html and checking out that specific version from the repository. Note that the latest version of SUMO is not compatible with the Flow environments.
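A minimal sketch of the checkout step, assuming you build from a clone of the GitHub repository:
git clone --recursive https://github.com/eclipse/sumo.git
cd sumo
git checkout 2147d155b1
After checking out that commit, run the following commands to build the SUMO binaries: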
sudo apt-get install cmake python g++ libxerces-c-dev libfox-1.6-dev libgdal-dev libproj-dev libgl2ps-dev swig
cd <sumo_dir> # please insert the correct directory name here
export SUMO_HOME="$PWD"
mkdir build/cmake-build && cd build/cmake-build
cmake ../..
make -j$(nproc)
After building, you need to manually add the bin folder to your PATH:
export PATH=$PATH:$SUMO_HOME/bin
- Setting up the environment.
It's recommended to set up the environment via Anaconda. The environment specification is in environment.yml. After installing the required packages, run
export PYTHONPATH="$SUMO_HOME/tools:$PYTHONPATH"
in terminal to include the SUMO python packages.
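To quickly verify that both the SUMO binaries and the SUMO Python tools are visible in the current shell (assuming the exports above have been applied), you can run:
which sumo
python -c "import traci, sumolib; print('SUMO python tools found')"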
Because this project involves many environments and algorithms, package conflicts are inevitable and difficult to fully resolve within a single environment. We therefore set up two conda environments for the different training tasks.
1. To run environments and algorithms related to CACC, Flow, ATSC, and PowerGrid, configure Environment 1:
conda env create -f environment_1.yml
2. To run environments and algorithms related to Pandemic, Real Power-net, and the baselines, configure Environment 2:
conda env create -f environment_2.yml
and then:
conda activate Pandemic_RealPower_baselines
and then:
cd algorithms/envs/PandemicSimulator
and then (you may sometimes need to run this command twice):
python3 -m pip install -e .
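A quick import check to confirm the editable install succeeded (the package name pandemic_simulator is assumed from the configuration snippet later in this README):
python3 -c "import pandemic_simulator as ps; print('pandemic_simulator imported')"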
You need to have both Environment 1 and Environment 2 installed. If an error occurs while running in one environment, switch to the other environment and try again.
We use WandB as the logger.
- Setting up WandB.
Before running our code, you should log in to WandB locally. Please refer to https://docs.wandb.ai/quickstart for more detail.
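A minimal sketch of the login step (requires an API key from your WandB account settings):
wandb login
# or pass the key directly (placeholder shown): wandb login <your_api_key>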
- Download the data from the link provided at https://github.com/Future-Power-Networks/MAPDN
- Unzip the zip file
- Go to the directory algorithms/envs/Real_Power_net/var_voltage_control/ and create a folder called data, then create 3 folders inside it: case141_3min_final, case322_3min_final, and case199_3min_final
- Move the data in case141_3min_final to the folder algorithms/envs/Real_Power_net/var_voltage_control/data/case141_3min_final
- Move the data in case322_3min_final to the folder algorithms/envs/Real_Power_net/var_voltage_control/data/case322_3min_final
- Move the data (load_active.csv, load_reactive.csv, and pv_active.csv, i.e., everything except model.p) in case33_3min_final to the folder algorithms/envs/Real_Power_net/var_voltage_control/data/case199_3min_final
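A shell sketch of the steps above, starting from the repository root and assuming the unzipped MAPDN data sits in a placeholder directory <unzipped_data>:
cd algorithms/envs/Real_Power_net/var_voltage_control/
mkdir -p data/case141_3min_final data/case322_3min_final data/case199_3min_final
cp <unzipped_data>/case141_3min_final/* data/case141_3min_final/
cp <unzipped_data>/case322_3min_final/* data/case322_3min_final/
# case199 uses the csv files from case33_3min_final (everything except model.p)
cp <unzipped_data>/case33_3min_final/{load_active.csv,load_reactive.csv,pv_active.csv} data/case199_3min_final/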
Train the agents (DPPO, CPPO, IC3Net, and our method DMPO) by:
python launcher.py --env ENV --algo ALGO --device DEVICE
ENV specifies which environment to run in, including eight, ring, catchup, slowdown, Grid, Monaco, PowerGrid, Real_Power, Pandemic, and Large_city.
ALGO specifies the algorithm to use, including IC3Net, CPPO, DPPO, DMPO, and IA2C.
DEVICE specifies the device to use, including cpu, cuda:0, cuda:1, cuda:2, ...
For example:
python launcher.py --env 'slowdown' --algo 'DMPO' --device 'cuda:0'
Train the model-free baselines (CommNet, NeurComm, DIAL, ConseNet ...) by:
cd commmunication-based-baselines
then open main.py to set the environment and algorithm, and run:
python main.py train
Train the model-based baselines (MAG) by:
cd model-based-baselines
then run train.py; you can set the environment and algorithm on the command line, for example:
python train.py --env 'powergrid' --env_name "powergrid"
Evaluate the agents (DPPO, CPPO, IC3Net, our method) by:
After training, the actor models are saved in checkpoints/xxx/Models/xxxbest_actor.pt. You just need to add the following line in algorithms/algo/agent/DPPO.py (or DMPO.py, CPPO.py, ...):
self.actors.load_state_dict(torch.load(test_actors_model))
after initializing actors:
self.collect_pi, self.actors = self._init_actors()
where:
test_actors_model = 'checkpoints/standard_xxx/Models/xxxbest_actor.pt'
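Putting these pieces together, the relevant lines in, e.g., algorithms/algo/agent/DPPO.py would look roughly like this (a sketch; surrounding class code is omitted and the checkpoint path is a placeholder):
self.collect_pi, self.actors = self._init_actors()
# load the trained actor weights for evaluation (placeholder path)
test_actors_model = 'checkpoints/standard_xxx/Models/xxxbest_actor.pt'
self.actors.load_state_dict(torch.load(test_actors_model))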
We also provide evaluation scripts. After replacing the corresponding actor model, run the following commands. To evaluate in CACC, for example:
python evaluate_cacc.py --env 'slowdown' --algo 'DPPO' --device 'cuda:0'
To evaluate in Flow, for example:
python evaluate_flow.py --env 'ring' --algo 'DPPO' --device 'cuda:0'
To evaluate in ATSC, for example:
python evaluate_atsc.py --env 'Monaco' --algo 'DPPO' --device 'cuda:0'
To evaluate in Real Power-Net, for example:
python evaluate_real_power.py --env 'Real_Power' --algo 'DPPO' --device 'cuda:0'
To evaluate in Pandemic Net, for example:
python evaluate_pandemic.py --env 'Pandemic' --algo 'DPPO' --device 'cuda:0'
To evaluate in Large_city (New York City), for example:
python evaluate_large_city.py --env 'Large_city' --algo 'DPPO' --device 'cuda:0'
- For PowerGrid, two settings are used: one with 20 agents and the other with 40 agents. You can switch between them by opening the file:
./algorithms/envs/PowerGrid/envs/Grid_envs.py
and modifying the corresponding code:
DER_num = 20 # 20 or 40
- For Real Power Net, three settings are used: 141-bus, 322-bus, and 421-bus (corresponding to the 199-bus case in the code). You can switch between them by opening the file:
./algorithms/envs/Real_Power.py
and modifying the corresponding code:
net_topology = 'case199_3min_final'  # case141_3min_final / case322_3min_final / case199_3min_final
- For Pandemic Net, five settings are used: populations of 500, 1000, 2000, 5000, and 10000. You can switch between them by opening the file:
./algorithms/envs/Pandemic_ENV.py
and modifying the corresponding code:
sim_config = ps.sh.small_town_config  # town_config: 10000, above_medium_town_config: 5000, medium_town_config: 2000, small_town_config: 1000, tiny_town_config: 500
Please cite the paper in the following format if you use this code in your research:
@article{ma2024efficient,
title={Efficient and scalable reinforcement learning for large-scale network control},
author={Ma, Chengdong and Li, Aming and Du, Yali and Dong, Hao and Yang, Yaodong},
journal={Nature Machine Intelligence},
pages={1--15},
year={2024},
publisher={Nature Publishing Group UK London}
}