/RL_VPP_Thesis

Thesis on the development of an RL agent that manages a VPP through EV charging stations. The main optimization objectives of the VPP are valley filling and peak shaving. The main actions performed to reach these objectives are storing energy from renewable resources and pushing power into the grid at times of high demand. The work assumes a high number of vehicles connected to the grid for a minimum of 3-4 hours.


RL control strategies for EVs fleet VPPs

[Reinforcement Learning control strategies for Electric Vehicles fleet Virtual Power Plants]

Thesis on the development of an RL agent that manages a VPP through EV charging stations in a household environment. The main optimization objectives of the VPP are valley filling, peak shaving and zero resulting load over time (supply/demand load balance). The main actions performed to reach these objectives are storing energy from renewable resources and pushing power into the grid at times of high demand. The Virtual Power Plant environment is built on ELVIS (Electric Vehicles Infrastructure Simulator), an open library from DAI-Labor (https://github.com/dailab/elvis). The published thesis paper is available at https://arxiv.org/abs/2405.01889 and the thesis code, files and data are currently available at https://github.com/francescomaldonato/RL_VPP_Thesis

Outline:

This research investigates a sustainable approach to household energy production and storage. The main goal of the thesis is to explore the boundaries of a self-sustained energy system that minimizes both the power drawn from the grid and the expenses. Energy is produced by PV solar panel modules and domestic wind turbines. The storage system is based on EV batteries. An RL agent is in charge of managing the EVs' power resources to guarantee a minimum charge at EV departure while optimizing peak shaving and valley filling of the power grid. A scenario visualization of the implemented system is shown below.

[Image: scenario visualization of the implemented system]
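
To make the two objectives concrete, here is a minimal rule-based sketch of valley filling and peak shaving on a net-load curve (demand minus renewable production). It is not the RL agent developed in the thesis: the function `dispatch_storage`, its parameters and the sample values are assumptions made for illustration, and it ignores charging losses and the minimum-charge-at-departure constraint.

```python
def dispatch_storage(net_load_kw, capacity_kwh, soc_kwh, max_power_kw, dt_h=1.0):
    """Charge on surplus (valley filling), discharge on deficit (peak shaving).

    net_load_kw: demand minus renewable production per time step, in kW.
    Returns the residual load seen by the grid and the final stored energy.
    All names and units here are illustrative assumptions.
    """
    residual = []
    for load in net_load_kw:
        if load < 0:
            # Surplus: store as much as the power limit and free capacity allow.
            charge = min(-load, max_power_kw, (capacity_kwh - soc_kwh) / dt_h)
            soc_kwh += charge * dt_h
            residual.append(load + charge)
        else:
            # Deficit: push stored energy back, limited by power and stored energy.
            discharge = min(load, max_power_kw, soc_kwh / dt_h)
            soc_kwh -= discharge * dt_h
            residual.append(load - discharge)
    return residual, soc_kwh


# Hypothetical hourly net load in kW: two surplus hours followed by three deficit hours.
net_load = [-4.0, -2.0, 1.0, 5.0, 3.0]
residual, final_soc = dispatch_storage(
    net_load, capacity_kwh=10.0, soc_kwh=0.0, max_power_kw=3.0
)
print(residual, final_soc)
```

The residual curve stays closer to zero than the original net load, which is the "zero resulting load" objective in its simplest form. A learned policy would additionally have to anticipate arrivals, departures and future production.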

The pre-loaded set of simulation configuration parameters is shown below. It can be changed by modifying the ELVIS config file in the data/config_builder/ folder (explained below).

[Image: simulation configuration parameters. Assumptions: 25 EV arrivals per week, an average parking time of 24 hours in the grid, and an average of 50% available battery at arrival. Available car type: Tesla Model S]

Initialization (quick VPP simulation)

The notebook automatically clones the repository onto the remote machine: https://github.com/francescomaldonato/RL_VPP_Thesis.git

Repository structure

RL_VPP_Thesis:

  • VPP_environment.py (Python script containing the environment definition and functions)

  • VPP_simulator.ipynb (Notebook to test the VPP performances and features with the best trained model, currently RecurrentPPO)

  • MALDONATO-RL_control-strategies_for_EVs_fleet_VPP.pdf (Developed thesis paper of the research)

  • Algorithm_simulator_notebooks: (folder with notebooks to test the VPP with different RL algorithms or with random actions)

    • 1-Random_VPP_simulator.ipynb
    • A2C_VPP_simulator.ipynb
    • MaskablePPO_VPP_simulator.ipynb
    • TRPO_VPP_simulator.ipynb
    • RecurrentPPO_VPP_simulator.ipynb
  • EV_experiment_notebooks: (folder with notebooks to test different EVs numbers (weekly arrivals) in the VPP simulation)

    • EVs_RecurrentPPO_VPP_tester.ipynb
    • EVs_RecurrentPPO_VPP_validator.ipynb
    • 35EVs_RecurrentPPO_VPP_simulator.ipynb
    • 30EVs_RecurrentPPO_VPP_simulator.ipynb
    • 25EVs_RecurrentPPO_VPP_simulator.ipynb
    • 20EVs_RecurrentPPO_VPP_simulator.ipynb
    • 15EVs_RecurrentPPO_VPP_simulator.ipynb
    • 10EVs_RecurrentPPO_VPP_simulator.ipynb
  • Agent_trainer_notebooks: (folder with the notebooks to train the VPP RL agent with the indicated set of hyperparameters for each RL algorithm)

    • A2C_VPP_agent_trainer.ipynb
    • MaskablePPO_VPP_agent_trainer.ipynb
    • TRPO_VPP_agent_trainer.ipynb
    • RecurrentPPO_VPP_agent_trainer.ipynb
  • Hyperparameters_sweep_notebooks: (folder with the notebooks to tune Hyperparameters of the VPP RL agents for each RL algorithm)

    • A2C_VPP_Hyperp_Sweep.ipynb
    • MaskablePPO_VPP_Hyperp_Sweep.ipynb
    • TRPO_VPP_Hyperp_Sweep.ipynb
    • RecurrentPPO_VPP_Hyperp_Sweep.ipynb
  • trained_models: (folder with the trained models for each RL algorithm ready to be loaded)

    • A2C_models (folder)
    • MaskablePPO_models (folder)
    • TRPO_models (folder)
    • RecurrentPPO_models (folder)
  • data:

    • training_dataset_merger.ipynb (notebook that visualizes and creates the training dataset table)
    • testing_dataset_merger.ipynb (notebook that visualizes and creates the testing dataset table)
    • validating_dataset_merger.ipynb (notebook that visualizes and creates the validating dataset table)
    • data_training: (folder with pre-processing notebooks, 2019 raw-data .csv files, and the created training dataset table)
    • data_testing: (folder with pre-processing notebooks, 2020 raw-data .csv files, and the created testing dataset table)
    • data_validating: (folder with pre-processing notebooks, 2018 raw-data .csv files, and the created validating dataset table)
    • config_builder: (folder containing the YAML simulation config files)
      • wohnblock_household_simulation_adaptive.yaml
      • wohnblock_household_simulation_adaptive_30.yaml
    • environment_optimized_output: (folder where to store the VPP optimized simulation data results)
      • VPP_table.csv (last VPP optimized simulation data results)
    • images: (folder with plots of the best results obtained)
    • algorithms_results: (folder with algorithm evaluation notebook and plots)
      • Algorithms_results_plot.ipynb (notebook that plots Algorithms performances)
      • algorithms_results_table: (folder containing algorithms sweep results tables downloaded from wandb.ai and VPP Experiments based on EVs arrivals)
      • algorithms_graphs: (folder containing algorithms results graphs)
    • wandb: (folder with Weights&Biases training data stored)
      • tensorboard_log: (folder where training tensorboard log files are stored)

Different RL algorithms performance testing

VPP environment and Datasets debug with random-simulation

For debugging purposes and to cross-check datasets loading and Algorithm actual performances, the 1-Random_VPP_simulator.ipynb notebook is provided to run random simulations without any RL model choosing actions.
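
As a rough illustration of what a random simulation means, the sketch below runs one episode with uniformly random actions. `ToyChargingEnv` is a hypothetical stand-in written for this example; it does not reproduce the interface, observations or reward of VPP_environment.py.

```python
import random


class ToyChargingEnv:
    """Hypothetical toy environment, NOT the one defined in VPP_environment.py."""

    def __init__(self, n_stations=3, horizon=24, seed=0):
        self.n_stations = n_stations
        self.horizon = horizon
        self._rng = random.Random(seed)
        self._t = 0
        self._net_load = 0.0

    def reset(self):
        self._t = 0
        self._net_load = self._rng.uniform(-5.0, 5.0)  # assumed net load in kW
        return self._net_load

    def step(self, action):
        # Each station charges (+1), idles (0) or discharges (-1) at an assumed 1 kW.
        residual = self._net_load + sum(action)
        reward = -abs(residual)  # a residual load closer to zero is better
        self._t += 1
        done = self._t >= self.horizon
        self._net_load = self._rng.uniform(-5.0, 5.0)
        return self._net_load, reward, done


def random_rollout(env, seed=0):
    """Run one episode choosing every action at random; no model is involved."""
    rng = random.Random(seed)
    env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = [rng.choice((-1, 0, 1)) for _ in range(env.n_stations)]
        _, reward, done = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


total_reward, steps = random_rollout(ToyChargingEnv())
print(steps, total_reward)
```

A rollout like this gives a baseline score: a trained agent that does not clearly beat random actions points to a problem in the data loading or in the training setup.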

Load different Elvis simulation config set and run experiments

In the data/config_builder/ folder you can find the Elvis YAML config files.

  • Create a new config file, or modify the parameters of the existing ones, to change the characteristics of the vehicle arrival simulation. You can modify [not possible at the moment]:
    • num_charging_events (number of EVs arrival, weekly)
    • mean_park (mean parking time, hours)
    • std_deviation_park (standard deviation parking time, hours)
    • mean_soc (mean State Of Charge of EVs at arrival, from 0 to 1)
    • std_deviation_soc (standard deviation State Of Charge of EVs at arrival)
  • Open the VPP simulation notebook you wish to test (as explained in previous section).
  • In the "Load ELVIS YAML config file" section, load the config file you wish. Choose among the available config files by modifying the case string to:
    • wohnblock_household_simulation_adaptive.yaml (loaded by default, 20 EV arrivals per week with 50% average battery)
    • wohnblock_household_simulation_adaptive_18.yaml (18 EV arrivals per week with 40% average battery)
    • wohnblock_household_simulation_adaptive_22.yaml (22 EV arrivals per week with 55% average battery)
    • wohnblock_household_simulation_adaptive_30.yaml (30 EV arrivals per week with 65% average battery)
  • Then re-run the whole notebook to test the VPP experiment performances.
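
To give a feel for what the five parameters listed above control, the following sketch draws one week of arrival events from them. It assumes parking time and state of charge are normally distributed and clipped to valid ranges; that distribution, the function `sample_arrivals`, the dictionary keys and the two standard deviation values are assumptions for illustration, not a description of how ELVIS generates events.

```python
import random


def sample_arrivals(num_charging_events, mean_park, std_deviation_park,
                    mean_soc, std_deviation_soc, seed=0):
    """Draw hypothetical weekly arrival events from the config parameters.

    Assumes normal distributions clipped to valid ranges; this is an
    illustrative guess, not the ELVIS implementation.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(num_charging_events):
        # Parking time in hours cannot be negative.
        parking_hours = max(0.0, rng.gauss(mean_park, std_deviation_park))
        # State of charge at arrival is a fraction between 0 and 1.
        soc = min(1.0, max(0.0, rng.gauss(mean_soc, std_deviation_soc)))
        events.append({"parking_hours": parking_hours, "soc_at_arrival": soc})
    return events


# 20 arrivals, 24 h mean parking and 50% mean SOC follow the default case above;
# the standard deviations (1.0 h and 0.1) are made-up values.
events = sample_arrivals(
    num_charging_events=20,
    mean_park=24.0,
    std_deviation_park=1.0,
    mean_soc=0.5,
    std_deviation_soc=0.1,
)
print(len(events), events[0])
```

Raising num_charging_events or mean_soc increases the energy available to the VPP, which is the effect the EV experiment notebooks explore.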

You can check the experiment results for different numbers of EVs (weekly arrivals), already loaded in the EV_experiment_notebooks folder. Direct-access notebook links:

Input datasets visualization (training, testing, validating)

Raw datasets pre-processing (training, testing, validating)

Weights&Biases account login

If you wish to train or tune algorithms (explained in the next sections), create a Wandb (Weights&Biases) account at https://wandb.ai/ to keep track of the experiments. The Colab notebook will automatically sign in and save the experiment results in your account storage. If the notebook asks you to sign in at the wandb.login(relogin=True) command, follow the instructions in the cell (open your wandb access code page and copy-paste the code into the blank space in the cell).

Model training

You can train your own model with your own set of hyperparameters.

Algorithm Hyperparameter tuning

You can launch a hyperparameter sweep session for a selected algorithm.
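
For orientation, a Weights&Biases sweep is described by a configuration dictionary with a search method, a metric to optimize and a set of parameter ranges. The sketch below shows the general shape only: the metric name `mean_reward`, the chosen hyperparameters and their ranges are assumptions, not the values used in the sweep notebooks, and `check_sweep_config` is a hypothetical helper.

```python
# Illustrative sweep configuration; names and ranges are assumptions.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "mean_reward", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "gamma": {"values": [0.95, 0.99, 0.999]},
        "n_steps": {"values": [128, 256, 512]},
    },
}


def check_sweep_config(config):
    """Hypothetical sanity check: each parameter needs values or a min/max range."""
    for name, spec in config["parameters"].items():
        has_values = "values" in spec
        has_range = "min" in spec and "max" in spec
        if not (has_values or has_range):
            raise ValueError(f"parameter {name!r} has no search space")
        if has_range and spec["min"] >= spec["max"]:
            raise ValueError(f"parameter {name!r} has an empty range")
    return True


print(check_sweep_config(sweep_config))

# With the wandb package installed and a training function `train` defined,
# a dictionary like this would typically be registered and run with:
#   sweep_id = wandb.sweep(sweep_config, project="my-project")  # project name is a placeholder
#   wandb.agent(sweep_id, function=train, count=20)
```

Checking the dictionary before launching avoids spending a sweep run on a configuration that has an empty search space.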

Algorithm and Experiments results graphs

Plot the hyperparameter sweep results and the algorithm performances stored in data/algorithms_results/algorithms_results_table in the notebook:

Plot the VPP tuning experiment results (for the testing and validating datasets) based on EV arrivals, from data/algorithms_results/algorithms_results_table, in the notebook:

The tables are extracted from the wandb.ai Sweep page for each Algorithm. Check out the 2D and 3D graphs already loaded.

Enjoy the material!

Have fun training, tuning and testing the RL algorithms with interactive graphs while understanding how a Virtual Power Plant works.

Authors

Francesco Maldonato - Personal contact: francesco.maldonato97@gmail.com - DAI-Labor contact: Francesco.Maldonato@dai-labor.de

Acknowledgments