This README explains all steps required to collect, process, and query information about the curtailment of power generation in the balance zone of the German transmission system operator (TSO) TenneT TSO GmbH. The data ranges from 1 January 2022 to 31 December 2022.
- Explanations
- Setting Up The Environment
- Directory Structure To Run Example
- Example Code
- Open-Ended Question
The task is to extract data from two different data sources (1. Abgeschlossene Maßnahmen, 2. EEG-Anlagenstammdaten), merge them on their "EEG Anlagenschlüssel", and lastly calculate the curtailed power and energy.
However, neither of the two datasets contains information about the nominal power of each power plant. Additionally, only the first dataset dates back to 2020 and 2021; the provided link for the second dataset only points to data for 2022. I did not receive a response from the website provider to my request about how to access the data from 2020 or 2021. Therefore, I test my solution with data from 2022.
I scraped the Marktstammdatenregister to get the nominal power of each power plant. The second dataset contains the Marktstammdatenregister number of each network operator (NB_Mastr_Nr). Filtering the Marktstammdatenregister database by this number returns extensive information about the power plants installed in the balance zone of that specific network operator, including their nominal power. A minimal sketch of the merge and the calculation follows.
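To make the intended merge and calculation concrete, here is a minimal pandas sketch. The column names (`EEG_Anlagenschluessel`, `level`, `duration`, `power_nominal`) and the interpretation of `level` as the remaining output in percent are assumptions for illustration; the actual names and semantics depend on the downloaded files and the `tmh_server` implementation.

```python
import pandas as pd

# Hypothetical excerpt of dataset 1 (Abgeschlossene Maßnahmen)
df_curtailments = pd.DataFrame({
    "EEG_Anlagenschluessel": ["E123", "E456"],
    "level": [60, 0],      # assumed: remaining output in percent during curtailment
    "duration": [90, 30],  # assumed: curtailment duration in minutes
})

# Hypothetical excerpt of the scraped Marktstammdatenregister data
df_mastr = pd.DataFrame({
    "EEG_Anlagenschluessel": ["E123", "E456"],
    "power_nominal": [5000.0, 750.0],  # nominal power in kW
})

# Merge both sources on the EEG Anlagenschlüssel
df = df_curtailments.merge(df_mastr, on="EEG_Anlagenschluessel", how="left")

# Curtailed power: the share of nominal power that was switched off
df["power_curtailed"] = df["power_nominal"] * (100 - df["level"]) / 100

# Curtailed energy in kWh: curtailed power (kW) times duration in hours
df["energy_curtailed"] = df["power_curtailed"] * df["duration"] / 60
print(df)
```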
The data should also focus on the balance zone of TenneT. However, the provided example request to obtain dataset 1 via the API uses the network operator Avacon, which also operates outside of TenneT's balance zone (e.g., in Nordrhein-Westfalen). Nevertheless, to reduce complexity I stick to this network operator.
The goal is to give all users working on the database only as many privileges as necessary. For example, the client side, which plots the data, only needs read access. For the server side, I first limited the privileges to read (SELECT) and write (INSERT). However, I ran into throughput and latency issues when updating the database. As a workaround, I therefore also gave the server delete (TRUNCATE) rights to replace the entire table instead of updating only a few columns. This needs further improvement for production systems to avoid false reads from clients while the table is being rewritten. I tried the following approaches to update the table, but all of them required multiple minutes of execution time (a sketch of the batched variant follows the list):
- Updating and committing in a for-loop
- Updating in a for-loop and committing only every 10,000th entry
- Using `executemany` and `execute_batch`
- Updating by using an integer index for the `WHERE` condition instead of the plant_id column, which is `VARCHAR` (e.g., `UPDATE "curtailments" SET "power_curtailed"=%s WHERE "idx"=%s`)
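For reference, a minimal sketch of the `execute_batch` variant, assuming a local database with the `curtailments` table and `tmh_server` user created below, plus an integer `idx` column (which is not part of the final schema); `updates` holds one `(power_curtailed, idx)` tuple per row. Even this batched form took multiple minutes on my table:

```python
import psycopg2
from psycopg2.extras import execute_batch

# Assumed connection parameters; adjust to your environment
connection = psycopg2.connect(dbname="curtailment_tennet",
                              user="tmh_server",
                              password="tmh_server",
                              host="localhost")
cur = connection.cursor()

# One (power_curtailed, idx) tuple per row to update
updates = [(123.4, 1), (567.8, 2)]

# Sends the UPDATE statements in pages instead of one round-trip per row
execute_batch(cur,
              'UPDATE "curtailments" SET "power_curtailed"=%s WHERE "idx"=%s',
              updates,
              page_size=10_000)
connection.commit()
cur.close()
connection.close()
```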
Further improvements include using smaller data types for the columns where applicable, for example `SMALLINT` for power_nominal instead of `NUMERIC`. However, this requires checks before writing anything to the database to avoid errors.
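A sketch of such a check, assuming the values arrive as a pandas Series; PostgreSQL's `SMALLINT` covers -32768 to 32767:

```python
import pandas as pd

SMALLINT_MIN, SMALLINT_MAX = -32768, 32767

def fits_smallint(values: pd.Series) -> bool:
    """Return True if every value can safely be stored as SMALLINT."""
    return bool(values.between(SMALLINT_MIN, SMALLINT_MAX).all())

# Nominal power in kW overflows SMALLINT for plants >= 32.768 MW
print(fits_smallint(pd.Series([750, 5000, 40000])))  # False
```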
Operating system: Ubuntu 22.04 LTS
Check if you have Terraform installed by running `terraform --help` in the terminal. If Terraform is not installed on your machine, follow the install guide from HashiCorp. We provide example Terraform configuration files to deploy a virtual machine running on OpenNebula. Adjust the following parts to match your environment:
config.tfvars:
- ON_USERNAME
- ON_PASSWD
- ON_GROUP
infra.tf:
- endpoint
- flowpoint
- `terraform init`
- `terraform plan -var-file=config.tfvars -out=infra.out`
- `terraform apply infra.out`
- `terraform show` to get the IP address of the deployed virtual machine, which is required for the next step
- `terraform destroy -var-file=config.tfvars` (optional)
Check if you have Ansible installed by running `ansible --help` in the terminal. If Ansible is not installed on your machine, follow the install guide from the Ansible documentation. We install all required programs and packages on the previously deployed virtual machine via Ansible playbooks.
- Define IP address(es) by using a host group (e.g., in /etc/ansible/hosts) and refer to it in the playbooks
- Adjust host group in all following YAML files
- Update the virtual machine: `ansible-playbook update.yml`
- Install PostgreSQL: `ansible-playbook postgres_install.yml`
- Copy files to the virtual machine: `ansible-playbook copy_files.yml --extra-vars '{"path_to_files": "path/to/files/with/trailing/slash/"}'`
- Install a Python virtual environment for type=server or type=client: `ansible-playbook prepare_virtualenv.yml --extra-vars '{"type": "server"}'`
Either install the PostgreSQL database on the machine with the above Ansible playbooks or install it manually following this installation guide.
Set up a PostgreSQL database with two users. First, connect to PostgreSQL with `sudo -u postgres psql` and list all available users with their respective privileges (`\du` or `SELECT * FROM information_schema.role_table_grants WHERE grantee='user_name';`) or without their privileges (`SELECT usename FROM pg_catalog.pg_user;`). Create new users if no suitable users exist yet for the server and client (`CREATE USER tmh_<type> WITH ENCRYPTED PASSWORD 'tmh_<type>';`):
- type=server: user to read and write data into an existing database (here: `CREATE USER tmh_server WITH ENCRYPTED PASSWORD 'tmh_server';`)
- type=client: user with read-only access (here: `CREATE USER tmh_client WITH ENCRYPTED PASSWORD 'tmh_client';`)
Prepare the database and table:
- Create the database: `SELECT 'CREATE DATABASE curtailment_tennet' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'curtailment_tennet')\gexec` (the `\gexec` meta-command executes the generated statement, so the creation is idempotent)
- Connect to the new database: `\c curtailment_tennet`
- Create the table: `CREATE TABLE IF NOT EXISTS curtailments (start_curtailment TIMESTAMP, end_curtailment TIMESTAMP, duration SMALLINT, level SMALLINT, cause VARCHAR, plant_id VARCHAR, operator VARCHAR, power_nominal NUMERIC, power_curtailed NUMERIC, energy_curtailed NUMERIC);`
- Change user privileges (a verification sketch follows this list):
4.1 Give the server and client read access: `GRANT SELECT ON TABLE curtailments TO tmh_<type>;`
4.2 Give the server write access: `GRANT INSERT ON TABLE curtailments TO tmh_server;`
4.3 Give the server truncate access: `GRANT TRUNCATE ON TABLE curtailments TO tmh_server;`
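To verify the grants, a short psycopg2 sketch (connection parameters are assumptions for a local setup): the SELECT as tmh_client should succeed, while the INSERT should be rejected with an insufficient-privilege error.

```python
import psycopg2

# Read-only client user created above
client = psycopg2.connect(dbname="curtailment_tennet",
                          user="tmh_client",
                          password="tmh_client",
                          host="localhost")
cur = client.cursor()

# SELECT is granted to both users
cur.execute("SELECT COUNT(*) FROM curtailments;")
print("Rows:", cur.fetchone()[0])

# INSERT is only granted to tmh_server and should fail here
try:
    cur.execute("INSERT INTO curtailments (plant_id) VALUES ('test');")
except psycopg2.errors.InsufficientPrivilege as exc:
    client.rollback()
    print("Write correctly rejected:", exc)

cur.close()
client.close()
```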
Check out the above instructions on how to start a new virtual machine with Terraform and how to initialize it with Ansible. Otherwise, run the following commands to install everything needed to create a new Python virtual environment:
sudo apt update && sudo apt upgrade -y
sudo apt install python3-venv python3-pip -y
python3 -m venv name_of_venv
. name_of_venv/bin/activate
pip install wheel && pip install --upgrade pip
pip install git+https://github.com/Rene36/tmh_server
|- server/
|  |- main.py (see below example code)
|  |- anlagenstammdaten/
|  |  |- SNBs/
|  |  |  |- mastr_2022_simplified.csv
|  |- configs/
|  |  |- avacon_api.json
|  |  |- query.json
# stdlib
import os
import logging

# third party
from tmh_server.mapping import Mapping
from tmh_server.avacon_api import AvaconAPI
from tmh_server.postgresql import PostgreSQL
from tmh_server.process_data import ProcessData
from tmh_server.mastr_scrapper import MastrScrapper

# Set to True to (re)download plant data from the Marktstammdatenregister
SCRAP_MASTR: bool = False

# Initialize logger
logger = logging.getLogger(__name__)
logging.basicConfig(filename=os.path.join(os.getcwd(), "tmh.log"),
                    format="%(asctime)s,%(levelname)s,%(module)s:%(funcName)s:%(lineno)s,%(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S",
                    level=logging.DEBUG)


def main():
    """
    Combines data extraction from the Avacon API, cleaning the output,
    and storing it in a PostgreSQL database. Then scrapes the
    Marktstammdatenregister based on the network operator ID to obtain
    a mapping of EEG Anlagenschlüssel to nominal power (might be optional).
    Lastly, updates the values for nominal power and curtailed energy
    based on the power plant ID.
    """
    # Extract information from the Avacon API based on the avacon_api.json config
    config_path: str = os.path.join(os.path.abspath(os.path.dirname(__file__)),
                                    "configs")
    avacon_api: AvaconAPI = AvaconAPI(config_path=config_path,
                                      config_name="avacon_api.json")
    df = avacon_api.call_api()

    # Clean the extracted data
    process_data: ProcessData = ProcessData(df)
    process_data.clean()
    df = process_data.get_data()

    # Store the cleaned data in the PostgreSQL database based on the query.json config
    psql: PostgreSQL = PostgreSQL(config_path=config_path,
                                  config_name="query.json",
                                  data=df)
    psql.connect_and_insert()

    # Scrape the Marktstammdatenregister (might be optional)
    path_anlagenstammdaten: str = os.path.join(os.path.abspath(os.path.dirname(__file__)),
                                               "anlagenstammdaten")
    mapper: Mapping = Mapping(path_anlagenstammdaten,
                              "TenneT TSO GmbH EEG-Zahlungen Bewegungsdaten 2022.csv")
    if SCRAP_MASTR:
        scrapper: MastrScrapper = MastrScrapper()
        for nb_mastr_nr in mapper.get_nb_mastr_nrs():
            scrapper.set_nb_mastr_nr(nb_mastr_nr)
            scrapper.download_via_link()
            scrapper.move_downloaded_file()
        scrapper.merge_snbs_into_one_csv(path_anlagenstammdaten)

    # Update the PostgreSQL database with values for nominal power and curtailed energy
    psql.connect_to_db()
    mapper.set_df(psql.get_rows("curtailments"))
    mapper.get_merged_snbs()
    mapper.create_mapping()
    mapper.map_power_to_plant_id()
    mapper.calculate_curtailed_power()
    mapper.calculate_curtailed_energy()
    psql.cur.execute("TRUNCATE curtailments;")
    psql.connection.commit()
    psql.set_df(mapper.df_db)
    psql.close_connection()
    psql.connect_and_insert()


if __name__ == "__main__":
    main()
What can we do with the information about curtailed power and energy of specific power plants in a given region?
- Identify patterns in time, type of power plant, and location
1.1 Time: Is wind power mostly curtailed at night or in winter? Are PV fields curtailed at peak sun?
1.2 Type of power plant: Are wind power plants curtailed more often than PV fields due to their large nominal power? It might be more efficient to shut down one wind power plant with 5 MW instead of a large number of PV fields.
1.3 Location: Identify bottlenecks in the grid by looking for power plants that are curtailed more often than others of the same type. For example, I also scraped the postal codes, which allows visualizing the type of curtailments over time on a map.
- Enrich our dataset with weather data (e.g., solar irradiation, temperature) to better understand the relation between curtailments and weather. For example, high irradiation can lead to a higher chance of a PV field being curtailed.
- Enrich our dataset with electricity consumption data to better understand the relation between curtailments and consumption. For example, peak-demand times (e.g., in the morning and evening, or during large events such as festivals) are less prone to curtailments.
- Try to predict curtailed power and energy with linear regression and machine learning (ML) models by using weather, consumption, and information about the power grid as features / inputs. One could develop physics-informed ML models to consider grid transmission constraints.
4.1 LSTM: Past values of weather and curtailments influence future values. Therefore, an ML model such as an LSTM might improve prediction performance by considering the past.
4.2 Lasso regression: Linear regression models are powerful and especially computation-efficient. Before using ML techniques, one should test simpler prediction models first. For example, a Lasso regression model outperformed a CNN in my recent publication (see the sketch after this list).
- Look abroad: Energy is bought where it is cheap. When France or Norway produce cheap electricity with their nuclear and hydro power plants, respectively, local power plants might be more prone to curtailment. This works both ways. For example, last summer France had to reduce the output of its nuclear power plants due to too-warm river water, and Norway produced less hydro power due to low reservoir levels.
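As a starting point for item 4.2, a minimal scikit-learn sketch; the features (irradiation, temperature, consumption) and the synthetic data are placeholders for the enriched dataset described above:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in for [irradiation, temperature, consumption] features
rng = np.random.default_rng(42)
X = rng.random((500, 3))
# Synthetic target: curtailed energy, loosely driven by two of the features
y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0.0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = Lasso(alpha=0.01)
model.fit(X_train, y_train)
print("R^2 on the test set:", model.score(X_test, y_test))
```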
4.2 Lasso regression: Linear regression models are powerful and especially computation-efficient. Before using ML techniques one should test simpler prediction models first. For example, the Lasso regression model outperformed a CNN in my recent publication - Look abroad: Energy is bought where it is cheap. When France or Norway produce cheap electricity with their nuclear and hydro power plants, respectively, local power plants might be more prone for curtailments. This works both ways. For example, last year during summer France had to reduce the output of their nuclear power plants due to too warm river water and Norway produced less with their hydro power due to too low reservoir levels.