/coord-sim

Lightweight flow-level simulator for inter-node network and service coordination (e.g., in cloud/edge computing or NFV).

Simulation: Inter-node service coordination and flow scheduling

Simulate flow-level, inter-node network coordination including scaling and placement of services and scheduling/balancing traffic between them.

Features:

  • Simulate any given network topology with arbitrary node and link capacities and link delays
  • Simulate any given network service consisting of linearly chained SFs/VNFs
  • VNFs can specify arbitrary resource consumption as a function of their load using Python modules. VNF delay can also be specified individually and may be normally distributed.
  • Simulate network traffic in the form of flow arrivals at various ingress nodes with varying arrival rate, flow length, volume, etc., according to stochastic distributions
  • Simple and clear interface to run algorithms for scaling, placement, and scheduling/load balancing of these incoming flows across the nodes in the network. Coordination within each node is out of scope.
  • Interface allows easy integration with OpenAI Gym to enable training and evaluating reinforcement learning algorithms
  • Collection of metrics like successful/dropped flows, end-to-end delay, resource consumption, etc. over time. Easily extensible.
  • Discrete event simulation to evaluate coordination over time with SimPy
  • Graceful adjustment of placements: When an algorithm removes VNFs from a placement, flows that are currently being processed are allowed to finish processing before the VNF is completely removed (see PR #78 and #81).

Setup

Requires Python 3.6. Install with (ideally using virtualenv):

pip install -r requirements.txt

Usage

Type coord-sim -h for help using the simulator. For now, this should print:

$ coord-sim -h
usage: coord-sim [-h] -d DURATION -sf SF [-sfr SFR] -n NETWORK -c CONFIG
                 [-t TRACE] [-s SEED]

Coordination-Simulation tool

optional arguments:
  -h, --help            show this help message and exit
  -d DURATION, --duration DURATION
                        The duration of the simulation (simulates
                        milliseconds).
  -sf SF, --sf SF       VNF file which contains the SFCs and their respective
                        SFs and their properties.
  -sfr SFR, --sfr SFR   Path which contains the SF resource consumption
                        functions.
  -n NETWORK, --network NETWORK
                        The GraphML network file that specifies the nodes and
                        edges of the network.
  -c CONFIG, --config CONFIG
                        Path to the simulator config file.
  -t TRACE, --trace TRACE
                        Provide a CSV trace file to configure the traffic the
                        simulator is generating.
  -s SEED, --seed SEED  Random seed

You can use the following command as an example (run from the root project folder):

coord-sim -d 20 -n params/networks/triangle.graphml -sf params/services/abc.yaml -sfr params/services/resource_functions -c params/config/sim_config.yaml

This will run a simulation on the provided GraphML network file and YAML service file for a duration of 20 time steps.

Dynamic SF resource consumption

By default, all SFs have a node resource consumption, which exactly equals the aggregated traffic that they have to handle.

It is possible to specify arbitrary other resource consumption models simply by implementing a Python module with a function resource_function(load) (see examples here).
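As a minimal sketch of such a module (not one of the project's own examples): the file name A.py, chosen here to match a resource_function_id of A, and the concrete formula are assumptions for illustration. Only the function name and signature resource_function(load) come from the description above.

```python
# A.py: hypothetical resource function module (the file name is an assumption)


def resource_function(load):
    """Return the node resources needed to handle the given load.

    Illustrative model only: a fixed overhead of 1 unit while the SF handles
    any traffic, plus resources that grow linearly with the load.
    """
    if load <= 0:
        return 0
    return 1 + 0.5 * load


if __name__ == "__main__":
    # Print the resource demand for a few example loads.
    for load in (0, 2, 10):
        print(load, resource_function(load))
```

A non-linear model (for example a step function or a quadratic term) would fit the same signature; the default behaviour described above corresponds to simply returning the load.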

To use these modules, they need to be referenced in the service file:

sf_list:
    a:
      processing_delay_mean: 5.0
      processing_delay_stdev: 0.0
      resource_function_id: A

And the path to the folder with the Python modules needs to be passed via the -sfr argument.

See PR RealVNF#78 for details.
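The processing_delay_mean and processing_delay_stdev fields in the service file above suggest a normally distributed processing delay. A minimal sketch of sampling such a delay follows; the function name and the clamping of negative samples to zero are assumptions for illustration, not necessarily how the simulator handles it.

```python
import random


def sample_processing_delay(mean, stdev, rng=random):
    """Draw one processing delay from a normal distribution.

    Negative samples are clamped to 0 (an assumption of this sketch).
    `rng` can be the random module or a seeded random.Random instance.
    """
    return max(0.0, rng.gauss(mean, stdev))


# With a stdev of 0.0, as in the service file above, the delay is constant.
print(sample_processing_delay(5.0, 0.0))

# A seeded generator makes the sampled delays reproducible.
rng = random.Random(42)
print([round(sample_processing_delay(5.0, 1.0, rng), 2) for _ in range(3)])
```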

Egress nodes

  • A node can be set to be an Egress node in the NodeType attribute of the network file.
  • Only if some nodes are set as Egress does the simulator randomly choose one of them as the Egress node for each flow in the network.
  • If some nodes are set to be Egress, then once a flow is fully processed, the simulator checks whether the flow's current node equals its egress node. If yes, the flow departs; otherwise, the flow is forwarded to the egress node using shortest-path routing.
  • Todo: Ideally, the coordination algorithms should keep the path (Ingress to Egress) of the flow in view while creating the schedule/placement.

See PR 137 for details.
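The egress handling described above can be sketched roughly as follows. This is an illustration using only the standard library, not the simulator's code: the adjacency-dict network, the helper names shortest_path and next_step, and the use of link delay as path weight are all assumptions.

```python
import heapq
import random


def shortest_path(links, source, target):
    """Dijkstra over an adjacency dict {node: {neighbor: delay}}."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, delay in links[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + delay, neighbor, path + [neighbor]))
    return None  # target is not reachable


def next_step(links, current_node, egress_node):
    """Decide what happens to a flow after it has been fully processed."""
    # Assumption: without an egress node, the flow departs where it is.
    if egress_node is None or current_node == egress_node:
        return "depart", [current_node]
    return "forward", shortest_path(links, current_node, egress_node)


# Hypothetical triangle network with link delays; pop1 and pop2 are Egress nodes.
links = {
    "pop0": {"pop1": 5, "pop2": 1},
    "pop1": {"pop0": 5, "pop2": 1},
    "pop2": {"pop0": 1, "pop1": 1},
}
egress_nodes = ["pop1", "pop2"]

rng = random.Random(0)
egress = rng.choice(egress_nodes)  # chosen once per flow
print(egress, next_step(links, "pop0", egress))
```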

Conversion of real world traffic traces

Real-world traffic traces are available at sndlib under 'Dynamic traffic' in the menu on the left. They contain the data rate for every pair of nodes in a network in 5-minute intervals over a timespan of six months. Available data formats are XML and another "native sndlib format". For use in the simulator, this data has to be converted into inter_arrival_mean. A script for that (which works with the XML files) can be found at coord-sim/params/convert_traces/convert_traces.py. The same folder also contains an example configuration for the script and an example data set for a first try.

coord-sim/params/convert_traces$ tree
.
├── abilene_node_name_map.yaml
├── convert_traces.py
├── directed-abilene-zhang-5min-over-6months-ALL
│   ├── demandMatrix-abilene-zhang-5min-20040302-1830.xml
│   ├── demandMatrix-abilene-zhang-5min-20040305-0150.xml
│   ├── demandMatrix-abilene-zhang-5min-20040411-0520.xml
│   ├── demandMatrix-abilene-zhang-5min-20040626-0345.xml
│   ├── demandMatrix-abilene-zhang-5min-20040630-2150.xml
│   ├── demandMatrix-abilene-zhang-5min-20040704-2020.xml
│   ├── demandMatrix-abilene-zhang-5min-20040808-0140.xml
│   ├── demandMatrix-abilene-zhang-5min-20040812-1415.xml
│   ├── demandMatrix-abilene-zhang-5min-20040819-2305.xml
│   └── demandMatrix-abilene-zhang-5min-20040907-0905.xml
└── trace_xml_reader_config.yaml

The folder directed-abilene-zhang-5min-over-6months-ALL contains 10 XML files from sndlib, each representing the traffic in one 5-minute timespan. The configuration is in trace_xml_reader_config.yaml. It contains:

source: "directed-abilene-zhang-5min-over-6months-ALL"
# result_trace_filename: <>  # default  = f'{directory}_{_from}-{to}_trace.csv'
# intermediate_result_filename: <>  # default  = result_trace_filename + "_intermediate"
# _from: 0 # default 0
# to: 100  # default None, means slice is [_from:]
node_name_map: abilene_node_name_map.yaml  # default None, means leave the names
run_duration: 100  # default 100
scale_factor: 0.001  # default 0.001
change_rate: 2  # default 2
#ingress_nodes:  # default None, means choose all nodes
#  - pop0
#  - pop1

Parameter source points to the folder with the xml files. Execute:

coord-sim/params/convert_traces$ python3 convert_traces.py --config_file trace_xml_reader_config.yaml
[...]
23:20:54: 10  files in directory
23:20:54: Chosen files: os.listdir(directed-abilene-zhang-5min-over-6months-ALL)[0:]
[...]
23:21:00: Written to directed-abilene-zhang-5min-over-6months-ALL_0-None_trace.csv. Last time step 1800
[...]
23:21:00: inter_arrival_mean range: 0.323815912931867, 36.447214465141315
23:21:00: ... mean:  7.381947818616778
23:21:00: ... median:  5.646327364198291
23:21:00: ... std:  5.661472126922231

The converted trace is written to directed-abilene-zhang-5min-over-6months-ALL_0-None_trace.csv. Reading the XML files takes most of the time, so the script also writes intermediate data to another CSV file (in this case, directed-abilene-zhang-5min-over-6months-ALL_0-None_intermediate.csv). You can reuse it with different parameters by setting the source parameter to the filename of the intermediate file. For example, to include only some of the ingress nodes, edit trace_xml_reader_config.yaml:

source: directed-abilene-zhang-5min-over-6months-ALL_0-None_intermediate.csv
result_trace_filename: ing_pop0_pop1.csv
[...]
ingress_nodes:  # default None, means choose all nodes
  - pop0
  - pop1

The resulting trace file is also given another filename (result_trace_filename) to avoid overwriting the earlier result. Then run the script again:

coord-sim/params/convert_traces$ python3 convert_traces.py --config_file trace_xml_reader_config.yaml

The script will work on the data from the intermediate file. By default, filenames are constructed from the directory name and the arguments _from and to. You can also assign filenames to the intermediate and the result files in trace_xml_reader_config.yaml:

source: "directed-abilene-zhang-5min-over-6months-ALL"
intermediate_result_filename: directed-abilene-zhang-5min-over-6months-ALL_0-None_intermediate.csv
result_trace_filename: ing_pop0_pop1.csv
[...]

Since a batch from sndlib contains many files, you can choose a sample of them with the arguments _from and to, which define a slice. If source is a directory, the script calls os.listdir(source)[_from:to], which limits the number of files to read. If source is set to an intermediate file, it is sliced according to the same parameters.
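To illustrate how _from and to select files, here is a small sketch of the slice semantics. The helper choose_files is hypothetical; only the expression os.listdir(source)[_from:to] is taken from the description above. Note that os.listdir returns entries in arbitrary order, so the sketch sorts them for a reproducible demonstration; whether the script itself sorts is not stated here.

```python
import os
import tempfile


def choose_files(source, _from=0, to=None):
    """Return the slice [_from:to] of the file names in a directory.

    Sorting is an addition of this sketch to make the result deterministic.
    """
    return sorted(os.listdir(source))[_from:to]


with tempfile.TemporaryDirectory() as source:
    # Create five empty stand-in files, one per time step.
    for i in range(5):
        open(os.path.join(source, f"demandMatrix-{i}.xml"), "w").close()

    print(choose_files(source))        # defaults (_from=0, to=None): all files
    print(choose_files(source, 1, 3))  # only the files at index 1 and 2
```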
The node names in our network files differ from those in sndlib. To rename them, a YAML file can be assigned. In the config example above, the parameter node_name_map was set to abilene_node_name_map.yaml, which looks like this:

# defines how to rename nodes (from keys to values). If a node is set to null it will be removed from the
# dataframe. If a node is not mentioned in the yaml it will be ignored, the name will be kept.
ATLAM5: null  # this node is removed and does not appear even in the intermediate
ATLAng: pop9  # renamed from ATLAng to pop9
CHINng: pop1
DNVRng: pop6
HSTNng: pop8
IPLSng: pop10
KSCYng: pop7
LOSAng: pop5
NYCMng: pop0
SNVAng: pop4
STTLng: pop3
WASHng: pop2
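The renaming rules in this file can be sketched in plain Python. The function apply_node_name_map and the list-based input are assumptions for illustration (the comments in the file indicate that the script operates on a dataframe); the mapping entries are a subset of the file above, with YAML null represented as None.

```python
def apply_node_name_map(node_names, name_map):
    """Rename nodes, drop those mapped to None, keep unmentioned names."""
    renamed = []
    for name in node_names:
        if name in name_map:
            new_name = name_map[name]
            if new_name is None:
                continue  # null in the YAML file: remove the node
            renamed.append(new_name)
        else:
            renamed.append(name)  # not mentioned in the map: keep the name
    return renamed


name_map = {"ATLAM5": None, "ATLAng": "pop9", "NYCMng": "pop0"}
print(apply_node_name_map(["ATLAM5", "ATLAng", "NYCMng", "other"], name_map))
```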

Save plots of the data_rate or the inter_arrival_mean:

coord-sim/params/convert_traces$ python3 convert_traces.py --config_file trace_xml_reader_config.yaml --save_plots data_rate inter_arrival_mean
coord-sim/params/convert_traces$ tree
.
├── abilene_node_name_map.yaml
├── convert_traces.py
├── directed-abilene-zhang-5min-over-6months-ALL
│   ├── demandMatrix-abilene-zhang-5min-20040302-1830.xml
│   ├── [...]
│   └── demandMatrix-abilene-zhang-5min-20040907-0905.xml
├── directed-abilene-zhang-5min-over-6months-ALL_0-None_intermediate.csv
├── directed-abilene-zhang-5min-over-6months-ALL_0-None_trace.csv
├── directed-abilene-zhang-5min-over-6months-ALL_0-None_trace_data_rate.png            <---
├── directed-abilene-zhang-5min-over-6months-ALL_0-None_trace_inter_arrival_mean.png   <---
├── directed-abilene-zhang-5min-over-6months-ALL_0-None_trace_meta.yaml
└── trace_xml_reader_config.yaml

Save plots as pdf:

coord-sim/params/convert_traces$ python3 convert_traces.py --config_file trace_xml_reader_config.yaml --save_plots data_rate inter_arrival_mean --plot_format pdf

Show plots at the end of the script by calling plt.show():

coord-sim/params/convert_traces$ python3 convert_traces.py --config_file trace_xml_reader_config.yaml --plot data_rate inter_arrival_mean

For more information, look at the docstrings in the script or the comments in the example config.

Overall abilene intermediate file

We have an intermediate CSV file, overall_abilene_intermediate.csv, which contains the whole abilene batch from sndlib with 48 thousand time steps. It is recommended to use it for producing traces for the abilene network by setting the source parameter in the config to it.

Create Episode Animations

Another way to analyse results is to create an animation that shows how the placement in a single test episode changes over time. Use the animation command for that:

  • Create an animation from the first test directory in a results directory: animation --results_dir <>
  • Show all available test directories in a results directory: animation --results_dir <> --show_tests
  • Create an animation from a test directory: animation --test_dir <>
  • Show the animation at the end (by calling plt.show()): animation --test_dir <> --show
  • Save the animation as html video: animation --test_dir <> --save <possible values: html, git, both>
  • Limit the amount of data to process by setting --sample_rate (default is 1): animation --test_dir <> --sample_rate 5. In this example, every fifth point in time is taken into the animation. The resolution is lower, but this is useful if the scenario is too big.
  • Set the interval between frames (default is 100): animation --test_dir <> --interval 100

To create an animation from Python code instead of the CLI, use the PlacementAnime class:

pa = PlacementAnime(test_dir)
pa.create_animation()
pa.animation # animation object
pa.fig # figure object
pa.ax # Axis object (network, placement etc)
pa.ingress_traffic_ax # Axis object (ingress_traffic)

You may need the tkinter module installed for that: sudo apt install python3-tk

LSTM Traffic Prediction

The simulator has an LSTM module to predict traffic based on the traffic traces mentioned above.

The LSTM module must be trained separately. To do so, lstm_prediction must be enabled in the simulator config file. Additionally, lstm_weights must be set to the directory, within the current working directory, where the weights should be saved. The simulator also loads the weights from this directory when it runs.

To train the LSTM neural network:

Use the lstm-predict module similar to the example below. The module takes one argument: the simulator config file that will be used during the actual simulation.

lstm-predict -c <PATH-TO-CONFIG-FILE>

This will train the LSTM network based on the trace specified in the trace_path in the config file, then save the weights to the weights directory specified in the config file.

Afterwards, use the same configuration file when using the simulator to make sure that weights are correctly loaded in the simulator later on.

Tests

# style check
flake8 src

# tests
nose2

Acknowledgement

This project has received funding from the German Federal Ministry of Education and Research (BMBF) through Software Campus grant 01IS17046 (RealVNF).