A workload generator that produces coflow traces with varying characteristics, controlled by the parameters described below.
The following list of coflow properties can be varied using the workload generator:
- Coflow size -- Sum of sizes of constituent flows
- Coflow width -- Minimum number of sources/destinations used by the flows of a coflow
- Incast Ratio -- Ratio of number of destinations to the number of sources of constituent flows
- Skew -- Ratio of the maximum to the minimum flow size in a coflow
- Inter-arrival Time -- Time between arrival of two successive coflows, related to the load on the network.
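The per-coflow properties above can be illustrated with a minimal sketch. The `Flow` record, its field names, and the function `coflow_properties` are hypothetical and not part of the generator. The sketch also assumes that width means the smaller of the number of distinct sources and the number of distinct destinations, which is one reading of the definition above.

```python
from collections import namedtuple

# Hypothetical flow record; the field names are illustrative only.
Flow = namedtuple("Flow", ["src", "dst", "size_mb"])


def coflow_properties(flows):
    """Compute size, width, incast ratio and skew for one coflow."""
    sources = {f.src for f in flows}
    destinations = {f.dst for f in flows}
    sizes = [f.size_mb for f in flows]
    return {
        # Size: sum of the sizes of the constituent flows.
        "size": sum(sizes),
        # Assumption: width = min(#distinct sources, #distinct destinations).
        "width": min(len(sources), len(destinations)),
        # Incast ratio: number of destinations over number of sources.
        "incast_ratio": len(destinations) / float(len(sources)),
        # Skew: largest flow size over smallest flow size.
        "skew": max(sizes) / float(min(sizes)),
    }


# Made-up coflow with three flows, three sources and two destinations.
example = [Flow(0, 5, 10.0), Flow(1, 5, 40.0), Flow(2, 6, 20.0)]
props = coflow_properties(example)
```

Inter-arrival time is a property of the trace rather than of a single coflow, so it is left out of this sketch.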
The generator takes the following input parameters. Each one affects the coflow properties listed alongside it.
Input Parameter | Description | Affected Coflow Property |
---|---|---|
N | Number of coflows | Scale of the trace |
a | Max coflow width | Width, Size, Skew and Inter-arrival Time |
c | Intra-coflow contention | Width, Size, Skew and Inter-arrival Time (Mild) |
D_1 | Source Frequency Distribution | Width |
D_2 | Incast Ratio Distribution | Incast-ratio and Width |
D_3 | Total Destination Data Distribution | Size and Skew |
D_4 | Intra-Destination Data Distribution | Skew |
L | Network Load | Inter-arrival Time |
BW | Access Link Bandwidth | Inter-arrival Time |
P | Total Number of Sources/Destinations | Width, Inter-arrival Time |
The generator can sample various parameters of a publicly available coflow trace, which was obtained by running jobs on a Facebook cluster, and use them while scaling up other properties such as trace size and network load.
The values of the following parameters (generator variable in parentheses) must be entered manually before running the generator: N (NUM_COFLOWS), P (NUM_INP_PORTS) and BW (ACCESS_LINK_BANDWIDTH).
The parameters a (ALPHA), c (INTRA_COFLOW_CONTENTION), D_1 (SOURCE_NUM_DIST), D_4 (DESTINATION_DATA_DIST) and L (LOAD_FACTOR) are specified at runtime as command line arguments.
The parameters D_2 and D_3 are sampled from the Facebook trace and stored in the generator variables hist_dist_ir and hist_dist_dd. These can easily be extended to allow for any arbitrary distribution.
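As an illustration of how an arbitrary empirical distribution could be plugged in, the following sketch draws samples from a histogram. The internal layout of hist_dist_ir and hist_dist_dd is not described here, so the dictionary format (value mapped to count), the function `make_sampler`, and the example numbers are all assumptions.

```python
import bisect
import random


def make_sampler(histogram, rng=random):
    """Return a function that samples values in proportion to their counts.

    histogram: dict mapping value -> count (hypothetical layout).
    """
    values = sorted(histogram)
    cumulative = []
    total = 0.0
    for value in values:
        total += histogram[value]
        cumulative.append(total)

    def sample():
        # Pick a point in [0, total) and locate the bin that contains it.
        point = rng.random() * total
        index = bisect.bisect_right(cumulative, point)
        return values[min(index, len(values) - 1)]

    return sample


# Made-up incast-ratio histogram, not taken from the Facebook trace.
rng = random.Random(0)
sample_ir = make_sampler({0.5: 10, 1.0: 30, 2.0: 60}, rng)
draws = [sample_ir() for _ in range(1000)]
```

Replacing the dictionary with counts measured from another trace, or with counts derived from an analytical distribution, would change the sampled values without touching the sampling logic.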
The generator runs without errors with Python >= 2.7.14, NumPy >= 1.14.1, SciPy >= 1.0.0 and Matplotlib >= 2.1.2.
To run the generator, use the command:
python trace_producer.py [NUM_COFLOWS] [ALPHA] [LOAD_FACTOR] [INTRA_COFLOW_CONTENTION] [SOURCE_NUM_DIST] [DESTINATION_DATA_DIST]
The following ranges of values can be used for the command line arguments:
Command Line Argument | Input Value Range |
---|---|
NUM_COFLOWS | int value > 0 |
ALPHA | int value in {1,...,P} |
LOAD_FACTOR | float value in (0,1) |
INTRA_COFLOW_CONTENTION | float value in [0,1] |
SOURCE_NUM_DIST | char value U (uniform dist) or Z (zipf dist) |
DESTINATION_DATA_DIST | char value U (uniform dist) or Z (zipf dist) |
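The ranges in the table can be checked before launching a run. The sketch below is a hypothetical helper, not code from trace_producer.py. The function name `validate_args` and the port count of 150 used in the example are assumptions chosen for illustration.

```python
def validate_args(num_coflows, alpha, load_factor, contention,
                  source_num_dist, destination_data_dist, num_ports):
    """Return a list of error messages; an empty list means all values are in range."""
    errors = []
    if not (isinstance(num_coflows, int) and num_coflows > 0):
        errors.append("NUM_COFLOWS must be an int > 0")
    if not (isinstance(alpha, int) and 1 <= alpha <= num_ports):
        errors.append("ALPHA must be an int in {1,...,P}")
    if not 0.0 < load_factor < 1.0:
        errors.append("LOAD_FACTOR must be a float in (0,1)")
    if not 0.0 <= contention <= 1.0:
        errors.append("INTRA_COFLOW_CONTENTION must be a float in [0,1]")
    for name, value in (("SOURCE_NUM_DIST", source_num_dist),
                        ("DESTINATION_DATA_DIST", destination_data_dist)):
        if value not in ("U", "Z"):
            errors.append(name + " must be U (uniform) or Z (zipf)")
    return errors


# num_ports=150 is an arbitrary illustrative value for P.
ok = validate_args(2000, 20, 0.9, 0.5, "U", "U", num_ports=150)
bad = validate_args(0, 200, 1.0, 1.5, "X", "Z", num_ports=150)
```

The FB-UP mode passes a string in place of ALPHA and takes fewer arguments, so it would need a separate branch that this sketch does not cover.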
For example, to generate a trace with 2000 coflows and 0.9 network load, where each coflow has at most 20 sources, 0.5 contention, and uniform source frequency and destination data distributions, run
python trace_producer.py 2000 20 0.9 0.5 U U
To keep the parameters the same as in the Facebook trace and vary only the number of coflows and the network load, run
python trace_producer.py 2000 FB-UP 0.9
The generated trace is stored in the traces/ directory. The trace has the following format:
Line 1: <NUM_INP_PORTS> <NUM_COFLOWS>
Line 2: <Coflow 1 ID> <Arrival Time (in millisec)> <Number of Flows in Coflow 1> <Number of Sources in Coflow 1> <Number of Destinations in Coflow 1> <Flow 1 Source ID> <Flow 1 Destination ID> <Flow 1 Size (in MB)> ... <Flow N1 Source ID> <Flow N1 Destination ID> <Flow N1 Size (in MB)>
...
...
Line i+1: <Coflow i ID> <Arrival Time (in millisec)> <Number of Flows in Coflow i> <Number of Sources in Coflow i> <Number of Destinations in Coflow i> <Flow 1 Source ID> <Flow 1 Destination ID> <Flow 1 Size (in MB)> ... <Flow Ni Source ID> <Flow Ni Destination ID> <Flow Ni Size (in MB)>
...
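A trace in this format could be read back with a short parser such as the sketch below. It assumes that fields are separated by whitespace, that source and destination IDs are integers, and that each flow contributes exactly three fields. The function `parse_trace` and the sample lines are hypothetical and do not come from a generated trace.

```python
def parse_trace(lines):
    """Parse trace lines into (num_ports, list of coflow dicts)."""
    rows = [line.split() for line in lines if line.strip()]
    # Line 1: <NUM_INP_PORTS> <NUM_COFLOWS>
    num_ports = int(rows[0][0])
    coflows = []
    for fields in rows[1:]:
        num_flows = int(fields[2])
        # Flow triples start after the five header fields of the line.
        flow_fields = fields[5:5 + 3 * num_flows]
        flows = [
            (int(flow_fields[i]), int(flow_fields[i + 1]), float(flow_fields[i + 2]))
            for i in range(0, len(flow_fields), 3)
        ]
        coflows.append({
            "id": fields[0],
            "arrival_ms": float(fields[1]),
            "num_sources": int(fields[3]),
            "num_destinations": int(fields[4]),
            "flows": flows,  # list of (source, destination, size in MB)
        })
    return num_ports, coflows


# Made-up two-coflow trace used only to exercise the parser.
sample = [
    "150 2",
    "1 0 2 2 1 3 7 10.0 4 7 5.5",
    "2 120 1 1 1 0 9 2.0",
]
num_ports, coflows = parse_trace(sample)
```

For a real trace file, the lines could be obtained with `open(path).readlines()`, where `path` points to a file under traces/.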
A helper script, distribution_producer.py, is also provided to generate the distributions of the following properties of a generated trace: size, width, skew, avg-max-min network load, inter-arrival time and incast ratio.
To run it, use exactly the same arguments that were used to generate the trace file:
python distribution_producer.py [NUM_COFLOWS] [ALPHA] [LOAD_FACTOR] [INTRA_COFLOW_CONTENTION] [SOURCE_NUM_DIST] [DESTINATION_DATA_DIST]
or
python distribution_producer.py [NUM_COFLOWS] FB-UP [LOAD_FACTOR]
A .txt file is generated for each property, containing one value for each coflow in the trace file. The .txt files are placed in folders named after the property involved. These files can then be used to obtain the mean, median or CDF of the required distributions.
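The mean, median and empirical CDF can be computed from such a file with a few lines of Python. The sketch below assumes each .txt file holds whitespace-separated numeric values, which is not confirmed by the description above. The helper names `load_values` and `summarize` are hypothetical.

```python
def load_values(path):
    """Read whitespace-separated numbers from a file (assumed layout)."""
    with open(path) as handle:
        return [float(token) for token in handle.read().split()]


def summarize(values):
    """Return the mean, median and empirical CDF of a list of numbers."""
    data = sorted(float(v) for v in values)
    n = len(data)
    mid = n // 2
    median = data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2.0
    # cdf[i] is the fraction of values less than or equal to data[i].
    cdf = [(i + 1) / float(n) for i in range(n)]
    return {"mean": sum(data) / n, "median": median, "x": data, "cdf": cdf}


# Made-up values standing in for the contents of one .txt file.
stats = summarize([4.0, 1.0, 3.0, 2.0])
```

Plotting `stats["x"]` against `stats["cdf"]` with Matplotlib gives the CDF curve for the chosen property.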