The Coflow-Benchmark project aims to provide realistic workloads synthesized from real-world data-intensive applications for developing coflow-based solutions.
Currently, Coflow-Benchmark contains the following single-wave, single-stage coflow trace(s).
- The original trace is from a 3000-machine 150-rack MapReduce cluster at Facebook with 10:1 oversubscription ratio (circa 2010).
- The synthesized one-hour trace contains 526 coflows that are scaled down to a 150-port fabric (i.e., to the rack-level) with exact inter-arrival times.
- All mappers in the same rack are combined into one rack-level mapper, and all reducers in the same rack are combined into one rack-level reducer.
- Rack-level communication patterns (i.e., coflow structures) are accurately captured and the amounts of data being shuffled (e.g., coflow size) are accurate to the nearest megabyte.
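
The rack-level aggregation described above can be illustrated with a minimal sketch. It is not the script used to synthesize the trace. The contiguous machine-to-rack numbering (20 machines per rack, from 3000 machines over 150 racks), the decimal definition of a megabyte, and the names `rack_of` and `to_rack_level` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: machines are numbered contiguously, 20 per rack (3000 / 150).
MACHINES_PER_RACK = 20
# Assumption: one megabyte is 10**6 bytes.
BYTES_PER_MB = 1_000_000


def rack_of(machine: int) -> int:
    """Map a machine index to its rack index (hypothetical numbering)."""
    return machine // MACHINES_PER_RACK


def to_rack_level(
    mapper_machines: List[int], reducer_bytes: Dict[int, int]
) -> Tuple[List[int], Dict[int, float]]:
    """Collapse a machine-level coflow into a rack-level one.

    All mappers in a rack become one rack-level mapper; all reducers in a
    rack become one rack-level reducer whose shuffle size is the sum of the
    per-machine sizes, rounded to the nearest megabyte.
    """
    rack_mappers = sorted({rack_of(m) for m in mapper_machines})

    rack_mb: Dict[int, float] = defaultdict(float)
    for machine, nbytes in reducer_bytes.items():
        rack_mb[rack_of(machine)] += nbytes / BYTES_PER_MB

    rack_reducers = {rack: float(round(mb)) for rack, mb in sorted(rack_mb.items())}
    return rack_mappers, rack_reducers


# Made-up machine-level coflow, for illustration only.
mappers, reducers = to_rack_level(
    mapper_machines=[0, 5, 19, 20, 45],
    reducer_bytes={3: 1_400_000, 7: 2_300_000, 61: 900_000},
)
```

Here the five mappers collapse to three rack-level mappers, and the two reducers on machines 3 and 7 merge into a single reducer on rack 0.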
Line 1: <Number of ports in the fabric> <Number of coflows below (one per line)>
Line i: <Coflow ID> <Arrival time (ms)> <Number of mappers> <Location of map-m> <Number of reducers> <Location of reduce-r:Shuffle megabytes of reduce-r>
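
As a rough illustration of the format above, the following sketch parses a trace into Python objects. It assumes that the `<Location of map-m>` field repeats once per mapper, that the `<Location of reduce-r:Shuffle megabytes of reduce-r>` field repeats once per reducer, and that fields are whitespace-separated. The `Coflow` class, the `parse_trace` function, and the sample data are hypothetical and are not part of Coflow-Benchmark or CoflowSim.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Coflow:
    coflow_id: str
    arrival_ms: int
    mappers: List[int]          # rack-level mapper locations
    reducers: Dict[int, float]  # reducer location -> shuffle megabytes


def parse_trace(text: str) -> Tuple[int, List[Coflow]]:
    """Parse a trace; returns (number of ports, list of coflows)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    num_ports, num_coflows = (int(x) for x in lines[0].split())

    coflows = []
    for line in lines[1:1 + num_coflows]:
        fields = line.split()
        coflow_id, arrival_ms = fields[0], int(fields[1])

        # Mapper count, followed by that many mapper locations.
        num_mappers = int(fields[2])
        mappers = [int(x) for x in fields[3:3 + num_mappers]]

        # Reducer count, followed by that many "location:megabytes" pairs.
        pos = 3 + num_mappers
        num_reducers = int(fields[pos])
        reducers = {}
        for item in fields[pos + 1:pos + 1 + num_reducers]:
            location, megabytes = item.split(":")
            reducers[int(location)] = float(megabytes)

        coflows.append(Coflow(coflow_id, arrival_ms, mappers, reducers))
    return num_ports, coflows


# Made-up two-coflow trace, for illustration only.
SAMPLE = """150 2
1 0 2 3 7 1 5:10.0
2 125 1 4 2 8:3.0 9:4.5
"""

num_ports, coflows = parse_trace(SAMPLE)
total_mb = sum(sum(c.reducers.values()) for c in coflows)
```

Keeping reducers as a mapping from location to megabytes makes per-coflow sizes a simple sum over the values.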
- Simulation: The CoflowSim project takes Coflow-Benchmark traces as input through the class.
- Deployment: Support for using Coflow-Benchmark in conjunction with Varys and Aalo (systems that schedule coflows in large clusters) is forthcoming.
To make Coflow-Benchmark more diverse, please contribute new traces from your workloads as pull requests, along with a short paragraph describing each trace.
Please refer to or cite the following papers to learn more about coflows and coflow scheduling, or for more details on the original traces from which these traces were synthesized.
- Efficient Coflow Scheduling Without Prior Knowledge, Mosharaf Chowdhury, Ion Stoica, ACM SIGCOMM, 2015.
- Efficient Coflow Scheduling with Varys, Mosharaf Chowdhury, Yuan Zhong, Ion Stoica, ACM SIGCOMM, 2014.