Real-World Benchmarks
LilithSilver opened this issue · 1 comment
I'd like to have some sort of real-world benchmarks. Currently, we just have some micro-benchmarks of varying quality.
Here's some research on the topic from more general schedulers:
In *An Efficient Work-Stealing Scheduler for Task Dependency Graph* (Lin et al., 2020), they use two approaches:
- Micro-benchmarks: a series of tests (similar to what we're doing already) using standard algorithms. Implementing these would be reasonably easy. Here are the chosen programs from the paper (a sketch of the first one follows this list):
- Linear chain: Each task has one successor and one predecessor, except the first and last tasks. Each task increments a global counter by one. The graph has 8,388,608 tasks.
- Binary tree: Each task has one predecessor and two successors, except the root and leaf tasks. Tasks perform no computation, and the graph has 8,388,607 tasks.
- Graph traversal: A directed acyclic graph with random dependencies is generated. The degree of each node, i.e. the number of successors and predecessors, is bounded by four. Each task sets a boolean flag to true, marking the node as visited. The graph has 4,000,000 tasks.
- Matrix multiplication: A task graph multiplies two square matrices of size 2048×2048. The graph has two parts: (1) the first initializes the values of the matrices; (2) the second performs the matrix multiplication. Computation proceeds row by row to represent a generic parallel-for pattern.
- Real-world tests using VLSI timing analysis from OpenTimer. See the paper for details. I think this probably isn't for us, but it's an interesting approach worth mentioning.
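To make the first item concrete, here's a minimal sketch of the linear-chain case. `System.Threading.Tasks` stands in for the scheduler under test (a real implementation would use this scheduler's own task/dependency API), and the task count is reduced from the paper's 2^23 to keep the illustration light:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the paper's linear-chain micro-benchmark: each task increments a
// shared counter and has exactly one successor. TPL continuations stand in
// for the scheduler under test.
class LinearChainBenchmark
{
    static async Task Main()
    {
        const int TaskCount = 1 << 20; // paper uses 2^23; smaller here for illustration
        int counter = 0;

        // Gate the head of the chain so the whole graph is built before any
        // task runs; otherwise continuations start firing during construction.
        var start = new TaskCompletionSource(); // non-generic TCS needs .NET 5+
        Task chain = start.Task;
        for (int i = 0; i < TaskCount; i++)
            chain = chain.ContinueWith(_ => Interlocked.Increment(ref counter));

        var sw = Stopwatch.StartNew();
        start.SetResult(); // release the chain
        await chain;
        sw.Stop();

        Console.WriteLine($"counter = {counter}, elapsed = {sw.ElapsedMilliseconds} ms");
    }
}
```

The binary-tree and graph-traversal cases would follow the same shape, just with a different dependency structure per task.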
If someone wanted to take up the work, deciphering *Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance* (Slaughter et al., 2020) and creating an implementation for this scheduler could be a fun (or maybe not-so-fun) challenge. I don't have the knowledge or experience to do it properly, though. (A rough sketch of the core idea follows the list below.)
- *Quantifying Overheads in Charm++ and HPX Using Task Bench* (Wu et al., 2022) discusses implementations in Charm++ and HPX that may be of use.
- The Task Bench repository has a ton of example implementations, though mostly in C.
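For anyone who wants to dig in, here's my rough reading of Task Bench's core idea, sketched in C#: a grid of tasks, `Steps` deep and `Width` wide, with a pluggable dependence pattern connecting each timestep to the previous one. All names here are made up for illustration; this is not the actual Task Bench API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Maps a point in a timestep to the points in the *previous* timestep it
// depends on. Task Bench's benchmark cases differ mainly in this function.
delegate IEnumerable<int> Dependence(int point, int width);

static class DependencePatterns
{
    // 1D stencil: depend on yourself and your immediate neighbours.
    public static IEnumerable<int> Stencil(int p, int width) =>
        new[] { p - 1, p, p + 1 }.Where(q => q >= 0 && q < width);

    // Trivial: no cross-task dependencies (embarrassingly parallel).
    public static IEnumerable<int> Trivial(int p, int width) =>
        Enumerable.Empty<int>();
}

class TaskGraphSpec
{
    public int Steps, Width;
    public Dependence Pattern;

    // Enumerate edges (from point q at step-1, to point p at step) so the
    // scheduler under test can build and execute the dependency graph.
    public IEnumerable<(int step, int from, int to)> Edges()
    {
        for (int s = 1; s < Steps; s++)
            for (int p = 0; p < Width; p++)
                foreach (int q in Pattern(p, Width))
                    yield return (s, q, p);
    }
}

class Program
{
    static void Main()
    {
        var spec = new TaskGraphSpec { Steps = 4, Width = 8, Pattern = DependencePatterns.Stencil };
        Console.WriteLine($"edges: {spec.Edges().Count()}");
    }
}
```

The appeal of this design is that a single implementation covers many graph shapes: implementing the benchmark once against this scheduler would let us sweep over patterns and task grain sizes to measure per-task overhead.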
There's an issue with these research approaches: none of them focus on what this job scheduler is really for, namely use in Arch and, more generally, in games and game-like applications. I'm not sure how to get ECS-focused benchmarks up and running.
There's also the subject of comparisons against other solutions for C#. For example, it would be good to have benchmarks that we could run against Unity's job system, Unreal's Tasks System, Quartz.NET, or maybe even C#'s default Task library, to see how our overhead compares and where we might improve. (A possible harness is sketched below.)
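As a starting point for such comparisons, a harness along these lines could work. The BenchmarkDotNet attributes and the TPL baseline below are real and runnable as-is; the commented-out case for this scheduler is a placeholder, since the actual call into its API would need to be filled in:

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Cross-library scheduling-overhead comparison. Each benchmark schedules N
// no-op tasks, so the measurement isolates scheduling cost rather than
// workload cost.
[MemoryDiagnoser]
public class SchedulingOverhead
{
    [Params(1_000, 100_000)]
    public int N;

    [Benchmark(Baseline = true)]
    public void TplTasks()
    {
        var tasks = new Task[N];
        for (int i = 0; i < N; i++)
            tasks[i] = Task.Run(static () => { }); // empty body: pure overhead
        Task.WaitAll(tasks);
    }

    // [Benchmark] public void ThisScheduler() { ... } // scheduler under test
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<SchedulingOverhead>();
}
```

Unity's job system would be harder to slot in, since it has to run inside the Unity runtime rather than a plain console app, but the same no-op workload could be reproduced there for a rough comparison.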
By the way, I like your scientific approach! Unfortunately, hardly anyone relies on papers; most people just go with their gut feeling :D
Real-world benchmarks would of course be important, but unfortunately they're also an incredible amount of effort. The micro-benchmarks (at least some of them) would probably be the best option. For ECS benchmarking we could come up with a scenario. The most realistic would be something like position calculation, pathfinding, collision checks, or similar; ideally scenarios where each entity is processed on its own and there are hardly any cross-references (one such scenario is sketched below).
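Something like this, maybe: flat arrays stand in for the ECS component storage, and `Parallel.For` stands in for our scheduler splitting the range into jobs. The names and layout are just illustrative, not Arch's actual storage.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Sketch of the suggested ECS-style scenario: a position update where every
// entity is processed independently, with no cross-references. A real run
// would swap Parallel.For for this scheduler's own parallel-for equivalent
// and compare the two on the same data.
class PositionUpdateBenchmark
{
    const int EntityCount = 1_000_000;

    static readonly float[] PosX = new float[EntityCount];
    static readonly float[] PosY = new float[EntityCount];
    static readonly float[] VelX = new float[EntityCount];
    static readonly float[] VelY = new float[EntityCount];

    static void Main()
    {
        const float dt = 1f / 60f; // fixed timestep, as in a game loop

        var sw = Stopwatch.StartNew();
        Parallel.For(0, EntityCount, i =>
        {
            PosX[i] += VelX[i] * dt;
            PosY[i] += VelY[i] * dt;
        });
        sw.Stop();

        Console.WriteLine($"updated {EntityCount} entities in {sw.Elapsed.TotalMilliseconds:F2} ms");
    }
}
```

A real ECS would process entities in chunks rather than one index at a time, but even this simple loop would show scheduling and work-splitting overhead differences between libraries.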
To start with, we should maybe implement the benchmarks for this framework only. Unity and co. can follow later.