cylondata/cylon

benchmark: benchmarking cylon

laszewsk opened this issue · 2 comments

How do we run a repeatable benchmark on different machines using cylon?

We should address this:

Cylon has many functions; we need to test them.
Cylon can run on multiple machines. Can the number of nodes be a variable in the benchmark?
Does Cylon also use threads, and should they be benchmarked too?
The input data can be of various sizes. Can that be part of the benchmark?
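
To make the four dimensions above concrete, here is a rough sketch of a benchmark matrix plus a small timing helper. It uses only the Python standard library and does not call any Cylon API; the operation names, node counts, thread counts, and row counts are placeholder assumptions, and `build_matrix` and `time_repeated` are hypothetical helper names.

```python
import itertools
import statistics
import time

# Hypothetical benchmark dimensions; none of these values come from Cylon itself.
OPERATIONS = ["join", "sort", "groupby"]
NODE_COUNTS = [1, 2, 4]
THREAD_COUNTS = [1, 4]
ROW_COUNTS = [10_000, 100_000]


def build_matrix():
    """Return one configuration dict per combination of the four dimensions."""
    keys = ("operation", "nodes", "threads", "rows")
    combos = itertools.product(OPERATIONS, NODE_COUNTS, THREAD_COUNTS, ROW_COUNTS)
    return [dict(zip(keys, combo)) for combo in combos]


def time_repeated(fn, repeats=5):
    """Call fn() `repeats` times and return (median, min) wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


matrix = build_matrix()
```

Each entry in `matrix` could then be handed to whatever launcher a given machine uses, with `time_repeated` wrapping the actual operation. Reporting the median over several repeats is one way to keep results comparable across platforms.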

I think it would be appropriate to have a standard benchmark that various people can run on various platforms.

Does this exist?

This already exists, @laszewsk. We can reuse the experiments in cylon_experiments: https://github.com/cylondata/cylon_experiments

One small change we need to make is to add dataset creation logic for weak scaling.
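
As a starting point for discussion, weak-scaling data creation could look roughly like the sketch below: the number of rows per worker stays fixed, so the total dataset grows with the number of workers. This is not the logic in cylon_experiments; the function name, the two-column `key,value` CSV layout, and the `part_<rank>.csv` file naming are all assumptions made for illustration.

```python
import csv
import random
from pathlib import Path


def write_weak_scaling_partitions(out_dir, world_size, rows_per_worker,
                                  key_range=None, seed=0):
    """Write one CSV partition per worker, each with the same number of rows.

    Hypothetical helper: in weak scaling the per-worker size is constant,
    so the total row count is world_size * rows_per_worker.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    total_rows = world_size * rows_per_worker
    if key_range is None:
        # Assumption: draw keys from a range equal to the total row count.
        key_range = total_rows
    paths = []
    for rank in range(world_size):
        # A per-rank seed keeps each partition reproducible across machines.
        rng = random.Random(seed + rank)
        path = out / f"part_{rank}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["key", "value"])
            for _ in range(rows_per_worker):
                writer.writerow([rng.randrange(key_range), rng.random()])
        paths.append(path)
    return paths
```

For example, `write_weak_scaling_partitions("data/w4", world_size=4, rows_per_worker=1_000_000)` would produce four files of one million rows each. Seeding per rank means two people generating the same configuration should get identical inputs.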

Unfortunately, the README is empty, so if you can develop one, I can run more experiments on Summit.