kaushikcfd/feinsum

Remove dependency on OpenTuner.

Opened this issue · 4 comments

We should implement our own search-space exploration techniques.

I've been using task parallelism with Charm/MPI to brute force exploration on the supercomputers. Thilina has suggested reinforcement learning.

I personally don't hate the search-space exploration technique in OpenTuner (https://dl.acm.org/doi/abs/10.1145/1830483.1830619), but it's overkill for our use case, and I thought we could just implement the technique from the above paper by hand in feinsum itself.
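To make the "by hand" option concrete, here is a minimal sketch of a multi-armed bandit that chooses which search technique to run next, which is the general idea behind OpenTuner's meta-technique. It uses plain UCB1, not the credit-assignment scheme from the linked paper, and the class name, technique names, and reward definition are all hypothetical.

```python
import math


class UCB1Selector:
    """Pick among search techniques with the UCB1 bandit rule (illustrative only)."""

    def __init__(self, techniques, exploration=math.sqrt(2)):
        self.techniques = list(techniques)
        self.exploration = exploration
        self.counts = {t: 0 for t in self.techniques}
        self.total_reward = {t: 0.0 for t in self.techniques}

    def select(self):
        # Try every technique once before trusting the statistics.
        for t in self.techniques:
            if self.counts[t] == 0:
                return t
        n = sum(self.counts.values())

        def score(t):
            mean = self.total_reward[t] / self.counts[t]
            bonus = self.exploration * math.sqrt(math.log(n) / self.counts[t])
            return mean + bonus

        return max(self.techniques, key=score)

    def update(self, technique, reward):
        # Reward is an assumption here, e.g. 1.0 if the trial beat the best
        # known runtime and 0.0 otherwise.
        self.counts[technique] += 1
        self.total_reward[technique] += reward
```

A tuning loop would call `select()`, let the chosen technique propose and time one configuration, and then call `update()` with the resulting reward.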

> I've been using task parallelism with Charm/MPI to brute force exploration on the supercomputers

Nice! We maintain the same SQLite file for populating the tuning results. I wonder if we could just launch a multi-rank run without any changes on the feinsum end 🤔.

I think it should be possible. The parallel tuner currently just needs the untransformed kernel and a list of tuples of transformation strings, and it returns the trial results with some performance data. It should be fairly straightforward to add the results to the database.
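As a rough illustration of that last step, the sketch below stores trial results in an SQLite file and reads back the fastest one. The table name, columns, and function names are assumptions made for this example and are not feinsum's actual schema; each trial is assumed to be a pair of a tuple of transformation strings and a measured runtime in seconds.

```python
import json
import sqlite3


def record_trials(conn, kernel_name, trials):
    """Store (transformations, runtime_s) pairs for one kernel (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tuning_results ("
        " kernel TEXT NOT NULL,"
        " transformations TEXT NOT NULL,"
        " runtime_s REAL NOT NULL,"
        " PRIMARY KEY (kernel, transformations))"
    )
    rows = [
        (kernel_name, json.dumps(list(transforms)), runtime_s)
        for transforms, runtime_s in trials
    ]
    # The connection as a context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO tuning_results VALUES (?, ?, ?)", rows
        )


def best_trial(conn, kernel_name):
    """Return the fastest recorded (transformations, runtime_s), or None."""
    row = conn.execute(
        "SELECT transformations, runtime_s FROM tuning_results"
        " WHERE kernel = ? ORDER BY runtime_s LIMIT 1",
        (kernel_name,),
    ).fetchone()
    if row is None:
        return None
    return tuple(json.loads(row[0])), row[1]
```

If several ranks write to one file, the `timeout` argument of `sqlite3.connect` controls how long a writer waits on a lock. SQLite file locking can be unreliable on network filesystems, so gathering results on a single writer rank may be the safer design.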