m4rs-mt/ILGPU

Triton-like optimisations?

fuadAbdallah opened this issue · 3 comments

First of all, this is a great project and I highly admire the results you achieved!

I stumbled upon a project called Triton from OpenAI, which performs some interesting optimisations on an intermediate function representation and achieves very good performance, even compared to expert-crafted code.
I hope this idea isn't too obvious and you are already working on it, but would such optimisation passes be possible with ILGPU?
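For context, an "optimisation pass on an intermediate representation" of the kind Triton performs can be illustrated with a tiny constant-folding pass. This is a minimal, hypothetical sketch; the IR node types (`Const`, `Add`, `Mul`) and the pass name `fold_constants` are invented for illustration and are not part of Triton or ILGPU:

```python
# Toy expression IR plus one optimisation pass (constant folding).
# All names here are hypothetical; real compiler IRs are far richer.
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Add:
    lhs: object
    rhs: object

@dataclass(frozen=True)
class Mul:
    lhs: object
    rhs: object

def fold_constants(node):
    """Recursively replace operations on constants with their result."""
    if isinstance(node, Add):
        lhs, rhs = fold_constants(node.lhs), fold_constants(node.rhs)
        if isinstance(lhs, Const) and isinstance(rhs, Const):
            return Const(lhs.value + rhs.value)
        return Add(lhs, rhs)
    if isinstance(node, Mul):
        lhs, rhs = fold_constants(node.lhs), fold_constants(node.rhs)
        if isinstance(lhs, Const) and isinstance(rhs, Const):
            return Const(lhs.value * rhs.value)
        return Mul(lhs, rhs)
    return node  # constants and unknown nodes pass through unchanged

# (2 + 3) * 4 folds down to the single constant 20.
expr = Mul(Add(Const(2), Const(3)), Const(4))
print(fold_constants(expr))  # Const(value=20)
```

Real Triton passes work on GPU-specific concerns (tiling, memory coalescing, shared-memory usage), but the mechanism is the same: rewrite the IR, preserving semantics.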

MoFtZ commented

hi @imabf, I'm assuming you mean this project:
https://github.com/openai/triton

ILGPU tries to have as few dependencies as possible. It is able to run with just the .NET runtime and the Cuda drivers. The Cuda SDK is optional, unless you want specific Cuda functionality that is only available from that SDK.

It looks like Triton defines its own intermediate language, as well as a compiler. I'm not sure we would want that kind of dependency, particularly since the end user would also have to install Python separately.

Having said that, Triton sounds like an interesting project. Thanks for sharing.

Ah - you are of course right. I was thinking more about looking into the optimisations that Triton performs on the IL, not actually using its code.
I have just started working with ILGPU, so please excuse me if I'm speculating too much, but maybe adding a hook for optimisations in ILGPU's compilation process could be a good starting point?
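Such a hook could take the shape of a pipeline of user-registered IR transformations run between front-end lowering and backend code generation. The sketch below is purely hypothetical and in Python for brevity; `PassPipeline`, `register`, and `drop_nops` are invented names and do not correspond to any actual ILGPU API (ILGPU itself is a C#/.NET library):

```python
# Hypothetical sketch of a pluggable optimisation-pass hook.
# None of these names exist in ILGPU; this only illustrates the shape
# such an extension point could take.
from typing import Callable, List

# A pass maps an IR (here just a list of instructions) to a new IR.
IRPass = Callable[[list], list]

class PassPipeline:
    def __init__(self) -> None:
        self._passes: List[IRPass] = []

    def register(self, ir_pass: IRPass) -> None:
        """User hook: add a custom optimisation pass to the pipeline."""
        self._passes.append(ir_pass)

    def run(self, ir: list) -> list:
        """Apply all registered passes in order."""
        for ir_pass in self._passes:
            ir = ir_pass(ir)
        return ir

# Example user-supplied pass: remove no-op instructions.
def drop_nops(ir: list) -> list:
    return [instr for instr in ir if instr != "nop"]

pipeline = PassPipeline()
pipeline.register(drop_nops)
print(pipeline.run(["load", "nop", "add", "nop", "store"]))  # ['load', 'add', 'store']
```

The appeal of this design is that the core compiler stays dependency-free while third parties can experiment with Triton-style rewrites.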

Hi @imabf, thanks for reaching out! Triton-like optimizations make a lot of sense to me.

The current roadmap involves developing a new SIMD CPU backend for the next version of ILGPU. This will allow us to take advantage of the parallel processing capabilities of modern CPUs, improving performance. Once the SIMD CPU backend is complete, our next step will be a new SPIR-V backend. This will enable us to target AMD and Intel GPUs using a modern API, providing compatibility with a wider range of hardware.

After these backend developments are complete, our focus will shift to higher-level optimizations and computational graphs. This will enable us to optimize user-provided compute graphs and code at a higher level, improving their overall efficiency.

Overall, your proposal aligns perfectly with our ideas for building out exciting capabilities in ILGPU, and we look forward to seeing these developments come to life in the near future!