simit-lang/simit

Exploring code generation with LLVM Polly

Opened this issue · 1 comments

Hi all,

Simit piques my interest because of its potential use in runtime-optimized numerical computing. I've been following some other projects that are interesting for the same reason, and the one I think could be useful here is the Polly project. Polly is a polyhedral code optimizer (that processes vanilla LLVM IR), which roughly speaking is particularly good at reordering dense computation kernels to significantly improve data- and temporal-locality and thus combat CPU cache-misses. For example, the Polly developers have posted an open goal of automatically tiling a naive matrix-multiplication kernel so that its runtime is competitive with hand-coded BLAS routines. They are supposedly within an order of magnitude from this goal, last I checked.

Since this tool is applicable to any affine (dense, regular) computation and operates on LLVM IR, I wanted to see if it can confer any performance benefits to Simit's CPU runtime. One feature that I am keen on trying out is the automated code parallelizer, which emits OpenMP-powered parallel code, something that I do not believe Simit has yet explored. In general, the polyhedral model is also said to be an powerful generalization of many loop transformations and possibly code vectorization, so I am also interested in whether its vectorizer can be put to work in Simit, since I believe that Simit does not explore this either.

There is also one more interesting feature of Polly, namely that work has been done to generate GPU kernels directly from LLVM IR. I believe it is still experimental, but I thought it may be interesting to explore that as an alternative GPU backend once it matures.

I haven't had much time, so my experimentation to see what Polly can do in concert with Simit's LLVM backend has been slow. I started a few weeks ago, but haven't yet formed a development infrastructure to apply and test the optimizations. One thing I do want to point out is that I am getting an error when injecting timers into the Simit code through its API (I will submit a bug report soon), so I haven't yet profiled anything concretely. I am also lacking enough examples of Simit programs to run through the optimizer to demonstrate general improvements over blind luck. Do you have on hand any examples of long-running Simit programs that are CPU-intensive?

Cheers!

Hi Chris,

It is true that Simit does not exploit polyhedral compilation or parallelism. However, not that many of Simit's loop nests are sparse, so the polyhedral model will be less effective. That being said, I'd love to hear what you learn! One thing that's definitely the case is that there are a lot of dense inner loops emitted from typical Simit programs that can benefit from vectorization!

best,
Fred