State of the MIR union
DangerMouseB opened this issue · 16 comments
Hi Vladimir,
I've been watching your repo for a while now (2 years maybe) - the project is very interesting. However, I've held off investing much time in learning it, as the front page states that it is in the initial stages of development and is here just for the purpose of getting acquainted. There are ongoing commits away from master, which makes me curious. Could you maybe give an update and say a bit about how it's going and where you're hoping to take it?
Many thx,
David
All development is now done on the bbv branch. There is a lot of activity on this branch, but GitHub does not report activity on non-main branches.
My goal is to merge the bbv branch into trunk when it is ready. I am planning to do this at the end of summer/beginning of fall.
New insns will be added. There will be new features I found useful for writing better-performing JIT compilers: global register variables, simplified call/return MIR insns, and builtin_expect. There will be support for basic block versioning and maybe even for a meta-tracing JIT compiler. These features are described in my blog post https://developers.redhat.com/articles/2022/02/16/code-specialization-mir-lightweight-jit-compiler#aliasing_in_a_jit_compiler.
The MIR-generator will be changed a lot. It will no longer be restricted to conventional SSA.
Optimizations for values in memory will be added, especially for memory accessed in a stack-like manner (as interpreters usually do). New optimizations such as register-pressure relief, coalescing, SSA insn combining, loop-invariant motion, etc. will be added.
The register allocator will be significantly improved and will support live-range splitting (e.g. around loops).
All of this is intended to improve JIT code for interpreters as well as for typical C code (generated by c2m).
So the next release will be a major one. The ambitious goals currently presented on the project page will be changed. The project will be more oriented toward simplifying the implementation of JIT compilers for static and dynamic programming languages that are currently implemented by interpretation.
hi @vnmakarov
Making MIR more focused to its use as JIT backend for languages is a good direction IMO.
I hope some C extensions can be added to help with this. You mentioned builtin expect.
How about the ability to have local functions without the regular function-call overhead, more like a local jump and return? This would be a boon for converting interpreter VM bytecode to JIT code, as each bytecode implementation could then be executed this way rather than being inlined, which causes code bloat. LuaJIT uses this approach: the VM bytecode functions are just local jumps.
I think this is done. There are already lighter versions of calls. The bbv branch has new MIR insns (JCALL and JRET) and corresponding C builtins.
Usually an interpreter is implemented through an indirect goto like pc+=insn_len; goto *pc
where the first VM insn word contains the address of the labeled interpreter code responsible for executing the corresponding VM insn. With the new MIR insns it is easy to switch to JITted code (for one or more VM insns) by changing the address in the first VM insn word. The global context can be passed through global register variables (another MIR extension on the bbv branch), which are also used in the interpreter as local register variables (a GCC extension).
Also, by default all calls in MIR go through a thunk. This makes it possible to change the generated code on the fly. When that is not needed, on the bbv branch you can get and use the generated MIR function directly by using a new function (_MIR_get_thunk_addr; the name of this function is not final and may change in the release).
I use these extensions to implement an experimental JIT for Ruby https://github.com/vnmakarov/ruby.
When testing the bbv branch with sqlite3, I'm still getting the same problem reported here: #286.
Thank you. Are those features usable from C code, since I don't directly generate MIR instructions?
are those features usable from C code since I don't directly generate MIR instructions?
Yes. Btw, I also used only C for the Ruby JIT I mentioned.
Thank you for clarifying that. I've been using QBE so far. I like the simplicity of the IR (no types, and it fixes up non-SSA code) and the fact that the ABIs are already implemented. However, my end goal is JIT compilation for a (hopefully) fairly high-performance functional-style language with multi-methods and generics, based on Smalltalk and q, so I'm wondering about putting in more time to get up to speed with MIR.
It sounds like it will be worth my while judging from your comments above. If I may, a few more questions come to mind.
- Can I check that MIR handles function calls to C code (i.e. the ABI)? (and can be called from C).
- Does MIR handle both cdecl and stdcall on windows? (I'm on macos aarch64 but windows feels important).
- Is it possible to insert debug info?
- Will it be possible to have zero-cost exception handling (i.e. zero cost on the happy path), e.g. like the Itanium one that C++ uses?
One last question, can you point me to anything that explains how to link with other libraries into JIT compiled code?
- Can I check that MIR handles function calls to C code (i.e. the ABI)? (and can be called from C).
Yes, c2m implements the target ABIs. ABI-compatible calls can be described using only MIR, but the complex cases (usually small structs/unions) are not documented yet. You can still figure them out by looking at the MIR code generated by c2m with the -S option. The key is to use the right BLK and RBLK args.
* Does MIR handle both cdecl and stdcall on windows? (I'm on macos aarch64 but windows feels important).
Yes, the Windows C ABI is implemented, but Windows is still not among the supported targets, mostly because setjmp/longjmp is not implemented by c2m. This is due to the lack of good, detailed documentation about setjmp/longjmp.
* Is it possible to insert debug info?
No, although I have plans to generate debug info. It would be quite valuable for debugging code generated by c2m. But I cannot say anything about the timeline for this feature.
* Will it be possible to have zero-cost exception handling (i.e. zero cost on the happy path), e.g. like the Itanium one that C++ uses?
Exception handling is out of the scope of the MIR project. But I guess unwinding can be implemented outside of MIR-related code.
One last question, can you point me to anything that explains how to link with other libraries into JIT compiled code?
MIR_link takes a function, import_resolver, as an argument. This function should return the address of the external item whose name is passed as its argument. So you can refer to external data or call functions outside the JITted code from the JITted code.
Thank you - I've been hacking on a C compiler (as well as my main language project), and it may be that attempting to emit MIR is a good place to start.
Do you have a list of other projects using MIR, or is it early days yet?
I'm really looking for mutual support/conversation/community around using MIR as a backend rather than working on it (although you never know; my key interest at present is seeing whether dynamically generating generic and templated functions works as well as I hope).
Can I confirm my understanding? I was under the impression that Itanium-style exception handling is implemented more in the ABI than as IR features, and is therefore presumably somewhat independent of the IR.
Do you have a list of other projects using MIR or is it early days yet?
As far as I know, it is used in a Lua dialect (https://github.com/dibyendumajumdar/ravi) and the Faust language (https://github.com/grame-cncm/faust). There are probably more that I am not aware of.
I also tried to make a MIR-based JIT (https://github.com/vnmakarov/ruby) but recently basically stopped this work because of the success of Shopify's YJIT (they have a whole team working on it, while I was able to spend very little time on my MIR-based Ruby JIT).
are those features usable from C code since I don't directly generate MIR instructions?
Yes. Btw, I also used only C for the Ruby JIT I mentioned.
Hi, are there any docs on the C extensions?
I also tried to make a MIR-based JIT (https://github.com/vnmakarov/ruby) but recently basically stopped this work because of the success of Shopify's YJIT (they have a whole team working on it, while I was able to spend very little time on my MIR-based Ruby JIT).
That's a pity - I think you could continue on Ruby not to compete with YJIT but more as a way to discover how to improve MIR.
Hi, are there any docs on the C extensions?
Not really, sorry. I'll create a document when I have time, or at the latest for the release. Global register variables are analogous to the GCC extension:
register void *ec asm("r13");
Overflow builtins can be found in the tests c-tests/new/{add,sub,mul}-overflow.c. Builtin expect is analogous to the GCC one.
That's a pity - I think you could continue on Ruby not to compete with YJIT but more as a way to discover how to improve MIR.
Actually, I was spending half of my work time on the Ruby JIT, but for various reasons that has become impossible.
I think this is done. There are already lighter versions of calls. The bbv branch has new MIR insns (JCALL and JRET) and corresponding C builtins.
Usually an interpreter is implemented through an indirect goto like
pc+=insn_len; goto *pc
where the first VM insn word contains the address of the labeled interpreter code responsible for executing the corresponding VM insn. With the new MIR insns it is easy to switch to JITted code (for one or more VM insns) by changing the address in the first VM insn word. The global context can be passed through global register variables (another MIR extension on the bbv branch), which are also used in the interpreter as local register variables (a GCC extension).
Also, by default all calls in MIR go through a thunk. This makes it possible to change the generated code on the fly. When that is not needed, on the bbv branch you can get and use the generated MIR function directly by using a new function (_MIR_get_thunk_addr; the name of this function is not final and may change in the release).
I use these extensions to implement an experimental JIT for Ruby https://github.com/vnmakarov/ruby.
Is it possible to implement labels as values through this?
Is it possible to implement labels as values through this?
Not really. Thunks are only for function calls.
With thunks, it is possible to generate a new version of the code for the same function, and this code will automatically be used by already generated code. For example, the first version of the code can be minimally optimized and, when the function is executed frequently, you can switch to more optimized code. In general, it is even possible to use another compiler (e.g. LLVM) to generate another version of the function code.
Implementing labels as values requires considerable changes in the MIR-generator, as any indirect goto can potentially jump to any BB. It is a moderate-sized project, and I intend to implement it some day.