Code reading: LuaJIT
lukego opened this issue · 7 comments
I have started reading the LuaJIT sources. I like that the source code is compact enough that it is reasonable to print and read a whole file (or read it on an iPad with iOctocat).
The parts I am reading now are the profiler, the dumper, and the trace assembler. I have a basic mental model of tracing JITs from Thomas Schilling's thesis.
I have a few interests here:
- I would like to have a stronger mental model of the data structures involved. How is the Intermediate Representation stored in memory? Is it ephemeral or persistent? How much cross-referencing information is available between the representations: could you generate an interleaved listing of the IR and the machine code, for example? (Source too?) I am accustomed to knowing this kind of detail from other languages like Forth, Lisp, and Smalltalk, but I haven't dug down to that level of LuaJIT yet.
- Does the assembler really assemble backwards from the last IR instruction? (If so then does each IR instruction assembler emit the machine code backwards too?)
- How can I always have a visceral feeling for how my code is executing on the CPU? Currently it takes me quite a bit of manual legwork to analyze program behavior: dump traces to a file, profile to see which traces are relevant, stare at the traces to see which code they relate to, and so on. I would love to have this much more streamlined, e.g. for the profiler to automatically show me an interleaved IR/machine-code dump of all traces using >= 5% CPU, with annotations on the hotspots. This is the kind of thing that is quite transparent in `perf top` when programming in C.
- I would like to have a better feeling for what makes LuaJIT happy, what makes it sad, and what makes it unpredictable. I want to really see in the generated code what the consequences are of things like unpredictable branches within loops. I am sure that I could adapt my programming style to be better suited to the compiler, but this has to be driven by a better understanding of the compiler rather than by following "do this, don't do that" lists of program optimization rules.
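As a rough illustration of the ">= 5% CPU" filter mentioned in the list above, here is a minimal sketch. It assumes the profiler can hand back a sample count per trace number; the `samples` dictionary and the `hot_traces` helper are hypothetical and not part of LuaJIT.

```python
# Hypothetical input: profiler sample counts keyed by trace number.
# The numbers are made up for illustration.
samples = {1: 700, 2: 40, 3: 250, 4: 10}


def hot_traces(samples, threshold=0.05):
    """Return (trace, share) pairs whose share of all samples is >= threshold,
    hottest first."""
    total = sum(samples.values())
    if total == 0:
        return []
    hot = [(trace, count / total) for trace, count in samples.items()
           if count / total >= threshold]
    hot.sort(key=lambda pair: pair[1], reverse=True)
    return hot


hot = hot_traces(samples)
for trace, share in hot:
    print(f"TRACE {trace}: {share:.1%} of samples")
```

Only the traces that pass this filter would then need their IR and machine code dumped and annotated.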
Generally I am very enthusiastic about LuaJIT. I do see it as a technology in the tradition of Lisp, Forth, and Smalltalk: one that is intellectually rewarding to study and use. I look forward to spending a lot more time with it.
Please leave a comment if you know anything about LuaJIT internals :).
Could be that what I really want is better `perf` integration. For example, for `perf top` to be able to zoom in and show me machine code annotated with IR code. That would be something :-).
I'm sure I know less than you, but I read a little of the source after Mike's talk and could stand to read more.
@darius maybe we can thrash our way through it together a bit.
I'm looking now in `lj_asm_trace` and it seems like the machine code is created by iterating backwards from the last IR instruction. Each instruction is assembled in order and emits some machine code (also in backwards order?).
Thinking aloud...
Then it would seem like you could output a really neat interleaved IR/mcode dump because the order of instructions would be identical in both and each mcode instruction would correspond to one definite IR instruction. Sound reasonable?
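To make the idea concrete, here is a toy sketch of such an interleaved listing. Everything in it is invented for illustration: the three IR opcodes, the `EXPAND` table, and the pseudo-assembly strings are not LuaJIT's real IR or encodings. It also bakes in the assumption from the paragraph above that every machine instruction belongs to exactly one IR instruction, which may not hold if the assembler fuses several IR instructions into one machine instruction.

```python
# Hypothetical expansion table: IR opcode -> list of pseudo machine instructions.
EXPAND = {
    "LOAD": lambda dst, slot: [f"mov {dst}, [base+{slot * 8}]"],
    "ADD": lambda dst, src: [f"add {dst}, {src}", "jo ->exit"],
    "STORE": lambda slot, src: [f"mov [base+{slot * 8}], {src}"],
}


def assemble_backwards(ir):
    """Walk the IR from the last instruction to the first, prepending code.

    Returns (mcode, spans): mcode is in execution order, and spans[i] is the
    (start, end) slice of mcode that was generated for ir[i].
    """
    mcode = []
    lengths = [0] * len(ir)
    for i in range(len(ir) - 1, -1, -1):
        chunk = EXPAND[ir[i][0]](*ir[i][1:])
        # Within one IR instruction, the last machine instruction goes in first.
        for ins in reversed(chunk):
            mcode.insert(0, ins)
        lengths[i] = len(chunk)
    # Final positions are only known once everything has been emitted.
    spans, pos = [], 0
    for n in lengths:
        spans.append((pos, pos + n))
        pos += n
    return mcode, spans


def interleaved_listing(ir, mcode, spans):
    """Print each IR instruction followed by the machine code it produced."""
    lines = []
    for i, (ins, (start, end)) in enumerate(zip(ir, spans), 1):
        lines.append(f"{i:04d} " + " ".join(str(x) for x in ins))
        lines.extend(f"       {m}" for m in mcode[start:end])
    return "\n".join(lines)


ir = [("LOAD", "rax", 0), ("ADD", "rax", "rbx"), ("STORE", 0, "rax")]
mcode, spans = assemble_backwards(ir)
listing = interleaved_listing(ir, mcode, spans)
print(listing)
```

Because the code is emitted back to front, the final offsets are computed in a second pass over the recorded lengths. A real implementation would more likely record machine-code addresses as it goes.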
Then I wonder if such an interleaved trace could be exported to `perf top` so that it could be annotated in realtime with profiler information (as you get with C code). I wonder if that would require some hacking on perftools too? I haven't yet dug into the details of the data file that LuaJIT optionally exports for use by perf -- I know the current usage is quite limited (only matching instructions to trace numbers but not showing the IR/mcode).
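For reference, here is a minimal sketch of how an address could be matched to a trace from such a file. It assumes the usual perf map layout of one `START SIZE NAME` line per code region, with start and size in hexadecimal; the sample lines and the `TRACE_n::file:line` names below are made up, not taken from a real LuaJIT run.

```python
import bisect

# Hypothetical map contents in the assumed "START SIZE NAME" layout.
SAMPLE_MAP = """\
7f0000001000 40 TRACE_1::example.lua:10
7f0000001040 80 TRACE_2::example.lua:25
"""


def parse_perf_map(text):
    """Parse map text into a sorted list of (start, end, name) regions."""
    regions = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size = int(parts[0], 16), int(parts[1], 16)
        regions.append((start, start + size, parts[2]))
    regions.sort()
    return regions


def lookup(regions, addr):
    """Return the name of the region containing addr, or None."""
    starts = [r[0] for r in regions]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and regions[i][0] <= addr < regions[i][1]:
        return regions[i][2]
    return None


regions = parse_perf_map(SAMPLE_MAP)
print(lookup(regions, 0x7F0000001050))
```

This resolves an address only to a region name, which matches the limitation described above: nothing in this layout carries the IR or per-instruction detail.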
Could also be that `perf` is the wrong tool and it would be better to add this to LuaJIT's own profiler.
Yes, looks like it's emitting back-to-front. Makes me feel a little vindicated since my own x86 machine-code emitter worked that way and it got me a few funny looks.
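For anyone who hasn't seen the technique, here is a toy back-to-front emitter. It is a sketch of the general idea, not LuaJIT's code: the `BackwardEmitter` class is hypothetical, and the only real encodings used are the x86 short jump (`0xEB` plus an 8-bit displacement relative to the end of the jump), `nop` (`0x90`), and `ret` (`0xC3`). One convenient property it shows is that the target of a forward jump has already been emitted by the time the jump is, so no backpatching is needed. That is offered as an observation about the technique, not as LuaJIT's stated reason for using it.

```python
class BackwardEmitter:
    """Fills a fixed-size code buffer from its end towards its start."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.pos = size  # index of the first emitted byte; only moves down

    def emit(self, data):
        """Place data just before everything emitted so far; return its offset."""
        n = len(data)
        if n > self.pos:
            raise MemoryError("code buffer full")
        self.pos -= n
        self.buf[self.pos:self.pos + n] = data
        return self.pos

    def emit_jump_to(self, target):
        """Emit an x86 short jump to an offset that has already been emitted."""
        end_of_jump = self.pos  # the jump will end where current code begins
        rel = target - end_of_jump
        if not -128 <= rel <= 127:
            raise ValueError("target out of range for a short jump")
        return self.emit(bytes([0xEB, rel & 0xFF]))

    def code(self):
        """Return the emitted bytes in execution order."""
        return bytes(self.buf[self.pos:])


e = BackwardEmitter(16)
exit_label = e.emit(b"\xc3")   # ret: executes last, emitted first
e.emit(b"\x90\x90")            # two nops that the jump will skip
e.emit_jump_to(exit_label)     # forward jump; its target offset is already known
code = e.code()
print(code.hex())
```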
The dump sounds reasonable too, though you'd want to make it a separate function to keep any overhead out of the normal asm function -- that's important, I assume?
Will email you tonight to catch up.
The Dropbox link in the reddit thread (for the thesis you mention) seems dead. Do you have any other uploads of it?
Not on my laptop anymore, I'm afraid.