Possible minor performance improvements in x86-64 code
raphaelcohn opened this issue · 2 comments
I've been looking at the x86-64 code for make and jump, for Linux and Mac OS.
I think there may be a little room for some performance improvements:
- In `src/asm/jump_x86_64_sysv_elf_gas.S`, there's a `LEA` instruction followed by use of the `RSP` register. Memory operands that use `RSP` as the base register always require a SIB (scale-index-base) byte, even when the code involved doesn't use an index. And this code doesn't. Using a different register would make for smaller code (and potentially a very minor instruction cache win).
- It should be possible to use the `RAX` register as this different register, and so combine the initial `LEA` and the subsequent `MOV` of `RSP` into `RAX`, eliminating an instruction in the process.
- In `src/asm/make_fcontext_x86_64_sysv_elf_gas.S`, the `AND` with `-16` may not be needed, as we are always passing in a page-size-aligned stack (whose alignment will exceed 16).
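
To make the two claims above concrete, here is a rough Python sketch (not taken from the project's code). It hand-encodes a single illustrative load, `movq disp8(%base), %rax`, to show the extra SIB byte when the base is `RSP`, and it mimics `andq $-16` on an address to show that the mask is a no-op when the address is already page aligned. The function names, the displacement of 8, the sample addresses, and the 4096-byte page size are all assumptions made for illustration; the actual instructions in the `.S` files may differ.

```python
# Register numbers as they appear in the ModRM r/m field (low 8 registers only).
RAX, RSP = 0, 4
MASK64 = 0xFFFFFFFFFFFFFFFF
PAGE_SIZE = 4096  # assumed page size, for illustration only


def encode_load_disp8(base: int, disp8: int) -> bytes:
    """Encode `movq disp8(%base), %rax` (hypothetical helper, base numbered 0-7)."""
    if not 0 <= base <= 7:
        raise ValueError("only the low 8 registers are handled in this sketch")
    if not -128 <= disp8 <= 127:
        raise ValueError("displacement must fit in a signed byte")
    rex_w = 0x48                             # REX prefix with W=1 (64-bit operand)
    opcode = 0x8B                            # MOV r64, r/m64
    modrm = (0b01 << 6) | (RAX << 3) | base  # mod=01: [base + disp8]
    out = [rex_w, opcode, modrm]
    if base == RSP:
        # r/m = 100 means "a SIB byte follows", so RSP as a base always needs one.
        # scale=00, index=100 (no index), base=100 (RSP)
        out.append((0b00 << 6) | (0b100 << 3) | RSP)
    out.append(disp8 & 0xFF)
    return bytes(out)


def align_down_16(addr: int) -> int:
    """Mimic `andq $-16, %reg` on a 64-bit address."""
    return addr & (-16 & MASK64)


if __name__ == "__main__":
    print(encode_load_disp8(RSP, 8).hex())   # RSP base: includes a SIB byte
    print(encode_load_disp8(RAX, 8).hex())   # RAX base: one byte shorter
    top = 0x7F0000000000 + 16 * PAGE_SIZE    # made-up page-aligned stack top
    print(align_down_16(top) == top)         # the mask changes nothing here
```

Note that the `AND` is only redundant if the address it is applied to is itself 16-byte aligned, which for a stack top means the stack size must also be a multiple of 16 (true if it is a whole number of pages).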
I understand that this project re-uses Boost's upstream assembler code and so may be reluctant to explore this - and the performance improvement is likely to be tiny. Still, I thought it might be useful to record for posterity. I'm going to take a look myself at trying out the changes above if I get the time. (Writing assembler for me requires a very clear head and a lot of free time).
(As an aside, there seems to be complete duplication of the assembler in the Linux and Mac OS code - that's not surprising, as both use the same SysV ABI. Linking is probably another matter).
Closing, as this has gone stale.