zonyitoo/context-rs

Possible minor performance improvements in x86-64 code

raphaelcohn opened this issue · 2 comments

I've been looking at the x86-64 assembly for make_fcontext and jump_fcontext, for Linux and Mac OS.

I think there might be a little room for some performance improvements:

  • In src/asm/jump_x86_64_sysv_elf_gas.S, there's a LEA instruction followed by memory accesses that use RSP as the base register. Any memory operand whose base is RSP (or R12) requires a SIB (scale-index-base) byte, even when no index register is used, and this code doesn't use one. Using a different base register would make the code smaller (and potentially give a very minor instruction-cache win).
  • It should be possible to use the RAX register as this different register, and so combine the initial LEA and subsequent MOV of RSP into RAX, eliminating an instruction in the process.
  • In src/asm/make_fcontext_x86_64_sysv_elf_gas.S, the AND with -16 may not be needed, as we always pass in a page-aligned stack (and page alignment implies 16-byte alignment).
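To make the encoding point concrete, here is a small Rust sketch (Rust simply because context-rs is a Rust crate) that computes the ModRM, SIB and displacement bytes for a `disp(base)` memory operand with no index register. `encode_base_disp`, `hex` and the register constants are hypothetical helpers written for illustration only; they are not part of context-rs or Boost, and the sketch deliberately leaves out the REX prefix, RIP-relative addressing and indexed forms. Using `fnstcw` (opcode D9 /7) as an example, it shows that an RSP-based operand costs one extra byte compared with the same operand based on RAX.

```rust
// Register numbers as they appear in ModRM/SIB fields; bit 3 goes in REX.B.
const RAX: u8 = 0;
const RSP: u8 = 4;
const RBP: u8 = 5;
const RDI: u8 = 7;
const R12: u8 = 12;

/// Hypothetical helper: encode ModRM, optional SIB and displacement bytes for
/// a `disp(base)` memory operand with no index register. `reg` is the 3-bit
/// ModRM.reg field (a register or an opcode extension such as /7).
/// The REX prefix (required when `base >= 8`) is not emitted.
fn encode_base_disp(reg: u8, base: u8, disp: i32) -> Vec<u8> {
    let rm = base & 0b111;
    // mod=00 with rm=101 means RIP-relative in 64-bit mode, so RBP/R13 always
    // take an explicit displacement, even a zero one.
    let mode: u8 = if disp == 0 && rm != 0b101 {
        0b00
    } else if (-128..=127).contains(&disp) {
        0b01 // 8-bit displacement
    } else {
        0b10 // 32-bit displacement
    };
    let mut out = vec![(mode << 6) | ((reg & 0b111) << 3) | rm];
    // rm=100 means "a SIB byte follows", which is unavoidable for RSP and R12.
    if rm == 0b100 {
        // scale=00, index=100 (no index), base=100 (RSP/R12)
        out.push(0x24);
    }
    match mode {
        0b01 => out.push(disp as i8 as u8),
        0b10 => out.extend_from_slice(&disp.to_le_bytes()),
        _ => {}
    }
    out
}

/// Format bytes as space-separated lowercase hex, e.g. "7c 24 04".
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect::<Vec<_>>().join(" ")
}

fn main() {
    // fnstcw is D9 /7, so ModRM.reg = 7.
    println!("fnstcw 0x4(%rsp): d9 {}", hex(&encode_base_disp(7, RSP, 4)));
    println!("fnstcw 0x4(%rax): d9 {}", hex(&encode_base_disp(7, RAX, 4)));
    // movq %rsi, (%rdi) is REX.W 89 /r with reg = RSI (6): no SIB needed.
    println!("movq %rsi, (%rdi): 48 89 {}", hex(&encode_base_disp(6, RDI, 0)));
    // R12 shares RSP's low bits (SIB needed); RBP needs a zero disp8.
    println!("(%r12) operand: {}", hex(&encode_base_disp(0, R12, 0)));
    println!("(%rbp) operand: {}", hex(&encode_base_disp(0, RBP, 0)));
}
```

The first two lines print `d9 7c 24 04` and `d9 78 04`: the saving is one byte per RSP-based memory operand, which is why the expected gain is tiny.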
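The third point rests on a simple invariant: `and $-16` clears the low four bits of the pointer, so it cannot change an address that is already 16-byte aligned, and every page-aligned address is. The Rust sketch below models the instruction as a hypothetical `align_down_16` function; the 4 KiB page size and the stack address are assumptions chosen for illustration. Whether the instruction can really be dropped still depends on every caller passing a suitably aligned stack top, which this check does not prove.

```rust
/// Hypothetical model of `andq $-16, %reg`: clear the low four bits,
/// rounding the pointer down to a 16-byte boundary.
fn align_down_16(sp: u64) -> u64 {
    sp & !0xF
}

fn main() {
    // In two's complement, -16 is 0x...fff0, i.e. the same mask as !0xF.
    assert_eq!((-16i64) as u64, !0xFu64);

    // Assumptions for illustration: 4 KiB pages and a hypothetical stack
    // mapping that starts at a page-aligned address.
    const PAGE_SIZE: u64 = 4096;
    let stack_base: u64 = 0x7f00_0000_0000;
    let stack_top = stack_base + 16 * PAGE_SIZE;

    // A page-aligned top is already 16-byte aligned, so the AND is a no-op.
    assert_eq!(stack_top % PAGE_SIZE, 0);
    assert_eq!(align_down_16(stack_top), stack_top);

    // A pointer that is only 8-byte aligned would be rounded down.
    assert_eq!(align_down_16(stack_top - 8), stack_top - 16);

    println!("top = {:#x}, aligned = {:#x}", stack_top, align_down_16(stack_top));
}
```

The AND is a single cheap instruction, so keeping it as a defensive measure costs very little even if the invariant holds.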

I understand that this project re-uses Boost's upstream assembler code and so may be reluctant to explore this - and the performance improvement is likely to be tiny. Still, I thought it might be useful to record for posterity. I'm going to take a look myself at trying out the changes above if I get the time. (Writing assembler for me requires a very clear head and a lot of free time).

(As an aside, there seems to be complete duplication of the assembler in the Linux and Mac OS code - that's not surprising, as both use the same SysV ABI. Linking is probably another matter).

Yup. But the most urgent issue is to find out why #37 was broken.

Closing as this has gone stale.