Shopify/yjit

Possible simplification in iseq-to-iseq argument passing

Closed this issue · 2 comments

XrXr commented

Here is a review of the argument setup details in gen_send_iseq() cause I think there is room for simplification, both in terms of generator code and generated code. It's not a good time to make these changes right now because we need to focus on the Rust prototype, but we might want to revisit this later.

Basically the current calculation of all the offsets to the arguments on the stack are largely unnecessary because the iseq already has that infomation we can directly use. Take a look at iseq_set_arguments_keywords() and illustrations in vm_make_env_each() and in slot_to_local_idx(). For instance iseq->body->param->keyword.bits_start gives the index to the hidden local that tracks unspecified keyword arguments.

Taking advantage of those information, we could make the following change:

  • make the callee context early and pop in the caller context early
  • lea REG_SP, ... early instead of into REG0 and then mov REG_SP, REG0 later. This removes one instruction in the output.
  • arguments start out in the callee as temporaries but we mutate the context to enshrine them as locals before generating the jump to the callee

This removes tricky data flow here and hard to reason about tally like here. Correctness follows from consistency with interpreter usages.

This also will make this line not do a double negative in the offset calculation. I also think the calculation is not correct if we start doing rest args.

For setting up the callee's argument as locals, I think we'll need context helpers that take indices where 0 means bottom of the stack rather than the top element of the stack. In order words, index that don't do sp - 1 - idx like here. We need this because the indices in the iseqs work that way.

If this ends up in the Rust we might do:

enum StackIndex {
    ZeroIsTop(i8),
    ZeroIsBottom(i8),
}

struct StackTmp;

const MAX_TEMP_TYPES: usize = 8;

struct Ctx {
    sp_offset: u16,
    stack_size: u16,
    /// This is different form the C code for no good reason. We track the bottom of the stack in C YJIT.
    top_temps: [StackTmp; MAX_TEMP_TYPES],
}

/// An alias for i32. Not a distinct type. Similar to typedef in C.
type RegSpOffset = i32;

impl Ctx {
    fn ctx_stack_opnd(&self, idx: StackIndex) -> RegSpOffset {
        use StackIndex::*; // so it's DRY

        // avoid converting with the as keyword because it truncates
        // without warning. Use the From trait, which is only implemented
        // when lossless conversions are defined, promoting bad truncations
        // to build errors. Trade off is the visual noise.

        match idx {
            ZeroIsTop(top_idx) => i32::from(self.sp_offset) - 1 - i32::from(top_idx),
            ZeroIsBottom(bot_idx) => {
                i32::from(self.sp_offset) - i32::from(self.stack_size) + i32::from(bot_idx)
            }
        }
    }
}

My general comment is that I'm not opposed to making these kinds of improvements, but now is not the right time to re-examine a bunch of design decisions. Doing that is going to lengthen the time we need to translate YJIT to Rust and potentially introduce subtle bugs.

the current calculation of all the offsets to the arguments on the stack are largely unnecessary

Which calculation where?

I'm missing some context to understand the implications of what you're suggesting.

enum StackIndex {
    ZeroIsTop(i8),
    ZeroIsBottom(i8),
}

Why do you want to support both kinds of stack indices?

top_temps: [StackTmp; MAX_TEMP_TYPES],

The reason I'm storing the bottom temps and not the top temps right now is to avoid having to shuffle things in the context.

XrXr commented

... now is not the right time to re-examine a bunch of design decisions. Doing that is going to lengthen the time we need to translate YJIT to Rust and potentially introduce subtle bugs.

Totally fair! I edited the issue to include this important timing info. I need to focus on translating, too.

Which calculation where?

I'm referring to the tally we do with argc, the sp[-{1,2,3}] stores etc. We calculate the address to these VALUEs owned by the callee. I've linked to examples of this kind of calculation in the post.

Why do you want to support both kinds of stack indices?

Since the information about where the locals in the iseq, such as iseq->body->param->keyword.bits_start are offsets where larger means higher addresses, whereas in current usages everything stack related is ZeroIsTop, where larger means lower addresses. To take advantage of the info on the iseq we need to support both.

The reason I'm storing the bottom temps and not the top temps right now is to avoid having to shuffle things in the context.

Speaking of bugs, you just caught one! I didn't remember that things we track are the bottom most elements 😛. I put in a comment.