SOM-st/PySOM

Interpreter Adaptation for New Frame Design

Closed this issue · 1 comment

smarr commented

The new design for frames in #16 has a number of consequences for the interpreters.

The main issue for the interpreter is that we only know where an argument or local variable is going to be stored in the new frame once a method has been parsed completely.
If there are any blocks further down the method, we may end up storing an argument/local in the Inner, which means we need to access it differently. However, we can't really know that before reaching that part of the method.

This means we either need to patch up the access indexes after completing the parse, or look them up on first execution.
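
To make the consequences concrete, here is a minimal sketch of what such split access could look like. The class and accessor names (Frame, Inner, get_slot, get_inner) are made up for this illustration and are not the actual structures from #16.

```python
# Hypothetical sketch of the Frame/Inner split: variables used only by the
# method itself live directly in the frame, while variables that are also
# accessed from blocks live in a separate Inner object that blocks can
# capture. All names here are illustrative, not PySOM's actual API.

class Inner(object):
    def __init__(self, size):
        self._slots = [None] * size

    def get(self, idx):
        return self._slots[idx]

    def set(self, idx, value):
        self._slots[idx] = value


class Frame(object):
    def __init__(self, num_frame_slots, num_inner_slots):
        self._slots = [None] * num_frame_slots
        self._inner = Inner(num_inner_slots) if num_inner_slots > 0 else None

    def get_slot(self, idx):
        return self._slots[idx]

    def get_inner(self):
        return self._inner


def read_variable(frame, is_in_inner, idx):
    # The interpreter needs to know in which of the two places the variable
    # ended up, and that is only certain once the whole method is parsed.
    if is_in_inner:
        return frame.get_inner().get(idx)
    return frame.get_slot(idx)
```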

AST Interpreter Design

The AST interpreter already used a set of nodes that specialize themselves at run time, on first execution.
Thus, the easiest change was to adapt these nodes to the new frame design.
In the process, this simplified the set of nodes needed, since accessing the receiver, arguments, and local variables now all works the same way. For the receiver, we also know its fixed offset in the frame already at parse time, which means we can specialize it eagerly.
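
As a rough illustration of this node rewriting, a variable read could start out unspecialized and pick its storage location on first execution. The class and accessor names below are made up for the example and are not the actual PySOM node classes; the sketch caches the specialized node, whereas the real nodes replace themselves in the tree.

```python
# Sketch of a self-specializing variable read. On first execution the node
# asks the Argument/Local object, whose location is known by then, where the
# value lives, and then always dispatches to a specialized reader.
# All names are illustrative.

class FrameVarReadNode(object):
    def __init__(self, idx):
        self._idx = idx

    def execute(self, frame):
        return frame.get_slot(self._idx)


class InnerVarReadNode(object):
    def __init__(self, idx):
        self._idx = idx

    def execute(self, frame):
        return frame.get_inner().get(self._idx)


class UninitializedVarReadNode(object):
    def __init__(self, var):
        self._var = var          # the Argument/Local object from the parser
        self._specialized = None

    def execute(self, frame):
        if self._specialized is None:
            # Parsing finished long before the first execution, so the
            # variable knows its final index and whether it is in the Inner.
            if self._var.is_in_inner():
                self._specialized = InnerVarReadNode(self._var.get_index())
            else:
                self._specialized = FrameVarReadNode(self._var.get_index())
        return self._specialized.execute(frame)
```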

To Align PySOM and TruffleSOM, or not? Probably not completely.

The situation for the bytecode interpreter is a little more difficult.
There's the overarching concern of whether TruffleSOM and PySOM should be as closely aligned as possible. (Except for the frame design, the AST interpreters are rather similar already.)
My initial experiments in aligning the calling convention of PySOM with the one of TruffleSOM indicated a huge performance impact. So, moving to a Truffle-style calling convention of a single object array does not seem like a good idea from a performance perspective.

On the other hand, any difference between PySOM and TruffleSOM likely means more complexity and more work when implementing optimizations, because I need to adapt them to two slightly different interpreters. For the moment, I think I should push the RPython interpreters as hard as possible for speed, since I can't do that as easily with the Truffle ones, given the platform constraints.
I might regret it later, and change things around again, but for the moment, the added complexity will hopefully pay off in terms of performance.

Bytecode Interpreter Design

So, with the extra freedom of not aligning the implementations, the big question now is how to realize the bytecodes for the new frame design.

The two options are bytecode quickening, and a second pass over the bytecode.

Option 1: Determining Access Details By Quickening

Since we now also use Argument and Local objects in the bytecode parser, we could simply store those in the method, and look up the access index and whether an argument/variable is in the Frame or Inner on first execution.
This would be very similar to the node rewriting done in the interpreter, and avoid a second pass over the bytecode.
On the other hand, this means we need POP/PUSH_VAR bytecodes that access the Variable objects, and then Q_PUSH/POP_FRAME/INNER bytecodes as their quickened versions.

  • the benefit would be to avoid a costly second pass over the bytecode
  • only code that's executed needs to be updated
  • we could simply keep all Argument/Local objects in the method
  • drawback is an extra run time overhead for quickening
  • it also bloats the bytecode set
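
For illustration, here is a rough sketch of how such quickening could look in the bytecode loop. The bytecode names follow the discussion above; the method, variable, and frame accessors are placeholders, not PySOM's actual API.

```python
# Quickening sketch: PUSH_VAR refers to a Variable object stored in the
# method; on first execution it is rewritten in place to a quickened form
# with a direct slot index. The accessor methods are placeholders.

PUSH_VAR     = 10   # unquickened: operand indexes the method's Variable objects
Q_PUSH_FRAME = 20   # quickened: operand is a direct Frame slot index
Q_PUSH_INNER = 21   # quickened: operand is a direct Inner slot index

def execute_push(method, bytecodes, bc_idx, frame_slots, inner_slots, stack):
    bytecode = bytecodes[bc_idx]
    operand  = bytecodes[bc_idx + 1]

    if bytecode == PUSH_VAR:
        # First execution: resolve the Variable object kept in the method,
        # rewrite the bytecode, and fall through to the fast path.
        var = method.get_variable(operand)
        bytecodes[bc_idx] = Q_PUSH_INNER if var.is_in_inner() else Q_PUSH_FRAME
        bytecodes[bc_idx + 1] = var.get_index()
        bytecode = bytecodes[bc_idx]
        operand  = bytecodes[bc_idx + 1]

    if bytecode == Q_PUSH_FRAME:
        stack.append(frame_slots[operand])
    elif bytecode == Q_PUSH_INNER:
        stack.append(inner_slots[operand])
```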

Option 2: Second Pass

Turns out, we already do a second pass over the bytecode to determine the maximum stack height needed for a method.
So, we could use the same pass to simply patch up the bytecodes for variable accesses, too.
At this point in the process, we have finished parsing the method and indeed know the indexes.

This likely adds a small overhead to the parsing of all methods, but it means we don't really need to keep the Variable objects around in the method (the options would have been to have arg/local arrays, or to put them into the literal array); both would mean more potential memory use.

Though, we still need to extend the bytecode set.
But, we only need the Q_PUSH/POP_FRAME/INNER bytecodes as active bytecodes.
We could introduce POP/PUSH_VAR as bytecodes outside the valid range, only used during parsing/compilation.
This would possibly avoid a performance impact in the interpreter.
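
Sketched roughly, the combined pass could look like the following. The stack-effect and bytecode-length tables and the variable accessors are placeholders, not the actual PySOM compiler code.

```python
# Combined pass: compute the maximum stack height and, at the same time,
# rewrite the compile-time-only PUSH_VAR/POP_VAR bytecodes into the
# quickened forms. stack_effect and bytecode_length map bytecode -> int;
# all names are placeholders.

PUSH_VAR, POP_VAR = 254, 255            # only exist during compilation
Q_PUSH_FRAME, Q_PUSH_INNER = 20, 21     # active bytecodes in the final method
Q_POP_FRAME, Q_POP_INNER = 22, 23

def finalize_bytecodes(bytecodes, variables, stack_effect, bytecode_length):
    max_height = 0
    height = 0
    i = 0
    while i < len(bytecodes):
        bc = bytecodes[i]
        if bc == PUSH_VAR or bc == POP_VAR:
            var = variables[bytecodes[i + 1]]
            # The whole method is parsed at this point, so the variable
            # knows its final index and whether it lives in the Inner.
            if bc == PUSH_VAR:
                bc = Q_PUSH_INNER if var.is_in_inner() else Q_PUSH_FRAME
            else:
                bc = Q_POP_INNER if var.is_in_inner() else Q_POP_FRAME
            bytecodes[i] = bc
            bytecodes[i + 1] = var.get_index()

        height += stack_effect[bc]
        if height > max_height:
            max_height = height
        i += bytecode_length[bc]
    return max_height
```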

Generally interesting with the Q_PUSH/POP_FRAME/INNER bytecodes is what the frequencies of specific occurrences are going to be. In TruffleSOM, bytecodes like push_self, push_arg_1, etc. were added, and they work well.
Here, I'd need to look and profile what ends up being used frequently. That's an example of the problem of diverging interpreter designs.

I think I'll go with Option 2, fixing up accesses in the max-stack-height pass.

smarr commented

While trying to get option 2 to work, it unfortunately turned out that this isn't a good option, because of block methods.

When a method has multiple blocks, side by side, they can have different requirements on the outer method.
So, they may access different sets of variables, or none, or require the catching of a non-local return.
Let's assume the first block doesn't access any variables, and doesn't return non-locally.

In that case, even if the second block does access variables or needs a non-local return, we don't have issues with determining any indexes in the first block, because we don't access any.

Though, let's assume the first block accesses some variables, and the second as well.
Now we need to coordinate the index assignment between the two blocks. And if the first block only accesses the second defined variable, but the second block only the first, then we need to support arbitrary orderings of index mappings.
This is possible, but would be a mess to debug at run time, I'd think.
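
To make the problem concrete, here is a tiny sketch, with made-up structures, of what happens if Inner slots were simply assigned in the order in which blocks first access the outer variables.

```python
# Made-up illustration of the coordination problem: the outer method declares
# variables in the order [a, b], but Inner slots get assigned in the order in
# which the blocks happen to access them.

class Variable(object):
    def __init__(self, name):
        self.name = name
        self.inner_idx = -1   # not assigned yet

def access_from_block(var, counter):
    # assign an Inner slot on the first access from a block
    if var.inner_idx == -1:
        var.inner_idx = counter[0]
        counter[0] += 1
    return var.inner_idx

a, b = Variable('a'), Variable('b')
counter = [0]

access_from_block(b, counter)   # first block only accesses b -> slot 0
access_from_block(a, counter)   # second block only accesses a -> slot 1

# The Inner layout [b, a] no longer matches the declaration order [a, b],
# so all consumers of the layout must handle arbitrary index mappings.
assert (a.inner_idx, b.inner_idx) == (1, 0)
```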

Another option would be to do the stack-height pass only after all blocks are parsed and the outer method is assembled.

And yet another option would be to do the index assignment as soon as an access is parsed. At the moment, we go over the variables after the method is parsed to identify their indexes. Instead of just flipping a flag, though, I could assign their place in the Inner on demand.
Would this ever move a variable from Frame to Inner after its index has already been encoded in the bytecode?
Only nested blocks would cause this, and they have already been completed when assigning Frame indexes. So, no, shouldn't be an issue. But it means we need the pass after all is done, because while we can allocate an index on demand, we can't rely on a variable residing in Frame, because there might be a later statement with a block accessing it.
However, this seems like a bad idea, because it would mean arguments could be given indexes that do not match the argument order, which would make it a mess to create the frame/inner correctly populated in the first place. That's quite complex information to keep around, and possibly costly to use at run time, making function calls more expensive.