Structured code for the stack machine

Question

Structured code for the stack machine

Opened this issue 9 years ago · 16 comments

Document options and issues addressing the use case of a structured language uniquely representing code in constrained stack machine. The wasm has been pivoting from an AST to a stack machine, but the goal of having a structure familiar text format has not been conceded. This is an attempt to reconcile the two, if that is possible or to understand any show stoppers.

Here is one proposal for a single pass stack machine to text decoder:

Within blocks, expressions are built up where possible until a barrier occurs and then the pending parts are emitted as lexical constants and connected via references to these constants.
An operator returning no values or more than one value is a barrier.
Values of these lexical constants implicitly popped by an operator are expressed as reference to the lexical constant.
A pick operator is always an edge of an expression, and any node referenced by a pick operator is emitted as a lexical constant. If the node referenced is part of the pending expression then it creates a barrier at the referenced node and this node and those before are emitted assigning their results to lexical constants.
Multiple value operators use a destructuring lexical constant text format, and tail values immediately discarded via drop operators are not assigned a constant name. If there is only one value and it is immediate discarded then this is presented as a statement in the text format.
drop operators not immediately following the producing operator are presented as an explicit drop() operation is in the text format. There remains the challenge of text format code the explicitly consumed a value immediately after it is generated on the stack, conflicting with the above rule, and this could either be a syntax error (perhaps quietly canonicalized to be valid), or could generate an extra redundant pick operation to separate them.

Some examples:

i32.const 1
nop
i32.const 2
i32.add
=>
const i32 $c1 = 1;
nop();
$c1 + 2;

i32.const 1
nop
i32.const 2
pick 1
pick 1
i32.add
=>
const i32 c1 = 1;
nop();
const i32 c2 = 2;
c1 + c2;

Note that this design leaves non-void values on the stack at the end of the block that would need to be implicitly discarded. This might be good for coding efficiency, but if optimizing for simple baseline compilers then a pick_and_void operator could be used on the last use of a stack element along a control flow path, and this would leave all stack elements void at the end of the block except for the block result values and this could be validated to enforce this pattern.

Discussion and other possible strategies:

A function call implicitly pops one value for each argument in reverse order, and pushes results onto the stack with one stack element for each result value. These standard function calls can not accept as arguments expressions that return no values.
Re-ordering and duplication of a contiguous block of stack values at the top of the stack up to a block boundary could be implement by a function call, so can be presented in an expression based language. However there are common patterns for more specific operations. The challenge is to keep the text format and the code uniquely representing each other and to avoid making the text format brittle and frustrating. One general principle is to have canonical codes for patterns and to validate these.
The challenge of referencing the stack elements can be addressed by using lexically scoped constants in the text format. Uses of these would be expressed as simply a label. For example,

  call give_me_i32_i32;
  i32.add;
  =>
  const (i32 $c0, i32 $c1) = give_me_i32_i32();
  $c0 + $c1;

Operators that return no values can still often be presented in an expression using special text operators. For example:

  const (i32 $c0, i32 $c1) = concatenate_values(1, nop(), 2);

Duplication of values up the stack can use a pick operator which is a special case of the general re-ordering and duplication above. Validation to canonical patterns would keep the encoding choices out of the structure format. For example dup might need to be used rather than pick 0. These uses can be expressed in a familiar expression:

  const i32 $c0 = $a;

A re-ordering of the elements on the stack might be expressed with a scope and values text expression. For example:

const (i32 $c0, i32 $c1, $c2,  ...) = { const i32 $a = ...; ... ; values($a, $b, $c, ...); };

If there are specialized stack machine operators for high frequency re-ordering cases then they could be made explicit in the expression language, or there could be canonical uses that are the only valid option so that they could all be expressed by a general operator. For example:

  const (i32 $c0, i32 $c1) = { const i32 $a = ...; const i32 $b = ...; swap($a, $b); };

Arbitrary ordering of stack constant references, and multiple uses. The text format would be too fragile, and not support multiple uses if the labelled stack constants had to be consumed in stack order, so a unique mapping needs to be defined. For example:

  call give_me_i32_i32;  [1, 2]
  pick 1                           [1, 2, 1]
  i32.sub;
  =>
  const (i32 $c0, i32 $c1) = give_me_i32_i32();
  $c1 - $c0;

In a familiar language the producer expectes to be able to reference any lexical constant, not to have some fragile constraints on the order of usage. The pick operator addresses this.
Function calls with a single result which has a single use can be consumed in an expression when done in stack order. All other results are presented as labelled lexical constants.
If a single value function result with a single use is coded in text as an assignment to a lexical constant then references to this are coded with a pick operation in the stack machine code.
All stack elements with multiple uses are referenced by their lexical constant name in the text format. They are never collapsed into expressions.
The last use of a stack element at the top-of-stack can simply pop it, without a pick operation, even if it had other uses. The text decoder will have to keep track of barriers to forming expressions to split these into either being folded into an expression or a lexical constant reference.

i32.const 1
nop
i32.neg
=>
const i32 $c0 = 1;
nop();
i32.neg($c0);

i32.const 1
pick 0
i32.add
=>
const i32 $c0 = 1;
$c0 + $c1;

Answer 1 · 2016-08-10T07:51:15.000Z

@JSStats I think that with pick and pick_and_void (take) ISA don't need any operations with locals.

I agree that extra sequence of pick_and_void operations would be needed to equalize stack balance on the block boundary. Here is example:

call give_me_a_b_c_d    [ a b c d ]
block                   [ a b c d ]
  take 3                [ _ b c d a ]
  take 3                [ _ _ c d a b ]
  i32.const 42          [ _ _ c d a b 42 ]
  i32.add               [ _ _ c d a b+42 ]
  take 3                [ _ _ _ d a b+42 c ]
  take 3                [ _ _ _ _ a b+42 c d ]
end                     [ a b` c d ]

=>
const (i32 $a, i32 $b, i32 $c, i32 $d) = give_me_a_b_c_d();
{ // loop
  $b = $b + 42;
}

The actual order of stack elements not reflected in the structural code and potentially ambiguous. Unless structural code is SSA, then we might need an equivalent of Φ function.

Answer 2 · 2016-08-10T08:09:25.000Z

@JSStats swap proven to be very handy stack manipulation concept. Here is the same example with and without swap:

call give_me_a_b_c_d    [ a b c d ]
block                   [ a b c d ]
  pick 1                [ a b c d a ]
  swap                  [ a b c a d ]
  i32.sub               [ a b c a-d ]
end                     [ a b c d` ]

call give_me_a_b_c_d    [ a b c d ]
block                   [ a b c d ]
  take 3                [ _ b c d a ]
  take 3                [ _ _ c d a b ]
  take 3                [ _ _ _ d a b c ]
  pick 2                [ _ _ _ d a b c a ]
  take 4                [ _ _ _ _ a b c a d ]
  i32.sub               [ _ _ _ _ a b c a-d ]
end                     [ a b c d` ]

=>
const (i32 $a, i32 $b, i32 $c, i32 $d) = give_me_a_b_c_d();
{
  $b = $a - $d;
}

Answer 3 · 2016-08-10T09:30:41.000Z

@drom Not sure swap helps here as they are picked in the order needed anyway. Not sure why the blocks are needed here. Some suggestions:

call give_me_a_b_c_d    [ a b c d ]
pick 3                  [ a b c d a ]
pick 1                  [ a b c d a d ]
i32.sub                 [ a b c a-d ]
=>
const (i32 $a, i32 $b, i32 $c, i32 $d) = give_me_a_b_c_d();
$a - $d;

call give_me_a_b_c_d    [ a b c d ]
take 1                  [ a b _ d c]
drop                    [ a b _ d ]
take 2                  [ a _ _ d b]
drop                    [ a _ _ d ]
take 3                  [ _ _ _ d a ]
take 1                  [ _ _ _ _ a d ]
i32.sub                 [ _ _ _ _ a-d ]
=>
const (i32 $a, _, _, i32 $d) = give_me_a_b_c_d();
$a - $d;

Answer 4 · 2016-08-10T10:02:34.000Z

Some examples using swap:

call give_me_a_b       [ a b ]
swap                   [ b a ]
i32.sub                [ b-a ]
=>
const (i32 $a, i32 $b) = give_me_a_b();
const (i32 $c, i32 $d) = swap($a, $b);
$c - $d

call give_me_a         [ a ]
nop                    [ a ]
call give_me_b         [ a b ]
swap                   [ b a ]
i32.sub                [ b-a ]
=>
const i32 $a = give_me_a();
nop();
const (i32 $c, i32 $d) = swap($a, give_me_b());
$c - $d

Answer 5 · 2016-08-10T16:14:30.000Z

@JSStats

@drom Not sure swap helps here as they are picked in the order needed anyway. Not sure why the blocks are needed here.

I introduced block to illustrate the case when it is needed (like loop) where stack balance need to be preserved.

Answer 6 · 2016-08-10T23:56:47.000Z

@drom The loop operator has an implicit block. You do illustrate a key point though:

A key requirement is that excess values left on the stack in a block are implicitly discarded, and this is a requirement that the current wasm design has been moving away from and doing so is clearly not a tenable plan.

For a function SSA style loop, here is a suggestion. The loop operator entry consumes n values from the stack (like a function call), and this could be implemented by adjusting the block stack positions. The repeat path for a loop requires n values on the loop block stack and these are retained and all other values discarded, like a tail call to the loop head. Other exits from the loop leave the number of values expected by the target the stack discarding all other values, and the loop fall-through would return m values to the parent which might be different to the number passed in the repeat path.

Using this same approach for the function entry, the function arguments would be within the functions implicit block, and this might be implemented by block stack adjustments, so they could be dropped etc. A tail recursive call would expect the argument values to be on the stack and would discard all other block values.

That would seem to leave no need for the local variables in wasm. Perhaps they could be retained to help some producers.

Answer 7 · 2016-08-11T01:15:20.000Z

@JSStats do you envision structural code text to be SSA like?

Answer 8 · 2016-08-11T02:10:27.000Z

@drom I think a functional SSA text format would be a good start. Might want to add some lexical scoping of definitions to help communicate the live range of definitions to the reader too. If the wasm encoding does not support an expression based language well then this is where I think the text format will end up, that this is what developer would want to debug.

Answer 9 · 2016-08-11T05:02:06.000Z

@JSStats any idea about syntax for the function declaration with multi-returns?

Answer 10 · 2016-08-11T05:43:00.000Z

@drom There is already provision for declaring multiple return values in the binary encoding, and the s-exp format, so it's just a matter of some bike-shedding to chose a text format and there are other issues discussing the structure of the text format.

Answer 11 · 2016-08-11T17:53:06.000Z

@JSStats IMHO on the topic:

The answer will mostly depend on the following aspects:

WASM keep locals on the value stack or in separate space (as it was before)?
WASM has pick, take operations?
WASM has set_stack_element_with_magic deep stack modifications.

if (?1 == locals off stack) {
    Yes, one can create structural language, and the S-expr is an example.
} else
if (?2 == we have pick/take) {
    Yes, one can organize arbitrary DFG/CFG;
    if (?3 == we have deep stack set magic) {
        Yes, one can create structural language;
        But! it is not A stack machine anymore (IMHO);
    } else {
        ???, there might be no unique correspondence between structured language
        and the stack ISA sequence. Multiple possible ways of stack
        allocation/deallocation not expressible in structured language?
    }
}

Answer 12 · 2016-08-11T19:48:11.000Z

One more successful example of structured code translation:

;; Iterative factorial named
(func $fac-iter (param $n i64) (result i64)
  (local $i i64)
  (local $res i64)
  (set_local $i (get_local $n))
  (set_local $res (i64.const 1))
  (loop $done $loop
    (if
      (i64.eq (get_local $i) (i64.const 0))
      (br $done)
      (block
        (set_local $res (i64.mul (get_local $i) (get_local $res)))
        (set_local $i (i64.sub (get_local $i) (i64.const 1)))
      )
    )
    (br $loop)
  )
  (get_local $res)
)

the same code in structuted language.

function fac-iter ($n:i64) : (i64) {
  i64 $i = $n;
  i64 $res = 1;
  loop {
    if ($i == 0) {
      break;
    } else {
      $res = $i * $res;
      $i = $i - 1;
    }
  }
  $res;
}

Direct translation:

function fac-iter     // i64:$n
  take 0              // _      i64:$i  <-- dumb but consistent with the source
  i64.const 1         // _      i64:$i i64:$res
  begin               // i64:$i i64:$res
    pick 1            // i64:$i i64:$res i64:$i
    i64.const 0       // i64:$i i64:$res i64:$i i64
    i64.eq            // i64:$i i64:$res i64
    if                // i64:$i i64:$res
      break
    else              // i64:$i i64:$res
      pick 1          // i64:$i i64:$res i64:$i
      take 1          // i64:$i _        i64:$i   i64:$res
      i64.mul         // i64:$i _        i64:$res
      take 2          // _      _        i64:$res i64:$i
      i64.const 1     // _      _        i64:$res i64:$i i64
      i64.sub         // _      _        i64:$res i64:$i
      take 1          // _      _        _        i64:$i i64:$res <-- automatic cleanup
    endif             // i64:$i i64:$res
  again
  take 1              // _      i64:$res i64:$i
  drop                // _      i64:$res
}                     // i64:$res

Direct translation with swap:

function fac-iter     // i64:$n
  take 0              // _      i64:$i
  i64.const 1         // _      i64:$i i64:$res
  begin               // i64:$i i64:$res
    pick 1            // i64:$i i64:$res i64:$i
    i64.const 0       // i64:$i i64:$res i64:$i i64
    i64.eq            // i64:$i i64:$res i64
    if                // i64:$i i64:$res
      break
    else              // i64:$i   i64:$res
      pick 1          // i64:$i   i64:$res i64:$i
      swap            // i64:$i   i64:$i   i64:$res
      i64.mul         // i64:$i   i64:$res
      swap            // i64:$res i64:$i
      i64.const 1     // i64:$res i64:$i i64
      i64.sub         // i64:$res i64:$i
    endif             // i64:$i   i64:$res
  again
  swap                // i64:$res i64:$i
  drop                // i64:$res
}                     // i64:$res

Answer 13 · 2016-08-12T03:03:01.000Z

@drom Cool. Here's another suggestion for loop using the stack. Might help to keep the operator names and structure as close as possible to the current wasm code. I like your stack annotation syntax, but lets add the block boundaries to understand the clean up, have used |. The if branches are both blocks. Add a text format suggestion too. One point noted is that code in a child block is mutating the parent block elements, but just to reset them to void or empty, and I wonder if this could be a problem, not sure.

function fac-iter     // i64:$n
  i64.const 1         // i64:$n i64:$res
  loop nargs=2 nres=1 // | i64:$i i64:$res
    pick 1            // | i64:$i i64:$res i64:$i
    i64.const 0       // | i64:$i i64:$res i64:$i i64
    i64.eq            // | i64:$i i64:$res i64
    if nres=0         // | i64:$i i64:$res |
      br 2            // $done
    else              // | i64:$i i64:$res |
      pick 1          // | i64:$i i64:$res | i64:$i
      take 1          // | i64:$i _        | i64:$i   i64:$res
      i64.mul         // | i64:$i _        | i64:$res
      take 2          // | _      _        | i64:$res i64:$i
      i64.const 1     // | _      _        | i64:$res i64:$i i64
      i64.sub         // | _      _        | i64:$res i64:$i
      take 1          // | _      _        | _        i64:$i i64:$res <-- repeat arguments.
      br 1            // $repeat
    end               // unreachable
  end                 // i64:$res
}                     // i64:$res

=>
function fac-iter ($n:i64) : (i64) {
  loop $repeat (i32:$i = $n, i64:$res = 1) : $done (i64) {
    if ($i == 0)
      br $done $res;
    else {
      const i32 $res = $i * $res;
      const i32 $i = $i - 1;
      br $repeat ($i, $res);
    };
}

Answer 14 · 2016-08-12T06:02:19.000Z

@JSStats I don't understand how come loop block can have nargs=2 nres=1? It become impossible to repeat such loop?

Also why you have moved br 1 inside else block? and removed last take 1 drop ? that was for for stack cleanup after the block.

Why if need this nres=0?

Answer 15 · 2016-08-12T07:26:41.000Z

@drom It is just a suggestion, the loop operator needs to consume its inputs, consumes two stack elements in this case, the start of this loop block has two elements that it consumed. A branch/break to the repeat label consumes two values and unwinds the rest. A branch/break to the $done label keeps the one result value and unwinds the rest.

If the branch at the end of the else block were after the if then the code would have to deal with returning two values from the if operator, it's cleaner to just unwind from the else branch. The if blocks return no values here, so the nres = 0, and the code after the if is unreachable here anyway.

Answer 16 · 2016-08-17T15:12:40.000Z

I've written up a bit about the stack machine here:

https://docs.google.com/document/d/1CieRxPy3Fp62LQdtWfhymikb_veZI7S9MnuCZw7biII/edit?usp=sharing