hlorenzi/customasm

Bug with rules passing arguments to other rules

Mitch-Siegel opened this issue · 1 comment

While trying to write a macro that loads a 32-bit constant on a machine whose instructions are 32 bits wide, I came across this issue. I intended the macro to take a destination register from a defined list of possible registers and a 32-bit immediate (mov %{rd: reg}, ${imm: i32}), and to split the wide load into a 16-bit load, a left shift, and a 16-bit immediate add.

mov %{rd: reg}, ${imm: i32} => {
    upper = imm[31:16]
    lower = imm[15:0]
    asm {
        movh %{rd}, ${upper}
        shli %{rd}, $16
        addi %{rd}, %{rd}, ${lower}
    }
}
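To make the intended arithmetic concrete, here is a minimal Python sketch of the split and the three-instruction rebuild. It is not customasm code. The function names are hypothetical, and it assumes that movh loads its 16-bit immediate into the low half of the register and that addi does not sign-extend its immediate; if the real machine sign-extends, the upper half would need an adjustment.

```python
MASK32 = 0xFFFFFFFF


def split_imm32(imm):
    """Split a 32-bit value into (upper, lower) 16-bit halves."""
    imm &= MASK32                  # keep only the low 32 bits
    upper = (imm >> 16) & 0xFFFF   # bits 31..16
    lower = imm & 0xFFFF           # bits 15..0
    return upper, lower


def rebuild(upper, lower):
    """Model movh / shli / addi on a 32-bit register.

    Assumption: movh places the immediate in the low 16 bits and
    addi zero-extends its immediate.
    """
    rd = upper                     # movh rd, upper
    rd = (rd << 16) & MASK32       # shli rd, 16
    rd = (rd + lower) & MASK32     # addi rd, rd, lower
    return rd


if __name__ == "__main__":
    hi, lo = split_imm32(0x12345678)
    print(hex(hi), hex(lo), hex(rebuild(hi, lo)))
```

Under those assumptions the round trip returns the original constant for any 32-bit input.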

However, this fails with an error stating that it is unable to match the movh instruction (which has the same format as the wider mov, but with a 16-bit immediate).

Attempting to do the same with a function definition doesn't work, giving the same error:

#fn bigmov(rd, imm) =>
{
    lower = imm[15:0]
    upper = imm[31:16]
    asm {movh %{rd}, {upper}
        shli %{rd}, $16
        addi %{rd}, %{rd}, {lower}}
}

@hlorenzi reproduced the issue with a minimal example, believing that there is probably a bug in the use of % or $ next to arguments:

#subruledef reg
{
    r0 => 0x0
}

#ruledef
{
    movh {rd: reg}, {imm: i16} =>
        0xbf @ rd @ 0x0 @ imm

    mov %{rd: reg}, {imm: i32} => asm {
        movh {rd}, 16
    }

    mov % {rd: reg}, {imm: i32} => asm {
        movh {rd}, 16
    }
}

;mov %r0, 12345678
mov % r0, 12345678

While I've managed to resolve the issue with %r0 and $0x1234 so that they are no longer interpreted as number literals, the issue in general is going to be more difficult to solve.

If you use it like mov %r0, $1234, then the tokenizer would see $1234 as a single token for a hex number literal, because it's a valid hex number. But then ${imm: i32} would fail to match, since it's expecting at least two tokens, one for $ and the rest for the expression.
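The ambiguity can be illustrated with a toy tokenizer. This is only a sketch in Python, not customasm's actual tokenizer: the token rules, `tokenize`, and `matches_mov` are all hypothetical, and the only behavior taken from the discussion is that `$` followed by hex digits becomes a single number token.

```python
import re

# Hypothetical token rules: "$1234" is tried as a hex literal before a bare "$".
TOKEN_RE = re.compile(r"""
    (?P<number>\$[0-9A-Fa-f]+|[0-9]+)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<punct>[$%,])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, text) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at %d" % pos)
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def matches_mov(tokens):
    """Check for: mov % <reg> , $ <at least one expression token>."""
    return (
        len(tokens) > 5
        and tokens[0] == ("ident", "mov")
        and tokens[1] == ("punct", "%")
        and tokens[2][0] == "ident"
        and tokens[3] == ("punct", ",")
        and tokens[4] == ("punct", "$")
    )


if __name__ == "__main__":
    print(tokenize("mov %r0, $1234"))
    print(matches_mov(tokenize("mov %r0, $1234")))
    print(matches_mov(tokenize("mov %r0, $ 1234")))
```

In this sketch the input `mov %r0, $1234` ends in one number token, so the rule that wants a separate `$` token does not match, while `mov %r0, $ 1234` does.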

For this to work as intended in every case, the instruction matcher would have to re-merge the stream of tokens and break them apart at a different spot, reinterpreting the stream of characters as it works on each instruction with more context. I think I'll leave this as an exercise for the future.
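One way to picture matching on characters instead of pre-split tokens is the Python sketch below. It is purely illustrative and says nothing about how customasm would implement this: the hard-coded regex and the `match_mov` helper are hypothetical, and the immediate is returned as raw text because how to interpret it afterwards is a separate question.

```python
import re

# Hypothetical rule "mov %{rd}, ${imm}" expressed directly over characters,
# so the literal "$" is consumed before the expression text is examined.
MOV_RE = re.compile(r"mov\s*%\s*(?P<rd>\w+)\s*,\s*\$\s*(?P<imm>.+)")


def match_mov(line):
    """Return (register text, expression text), or None if no match."""
    m = MOV_RE.fullmatch(line.strip())
    if m is None:
        return None
    return m.group("rd"), m.group("imm").strip()


if __name__ == "__main__":
    print(match_mov("mov %r0, $1234"))
    print(match_mov("mov % r0, $ 0x10"))
    print(match_mov("add %r0, $1"))
```

Because the rule sees the raw characters, `$1234` is split into the literal `$` and the text `1234` regardless of whether `$1234` would also be a valid hex literal on its own.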

Would it be possible for you to change $ (and perhaps even %) to different tokens in your instruction set? It would avoid future ambiguity issues with number literals.