Bug with rules passing arguments to other rules
Mitch-Siegel opened this issue · 1 comments
While trying to write a macro that loads a 32-bit constant on a machine whose instructions are 32 bits wide, I came across this issue. I intended the macro to take a destination register from a defined list of possible registers and a 32-bit immediate (mov %{rd: reg}, ${imm: i32}), splitting the wider load into a 16-bit load, a left shift, and a 16-bit immediate add:
mov %{rd: reg}, ${imm: i32} => {
upper = imm[31:16]
lower = imm[15:0]
asm {movh %{rd}, ${upper}
shli %{rd}, $16
addi %{rd}, %{rd}, ${lower}}
}
However, this fails with an error stating that it is unable to match the movh instruction (which has the same format as the wider mov but takes a 16-bit immediate).
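For reference, the arithmetic the macro relies on can be checked outside the assembler. The following is a minimal Python sketch, not customasm code. The helper names split_imm32 and emulate_bigmov are hypothetical, and the instruction semantics are assumptions: movh is taken to load a 16-bit value into the low half of the register, and addi is taken to add a zero-extended 16-bit immediate.

```python
def split_imm32(imm):
    """Split a 32-bit value into its (upper, lower) 16-bit halves."""
    imm &= 0xFFFFFFFF              # keep only the low 32 bits
    upper = (imm >> 16) & 0xFFFF   # bits 31..16, i.e. imm[31:16]
    lower = imm & 0xFFFF           # bits 15..0,  i.e. imm[15:0]
    return upper, lower


def emulate_bigmov(imm):
    """Emulate movh / shli 16 / addi on a 32-bit register (assumed semantics)."""
    upper, lower = split_imm32(imm)
    rd = upper                      # movh rd, upper  (assumed: loads the 16-bit value)
    rd = (rd << 16) & 0xFFFFFFFF    # shli rd, 16
    rd = (rd + lower) & 0xFFFFFFFF  # addi rd, rd, lower (assumed: zero-extended immediate)
    return rd
```

Under these assumptions, emulate_bigmov(0x12345678) gives back 0x12345678. If addi sign-extends its immediate on the real machine, the upper half would need adjusting whenever bit 15 of the lower half is set.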
Attempting to do the same with a function definition doesn't work either, and gives the same error:
#fn bigmov(rd, imm) =>
{
lower = imm[15:0]
upper = imm[31:16]
asm {movh %{rd}, {upper}
shli %{rd}, $16
addi %{rd}, %{rd}, {lower}}
}
@hlorenzi reproduced the issue with a minimal example, believing that there is probably a bug in the use of % or $ next to arguments:
#subruledef reg
{
r0 => 0x0
}
#ruledef
{
movh {rd: reg}, {imm: i16} =>
0xbf @ rd @ 0x0 @ imm
mov %{rd: reg}, {imm: i32} => asm {
movh {rd}, 16
}
mov % {rd: reg}, {imm: i32} => asm {
movh {rd}, 16
}
}
;mov %r0, 12345678
mov % r0, 12345678
While I've managed to resolve the issue with %r0 and $0x1234 so that they are no longer interpreted as number literals, the issue in general is going to be more difficult to solve.
If you use it like mov %r0, $1234, then the tokenizer would see $1234 as a single token for a hex number literal, because it's a valid hex number. But then ${imm: i32} would fail to match, since it's expecting at least two tokens: one for $ and the rest for the expression.
For this to work as intended in every case, the instruction matcher would have to re-merge the stream of tokens and break them apart at a different spot, reinterpreting the stream of characters as it works on each instruction with more context. I think I'll leave this as an exercise for the future.
Would it be possible for you to change $ (and perhaps even %) to different tokens in your instruction set? It would avoid future ambiguity issues with number literals.
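The ambiguity described above can be illustrated with a toy greedy tokenizer. This is a minimal Python sketch under stated assumptions, not customasm's actual lexer: the token grammar, the names tokenize and matches_prefixed_operand, and the choice of # as an alternative prefix are all hypothetical, and only the $-prefixed hex literal is modelled.

```python
import re

# Toy token grammar (an assumption for illustration only).
# The "$"-prefixed hex literal is tried before the bare "$" symbol,
# so "$1234" is consumed greedily as one number token.
TOKEN_RE = re.compile(
    r"(?P<number>\$[0-9A-Fa-f]+|0x[0-9A-Fa-f]+|[0-9]+)"
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<symbol>[%$,#])"
    r"|(?P<space>\s+)"
)


def tokenize(text):
    """Return a list of (kind, text) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def matches_prefixed_operand(tokens, prefix):
    """Check that the line ends with a prefix symbol followed by a number,
    which is what a pattern such as ${imm} expects (two separate tokens)."""
    return (
        len(tokens) >= 2
        and tokens[-2] == ("symbol", prefix)
        and tokens[-1][0] == "number"
    )
```

With this sketch, tokenizing mov %r0, $1234 ends in the single token $1234, so matches_prefixed_operand(tokens, "$") is false. Writing mov %r0, $ 1234, or switching to a prefix that cannot start a number literal (here the arbitrary stand-in #), yields two tokens and the check succeeds. Whether any particular replacement prefix is free of other conflicts in customasm is not checked here.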