mmcloughlin/avo

Invalid vector register use following JMP

vsivsi opened this issue · 0 comments

This minimal Avo code:

offsets := GLOBL("offsets", RODATA|NOPTR)
for i := 0; i < 8; i++ {
	DATA(i*8, U64(i*24))
}

TEXT("readSliceCaps", NOSPLIT, "func(in *[8][]uint64, out *[8]int64)")

sliceOffsets, sliceCaps := ZMM(), ZMM()
VMOVDQU64(offsets, sliceOffsets)
inPtr := Load(Param("in"), GP64())
outPtr := Load(Param("out"), GP64())
mask := K()

JMP(LabelRef("done")) // Unconditional JMP is key

KXNORB(mask, mask, mask)
// This VPGATHERQQ will use the same vector register for the gather index and destination, which is invalid
VPGATHERQQ(Mem{Base: inPtr, Index: sliceOffsets, Scale: 1, Disp: 16}, mask, sliceCaps)
VMOVDQU64(sliceCaps, Mem{Base: outPtr})

Label("done")
RET()

Generates this invalid assembly output:

DATA offsets<>+0(SB)/8, $0x0000000000000000
DATA offsets<>+8(SB)/8, $0x0000000000000018
DATA offsets<>+16(SB)/8, $0x0000000000000030
DATA offsets<>+24(SB)/8, $0x0000000000000048
DATA offsets<>+32(SB)/8, $0x0000000000000060
DATA offsets<>+40(SB)/8, $0x0000000000000078
DATA offsets<>+48(SB)/8, $0x0000000000000090
DATA offsets<>+56(SB)/8, $0x00000000000000a8
GLOBL offsets<>(SB), RODATA|NOPTR, $64

// func readSliceCaps(in *[8][]uint64, out *[8]int64)
// Requires: AVX512F
TEXT ·readSliceCaps(SB), NOSPLIT, $0-16
	VMOVDQU64 offsets<>+0(SB), Z0
	MOVQ      in+0(FP), AX
	MOVQ      out+8(FP), AX
	JMP       done                     // This unconditional jump...
	KXNORB     K1, K1, K1
	VPGATHERQQ 16(AX)(Z0*1), K1, Z0    // causes this invalid use of Z0 register 
	VMOVDQU64  Z0, (AX)

done:
	RET

Which doesn't assemble:

asm: index and destination registers should be distinct: 00022 (test_amd64.s:20)	
         VPGATHERQQ	16(AX)(Z0*1), K1, Z0
asm: assembly failed

Granted, this scenario is a bit contrived, but I ran into it in a real workflow: while
debugging, I added a temporary jump over some code that was extraneous to what I was
trying to figure out. Once I saw what was happening, I was able to proceed by commenting
out that code rather than changing a conditional jump to be unconditional.

I'm reporting this because there may well be less contrived situations where the same
problem shows up.
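
For what it's worth, here is a toy model of how an unconditional jump could produce this result. It is not Avo's code and I have not checked it against the actual allocator; it is only a sketch under two assumptions: (a) interference edges are recorded at each definition against the registers live out of that instruction, and (b) the gather destination is treated as both read and written. All names in it (`instr`, `liveIn`, `interferes`, `program`) are hypothetical, and the mask and pointer registers are left out for brevity.

```go
package main

import "fmt"

// instr is a toy instruction: the virtual registers it reads and writes,
// plus the indices of its successors in the control-flow graph.
type instr struct {
	name string
	uses []string
	defs []string
	succ []int
}

// liveIn runs a backward liveness fixpoint and returns, for each
// instruction, the set of registers live on entry to it.
func liveIn(prog []instr) []map[string]bool {
	in := make([]map[string]bool, len(prog))
	for i := range in {
		in[i] = map[string]bool{}
	}
	for changed := true; changed; {
		changed = false
		for i := len(prog) - 1; i >= 0; i-- {
			// live-out is the union of the successors' live-in sets.
			out := map[string]bool{}
			for _, s := range prog[i].succ {
				for r := range in[s] {
					out[r] = true
				}
			}
			// live-in = uses + (live-out - defs).
			for _, d := range prog[i].defs {
				delete(out, d)
			}
			for _, u := range prog[i].uses {
				out[u] = true
			}
			for r := range out {
				if !in[i][r] {
					in[i][r] = true
					changed = true
				}
			}
		}
	}
	return in
}

// interferes reports whether a and b would get an interference edge,
// ASSUMING edges are only added between a defined register and the
// registers live out of the defining instruction.
func interferes(prog []instr, a, b string) bool {
	in := liveIn(prog)
	for _, ins := range prog {
		for _, d := range ins.defs {
			for _, s := range ins.succ {
				for r := range in[s] {
					if (d == a && r == b) || (d == b && r == a) {
						return true
					}
				}
			}
		}
	}
	return false
}

// program models the reproduction above. With jump set, instruction 1
// goes straight to RET, so nothing after it is reachable from the entry.
func program(jump bool) []instr {
	next := 2
	if jump {
		next = 4
	}
	return []instr{
		{name: "VMOVDQU64 load", defs: []string{"sliceOffsets"}, succ: []int{1}},
		{name: "JMP or fallthrough", succ: []int{next}},
		// ASSUMPTION: the gather destination is read as well as written.
		{name: "VPGATHERQQ", uses: []string{"sliceOffsets", "sliceCaps"}, defs: []string{"sliceCaps"}, succ: []int{3}},
		{name: "VMOVDQU64 store", uses: []string{"sliceCaps"}, succ: []int{4}},
		{name: "RET"},
	}
}

func main() {
	fmt.Println("fallthrough:", interferes(program(false), "sliceOffsets", "sliceCaps")) // true
	fmt.Println("with JMP:   ", interferes(program(true), "sliceOffsets", "sliceCaps"))  // false
}
```

In this model, the jump stops liveness from flowing back from the gather to the instruction that defines `sliceOffsets`, so the two vector registers never get an interference edge and an allocator is free to give them the same physical register. Whether that is what actually happens inside Avo is a guess on my part.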