VOP1, V_MOV_B32
V_MOV_B32 v1, 11
V_MOV_B32 v0, v1
The above instructions don't work as expected: the output contains the previous value of v0.
However, the following works:
V_MOV_B32 v2, 11
V_MOV_B32 v0, v2
Output is 11 for all threads.
Hello!
Yes, I noticed this too when I was testing things with different VGPRs.
I'm really hoping this is not an error on the assembler's part. My guess is that v2 is used in a special way here. I'll try to find something about this in the docs.
The assembler output for the two sequences you show is:
V_MOV_B32 v1, 11 ; 8b02027e
V_MOV_B32 v0, v1 ; 0103007e
V_MOV_B32 v2, 11 ; 8b02047e
V_MOV_B32 v0, v2 ; 0203007e
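As a sanity check on the assembler, the four encodings above can be reproduced by hand. This is a minimal sketch, assuming the VOP1 field layout from the GCN ISA manual (SRC0 in bits 8:0, OP in bits 16:9, VDST in bits 24:17, the fixed pattern 0x3F in bits 31:25), opcode 1 for V_MOV_B32, inline integer constants 0..64 mapped to source codes 128..192, and VGPRs mapped to 256..511. The helper names are made up for illustration and are not part of any assembler.

```python
import struct

VOP1_ENCODING = 0x3F   # assumed fixed pattern in bits [31:25] for VOP1
V_MOV_B32_OP = 1       # assumed VOP1 opcode of V_MOV_B32


def src_vgpr(n):
    """SRC0 code for VGPR vN (hypothetical helper): 256 + N."""
    assert 0 <= n <= 255
    return 256 + n


def src_inline_int(value):
    """SRC0 code for an inline integer constant 0..64 (hypothetical helper)."""
    assert 0 <= value <= 64
    return 128 + value


def encode_vop1(op, vdst, src0):
    """Pack a VOP1 instruction and return its bytes as little-endian hex."""
    word = (VOP1_ENCODING << 25) | (vdst << 17) | (op << 9) | src0
    return struct.pack("<I", word).hex()


# The two sequences from this issue.
print(encode_vop1(V_MOV_B32_OP, 1, src_inline_int(11)))  # V_MOV_B32 v1, 11
print(encode_vop1(V_MOV_B32_OP, 0, src_vgpr(1)))         # V_MOV_B32 v0, v1
print(encode_vop1(V_MOV_B32_OP, 2, src_inline_int(11)))  # V_MOV_B32 v2, 11
print(encode_vop1(V_MOV_B32_OP, 0, src_vgpr(2)))         # V_MOV_B32 v0, v2
```

Under those assumptions the printed bytes match the listing above, which suggests the encoding itself is fine and the difference between v1 and v2 comes from somewhere else.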
Oh, I found the reason.
v1 is used as get_global_id(): it holds 0, 4, 8, ..., 63 × 4, one value per work item.
If you modify it, everything goes wrong. I guess you should mention that.