Add a compiler pass to convert 32 bit PTX to 64 bit PTX
Expected outcome:
Comments:
- It should be either:
  - Its own compiler flow (take 32 bit PTX, make changes, and emit textual 64 bit PTX) to work with . That's what we ultimately want.
  - It might be easier to just add it to the existing compiler flow as the first pass. It will work only for ZLUDA.
- This pass does not have to be fully general; it should be good enough to convert the constructs found in 32 bit PhysX PTX.
- The biggest changes will be to memory accesses. Every kernel will receive an additional implicit 64 bit pointer argument that points into the memory pool. We will convert all 32 bit pointer accesses to offset accesses into this pool. Texrefs will probably remain unchanged if all the tex instructions use global variables.
- This pass will be in cahoots with the runtime (#354). E.g. the runtime needs to know there's an implicit 64 bit pointer as the first argument.
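To make the runtime side of this contract concrete, here is a minimal Python sketch of what "implicit 64 bit pointer as the first argument" could mean for argument packing. The helper names `to_offset` and `pack_kernel_args` are hypothetical, and the tightly packed little-endian layout (u64 followed by u32 values) is an assumption for illustration, not something taken from ZLUDA or #354.

```python
import struct

U32_LIMIT = 2**32


def to_offset(pool_base: int, address: int) -> int:
    """Translate a 64 bit address inside the pool into a 32 bit offset.

    Hypothetical helper: the 32 bit program only ever sees these offsets.
    """
    offset = address - pool_base
    if not 0 <= offset < U32_LIMIT:
        raise ValueError("address does not fit in the 32 bit pool window")
    return offset


def pack_kernel_args(pool_base: int, args32: list[int]) -> bytes:
    """Prepend the implicit 64 bit pool pointer to the original u32 arguments.

    Assumed layout: little-endian, u64 first, then each u32 in order.
    """
    return struct.pack(f"<Q{len(args32)}I", pool_base, *args32)
```

With this layout the converted kernel would read the pool base from the first parameter, and every original 32 bit "pointer" argument stays a plain u32 that is interpreted as an offset from that base.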
An example. From:
.version 6.5
.target sm_30
.address_size 32
.visible .entry add(
.param .u32 input,
.param .u32 output
)
{
.reg .u32 in_addr;
.reg .u32 out_addr;
.reg .u32 temp;
.reg .u32 temp2;
ld.param.u32 in_addr, [input];
ld.param.u32 out_addr, [output];
ld.u32 temp, [in_addr];
add.u32 temp2, temp, 1;
st.u32 [out_addr], temp2;
ret;
}
to:
.version 6.5
.target sm_30
.address_size 64
.visible .entry add(
.param .u64 _implicit_buffer,
.param .u32 input,
.param .u32 output
)
{
.reg .u32 in_addr;
.reg .u32 out_addr;
.reg .u32 temp;
.reg .u32 temp2;
.reg .u64 _implicit_buffer_ptr;
ld.param.u64 _implicit_buffer_ptr, [_implicit_buffer];
ld.param.u32 in_addr, [input];
ld.param.u32 out_addr, [output];
{
.reg .u64 _address_64;
.reg .u64 _offset_64;
cvt.u64.u32 _offset_64, in_addr;
add.u64 _address_64, _implicit_buffer_ptr, _offset_64;
ld.u32 temp, [_address_64];
}
add.u32 temp2, temp, 1;
{
.reg .u64 _address_64;
.reg .u64 _offset_64;
cvt.u64.u32 _offset_64, out_addr;
add.u64 _address_64, _implicit_buffer_ptr, _offset_64;
st.u32 [_address_64], temp2;
}
ret;
}
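The rewrite above can be sketched as a small line-based text transformation. This is only an illustration in Python under strong assumptions: one instruction per line, kernels with at least one parameter, and generic-space `ld`/`st` whose address operand is a bare register. The function names `convert` and `widen` are hypothetical, and a real pass would work on a parsed representation rather than regular expressions. Unlike the example above, this sketch emits the `_implicit_buffer_ptr` declaration and load at the very start of the kernel body.

```python
import re

# Generic-space accesses with a bare register address, e.g.
#   ld.u32 temp, [in_addr];     st.u32 [out_addr], temp2;
# ld.param and other state spaces do not match and are left untouched.
LOAD = re.compile(r"^ld\.(\w+)\s+(\w+),\s*\[(\w+)\];$")
STORE = re.compile(r"^st\.(\w+)\s+\[(\w+)\],\s*(\w+);$")


def widen(addr_reg: str, access: str) -> list[str]:
    """Wrap one access in a scope that computes pool base + 32 bit offset."""
    return [
        "{",
        ".reg .u64 _address_64;",
        ".reg .u64 _offset_64;",
        f"cvt.u64.u32 _offset_64, {addr_reg};",
        "add.u64 _address_64, _implicit_buffer_ptr, _offset_64;",
        access,
        "}",
    ]


def convert(ptx: str) -> str:
    out = []
    need_prologue = False  # True between ".entry name(" and the body's "{"
    for raw in ptx.splitlines():
        line = raw.strip()
        load, store = LOAD.match(line), STORE.match(line)
        if line == ".address_size 32":
            out.append(".address_size 64")
        elif ".entry" in line and line.endswith("("):
            # Assumes at least one explicit parameter follows.
            out += [line, ".param .u64 _implicit_buffer,"]
            need_prologue = True
        elif line == "{" and need_prologue:
            out += [
                "{",
                ".reg .u64 _implicit_buffer_ptr;",
                "ld.param.u64 _implicit_buffer_ptr, [_implicit_buffer];",
            ]
            need_prologue = False
        elif load:
            ty, dst, addr = load.groups()
            out += widen(addr, f"ld.{ty} {dst}, [_address_64];")
        elif store:
            ty, addr, src = store.groups()
            out += widen(addr, f"st.{ty} [_address_64], {src};")
        else:
            out.append(line)
    return "\n".join(out)
```

Calling `convert` on the 32 bit `add` kernel text yields a 64 bit module in which both the load and the store go through `_address_64`, while the `ld.param.u32` instructions are passed through unchanged.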
I'd love to contribute to the project. However, I don't have access to an AMD GPU. Is there a way to set up a testing environment or a simulation to verify functionality without the target hardware?
So, is it possible that after the 32-bit (x86) program is converted to 64-bit (x64), I'll be able to run it correctly on my NVIDIA GPU? I'd like to confirm if this is the expected outcome.
I've looked at the FluidMark program, which is a benchmark for PhysXCore. It's a fatbin composed of SASS for sm_10, sm_20, and sm_30. I tried to use the PhysX SDK to compile PTX but ran into complex environment issues. Could you provide a PhysX benchmark that only uses PTX?
> So, is it possible that after the 32-bit (x86) program is converted to 64-bit (x64), I'll be able to run it correctly on my NVIDIA GPU? I'd like to confirm if this is the expected outcome.
It's up to the implementer. After it's converted from 32 bit to 64 bit it could run on either (with the AMDGPU-only path probably being a bit simpler).
> I've looked at the FluidMark program, which is a benchmark for PhysXCore. It's a fatbin composed of SASS for sm_10, sm_20, and sm_30. I tried to use the PhysX SDK to compile PTX but ran into complex environment issues. Could you provide a PhysX benchmark that only uses PTX?
I'm not sure what you mean; FluidMark fatbins contain PTX. You can see logs here: #352