cornell-zhang/heterocl

Shmids expired before usage

Closed this issue · 7 comments

I tried to run the GEMM HBM example on our servers. Compilation works fine, but the host program crashes with a segmentation fault when the binary is executed.

[INFO] Running commands:
cd project; make host

basename: missing operand
Try 'basename --help' for more information.
host.cpp: In function ‘int main(int, char**)’:
host.cpp:104:11: warning: unused variable ‘_top’ [-Wunused-variable]
   int32_t _top;
           ^~~~
[INFO] Commands outputs:
g++ -I./ -I/opt/xilinx/xrt/include -I/work/shared/common/Xilinx/Vivado/2019.2/include -Wall -O0 -g -std=c++11 -fmessage-length=0 .//xcl2.cpp host.cpp  -o 'host'  -L/opt/xilinx/xrt/lib -lOpenCL -lpthread  -lrt -lstdc++


[11:20:20] Hash macthed. Found pre-compiled bitstream
[INFO] Running commands:
cd project; ./host kernel.xclbin

/bin/sh: line 1: 160930 Segmentation fault      ./host kernel.xclbin

The Segfault occurred when the program was trying to access data from the shared memory. We may need to consider a better way to transfer data between the invoking program and host program.

Can you describe the current solution first?

I currently have no idea how to solve it... I was able to run it yesterday with exactly the same code without any changes.

We may need to consider a better way to transfer data between the invoking program and host program

Can you describe the current way first?

The current way:

HeteroCL generates host and device code. The input data (passed from the Python side) is written into shared memory. The host program then copies the data from shared memory, runs the main logic, and writes the result back to shared memory, where it is accessible from the Python side.

Let's try to be more specific about the methods and syscalls we are using to pass the data through shared memory.

So to be clear, we should not use the term "host" in a confusing way. In our runtime system, we have a parent process that executes the HCL program and a child process that executes the generated code (including the host code and the device code). This is our current runtime flow:

  1. The user prepares data with NumPy.
data = numpy.random.randint(...)
  2. The data is passed to the HCL runtime through our API.
hcl_data = hcl.asarray(data)
f(hcl_data)
  3. The HCL runtime creates a shared memory segment between the parent and child processes.
int shmid = shmget(key, data_size, 0666|IPC_CREAT);
void* mem = shmat(shmid, nullptr, 0);
  4. The HCL runtime copies the data to the shared memory.
memcpy(mem, hcl_data, data_size);
  5. The HCL runtime executes the child program, which reads the data from and writes the results to the shared memory.
system("child_program");
  6. The HCL runtime copies the updated data back and frees the shared memory.
memcpy(hcl_data, mem, data_size);
shmdt(mem);
shmctl(shmid, IPC_RMID, nullptr);
  7. Users can retrieve the data in NumPy format.
new_data = hcl.asnumpy(hcl_data)
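For reference, steps 3 to 6 can be sketched as a self-contained round trip that checks every return value, so a missing or expired segment shows up as an error code rather than a bad pointer dereference. This is only an illustration under assumptions, not the actual HCL runtime: `round_trip` and `child_work` are hypothetical names, `fork()` stands in for `system("child_program")`, and `IPC_PRIVATE` replaces the real key.

```cpp
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the generated child program: attach to the
// segment by id, double every element in place, then detach.
static int child_work(int shmid, std::size_t count) {
    void* mem = shmat(shmid, nullptr, 0);
    // shmat reports failure as (void*)-1, not nullptr.
    if (mem == reinterpret_cast<void*>(-1)) return 1;
    int32_t* data = static_cast<int32_t*>(mem);
    for (std::size_t i = 0; i < count; ++i) data[i] *= 2;
    shmdt(mem);
    return 0;
}

// Parent side: create the segment, copy data in, run the child, copy the
// result out, and remove the segment. Returns false if any step fails.
bool round_trip(std::vector<int32_t>& buf) {
    assert(!buf.empty() && "shmget rejects zero-size segments");
    const std::size_t bytes = buf.size() * sizeof(int32_t);

    // IPC_PRIVATE is an assumption of this sketch; the runtime uses a key.
    int shmid = shmget(IPC_PRIVATE, bytes, 0600 | IPC_CREAT);
    if (shmid < 0) return false;

    void* mem = shmat(shmid, nullptr, 0);
    if (mem == reinterpret_cast<void*>(-1)) {
        shmctl(shmid, IPC_RMID, nullptr);
        return false;
    }
    std::memcpy(mem, buf.data(), bytes);

    // fork() replaces system("child_program") to keep the sketch in one file.
    pid_t pid = fork();
    if (pid == 0) _exit(child_work(shmid, buf.size()));

    int status = 0;
    bool ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
              WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok) std::memcpy(buf.data(), mem, bytes);

    shmdt(mem);
    shmctl(shmid, IPC_RMID, nullptr);
    return ok;
}
```

The check that matters most here is the one on the `shmat` result in the child: if the segment id is no longer valid when the child attaches, the call fails and the process can exit with an error instead of crashing.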

I also ran into this problem. I think we could generate a header file that stores the input data, which would be simpler and more reusable.
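A minimal sketch of what such a header generator could look like, assuming `int32_t` inputs. The function name `make_header`, the array name, and the header layout are all hypothetical, not an existing HeteroCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Render `data` as the text of a C/C++ header that defines a constant
// int32_t array called `name`. The layout is an illustrative assumption.
std::string make_header(const std::string& name,
                        const std::vector<int32_t>& data) {
    assert(!data.empty() && "zero-length arrays are not valid C++");
    std::ostringstream out;
    out << "#pragma once\n#include <stdint.h>\n";
    out << "static const int32_t " << name << "[" << data.size() << "] = {";
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i != 0) out << ", ";
        out << data[i];
    }
    out << "};\n";
    return out.str();
}
```

The generated host code could then `#include` the emitted file instead of attaching to a segment. This covers inputs only: results would still need some path back to the parent process, and large arrays would lengthen the host compile.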