Hard-to-reproduce segmentation fault with Parla cleanup

Question

Opened this issue 4 years ago · 1 comment

I'm occasionally seeing a segmentation fault when running the QR factorization app with Parla (not VEC-related). I haven't been able to reproduce it myself, but it has come up before, and it came up again recently when one of my teammates was testing his Parla install using the app. He ran on a Maverick2 gtx node and said the bug only happened the first time he ran it (I'm not sure whether that's a coincidence). The command was python qr_parla.py, run in the Parla.py/benchmarks/qr_factorization directory. The output was as follows:

(base) ~/Parallelism-Locality/project/Parla.py/benchmarks/qr_factorization$ python qr_parla.py
%**********************************************************************************************%

Config: rows=5000 cols=100 block_size=500 iterations=1 warmup=0 threads=16 ngpus=4 placement=gpu check_result=False csv=False
--- ITERATION 0 ---
t1
Num GPU tasks: 10
H2D: 0.0742948055267334
CPU kernels: 0
GPU kernels: 26.731510162353516
D2H: 0.005108356475830078
Total: 2.2886359691619873

t2
Total: 1.2995290756225586

t3
Num GPU tasks: 10
H2D: 0.0652766227722168
CPU kernels: 0
GPU kernels: 0.007777690887451172
D2H: 0.008157730102539062
Total: 0.015547752380371094

Full run total: 3.606827974319458

%**********************************************************************************************%

Segmentation fault (core dumped)

Note that the segmentation fault occurs after the main program has completed, presumably when Parla is cleaning up its resources.
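
One possible way to check where the fault occurs is Python's built-in faulthandler module, which prints the Python-level traceback of each thread when the interpreter receives SIGSEGV. The sketch below is only an illustration, not something that has been run against this crash: the helper names run_with_faulthandler and died_from_segfault are hypothetical, and the inline -c script is a stand-in for qr_parla.py. It launches a command with -X faulthandler so the handler is active from interpreter startup through shutdown, then inspects the exit status.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args):
    """Run `python -X faulthandler <args>` and capture its result.

    `args` is whatever would normally follow `python` on the command
    line, e.g. ["qr_parla.py"]. Returns (returncode, stdout, stderr).
    If the child segfaults, faulthandler writes its traceback to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def died_from_segfault(returncode):
    """On POSIX, subprocess reports death by signal N as returncode -N."""
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Stand-in for the real benchmark: a trivial script that only
    # confirms faulthandler is active inside the child interpreter.
    code, out, err = run_with_faulthandler(
        ["-c", "import faulthandler; print(faulthandler.is_enabled())"]
    )
    print("faulthandler enabled in child:", out.strip())
    print("segfault:", died_from_segfault(code))
    if err:
        print(err)
```

To try this on the benchmark itself, the call would become run_with_faulthandler(["qr_parla.py"]) from the qr_factorization directory, or equivalently python -X faulthandler qr_parla.py in the shell. If the crash really happens during teardown, the dumped traceback may be short or only show native frames, in which case the core dump would be the next thing to look at.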

Answer 1 · 2021-05-11T19:43:17.000Z

@insertinterestingnamehere This was for a non-VEC app. Did you mean to add that label?