Segmentation fault (core dumped)
Hi,
I'm in the process of switching from CC to QC. Right now, I'm trying to reproduce a past CC self-calibration to check that I get similar image fidelity and improvement.
I'm getting repeated segmentation faults that kill my QC runs. Oddly, they seem to be stochastic: sometimes the command runs to completion, but most of the time it kills the script. QC should have no problem running this script, as I could run it in CC on the same machine without issues (the only difference is that I've changed the f-slope solver to delay_and_offset).
I'm not exactly sure what information/documents would help debug this issue, but if you let me know what you need to reproduce the error, I can provide it.
(Virtual) machine specs:
64 GB RAM, 8 cores
Data:
S-band VLA data (2 basebands of 8 SPWs each; 512 channels of 2 MHz per baseband)
Hi @AKHughes1994! Sorry that you seem to have run into a bug. If it is stochastic, it may be related to thread safety. Could you please share both your log file and your QuartiCal config file/command line?
Hi @JSKenyon, I've attached the log file and the .yaml file.
The command I run is:
goquartical ../quartical_parsets/DI_bb.yaml input_ms.path=ms.ms input_ms.select_ddids=[8,9,10,11,12,13,14,15] input_ms.freq_chunk=512 K.freq_interval=512
Ok, I can reproduce on an arbitrary dataset which suggests it is a bug in the code and not some peculiarity in the data. Will drill down and find it.
I believe I have found the problem: could you please unset output.subtract_directions? Please let me know if that works for you, as it seems to resolve the segfaults (caused by out-of-bounds access) for me.
Thanks for the bug report - I will put in a check to ensure this doesn't trouble anyone else.
Edit: Just to clarify, the problem is that the corrected residual code is attempting to subtract direction 1, which doesn't exist in this case. This leads to an out-of-bounds access, which may or may not cause a segfault. The solution is to check that all values in output.subtract_directions correspond to real directions. This can be done in the dask layer of the residual computation.
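A minimal sketch of the kind of check described above, in Python. The helper names `validate_subtract_directions` and `corrected_residual`, and the `(row, chan, dir, corr)` model layout, are hypothetical illustrations, not QuartiCal's actual API. It assumes zero-based direction indices, so a single-direction (DI) model only has direction 0. Because the check needs only the config value and the model's shape, it can run once, up front, before any per-chunk kernel touches the data. In plain NumPy the fancy index would itself raise IndexError, but a compiled loop built without bounds checking (Numba's default) would read out of bounds silently, which is why the failure can look random.

```python
from typing import Sequence

import numpy as np


def validate_subtract_directions(subtract_directions: Sequence[int], n_dir: int) -> None:
    """Raise ValueError if any requested direction index does not exist (hypothetical helper)."""
    bad = [d for d in subtract_directions if not 0 <= d < n_dir]
    if bad:
        raise ValueError(
            f"output.subtract_directions contains {bad}, but the model only "
            f"has {n_dir} direction(s) (valid indices: 0..{n_dir - 1})."
        )


def corrected_residual(data: np.ndarray, model: np.ndarray,
                       subtract_directions: Sequence[int]) -> np.ndarray:
    """Subtract the selected model directions from the data.

    Assumed layouts: data is (row, chan, corr), model is (row, chan, dir, corr).
    """
    # Validate before indexing, so an invalid direction fails loudly instead of
    # reaching a kernel that might read past the end of the array.
    validate_subtract_directions(subtract_directions, model.shape[2])
    return data - model[:, :, list(subtract_directions), :].sum(axis=2)
```

With a DI model (one direction), `corrected_residual(data, model, [1])` raises a clear ValueError instead of risking a segfault.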
Ahhhhh!
I converted a DD YAML file into a DI YAML file and absolutely should have caught that issue. Apologies.
Thanks for finding it,
Andrew