ratt-ru/CubiCal

Cubical Montblanc error

Opened this issue · 14 comments

Hi,

I'm running into an issue with CubiCal when trying to run a G/dE correction. I have a model in my model column and a Tigger sky model (converted from pyBDSM) of the problem source. I've attached the logs below. The error seems to come from Montblanc, but I can't immediately see what is causing it.

Appreciate any information on this you could share.

Thanks,
Joe

cubical_log.txt

Hi @joesbright. If it is possible, would you mind uploading your sky model file? I want to check that it isn't something specific to the model.

A second possibility: the log shows that the MS has two spectral windows with a single channel in each. I have never run CubiCal on data like this, so the problem could stem from that.

Hi @JSKenyon,

See the sky model below. The data are old VLA data, hence the single channel per SPW.

Thanks for the quick response!

bright_source.lsm.html.zip

Great! I will take a look in the morning.

Just a quick update: when I combine the SPWs with mstransform and rerun, I get the same 'future warnings' as in the previous log file, but then simply get an 'illegal instruction (core dumped)' error. This is similar to what @IanHeywood saw in issue #238, but I am not running on IDIA.

Thanks again,
Joe

I cannot seem to reproduce this error. I have tried creating a two-band MS (using simms) with a single channel in each band and predicting using your sky model. I am not quite sure how to help further unless you can share the data. I understand if that is impossible, though. I am running in a python3 (3.6.9) virtualenv with a fresh install of all Python dependencies, ignoring cached installs. In theory everything should be up to date.

Out of interest, where are you running this? Is it on a local laptop/desktop? Or is it on a server somewhere?

pip freeze

absl-py==0.10.0
astLib==0.11.4
astor==0.8.1
astro-kittens==1.4.3
astro-tigger-lsm==1.6.0
astropy==4.0.1.post1
attrdict==2.0.1
attrs==20.2.0
backcall==0.2.0
bleach==1.5.0
Cerberus==1.3.2
configparser==5.0.0
-e git+https://github.com/ratt-ru/CubiCal.git@2dd52a0df04ef506bf89cf9cd6d4e5889bcddc3d#egg=cubical
cycler==0.10.0
decorator==4.4.2
funcsigs==1.0.2
future==0.18.2
gast==0.4.0
grpcio==1.32.0
html5lib==0.9999999
hypercube==0.3.4
importlib-metadata==2.0.0
ipython==7.16.1
ipython-genutils==0.2.0
jedi==0.17.2
kiwisolver==1.2.0
llvmlite==0.34.0
Markdown==3.2.2
matplotlib==2.2.5
montblanc @ git+https://github.com/ska-sa/montblanc.git@547008faa46d5798f682d9d00597351a67f1915e
nose==1.3.7
numba==0.51.2
numpy==1.19.2
parso==0.7.1
pexpect==4.8.0
pickleshare==0.7.5
pkg-resources==0.0.0
prompt-toolkit==3.0.7
protobuf==3.13.0
psutil==5.7.2
ptyprocess==0.6.0
Pygments==2.7.1
pyparsing==2.4.7
python-casacore==3.3.1
python-dateutil==2.8.1
pytz==2020.1
ruamel.yaml==0.16.12
ruamel.yaml.clib==0.2.2
scipy==1.5.2
SharedArray @ git+https://gitlab.com/bennahugo/shared-array.git@dc90bd2855ddcb7c1bbc473d24a1f42c60436be0
six==1.15.0
tabulate==0.8.7
tensorboard==1.8.0
tensorflow==1.8.0
termcolor==1.1.0
traitlets==4.3.3
wcwidth==0.2.5
Werkzeug==1.0.1
zipp==3.2.0

This is running on a server with multiple nodes. I successfully ran on one of the newer nodes via a singularity shell, but I've also attached a pip freeze from the older node where I was having the issues I mentioned previously. Thanks @JSKenyon and @bennahugo for the help.

WARNING: Could not generate requirement for distribution .wh.pip 20.0.2 (/usr/local/lib/python3.6/dist-packages): Parse error at "'.wh.pip='": Expected W:(abcd...)
absl-py==0.9.0
asn1crypto==0.24.0
astLib==0.11.4
astor==0.8.1
astro-kittens==1.4.3
astro-tigger-lsm==1.6.0
astropy==4.0.1.post1
attrdict==2.0.1
attrs==19.3.0
bleach==1.5.0
Cerberus==1.3.2
configparser==5.0.0
corner==2.0.1
cryptography==2.1.4
cubical @ git+https://github.com/ratt-ru/CubiCal.git@e49aff3975d8cc91a29a707c2d60036ad883bc8e
cycler==0.10.0
emcee==3.0.2
funcsigs==1.0.2
future==0.18.2
gast==0.3.3
grpcio==1.29.0
html5lib==0.9999999
hypercube==0.3.4
idna==2.6
importlib-metadata==1.6.0
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.2.0
llvmlite==0.32.1
Markdown==3.2.2
matplotlib==3.2.1
montblanc @ git+https://github.com/ska-sa/montblanc.git@547008faa46d5798f682d9d00597351a67f1915e
nose==1.3.7
numba==0.49.1
numpy==1.18.4
pandas==1.0.3
protobuf==3.12.0
psutil==5.7.0
pycrypto==2.6.1
pygobject==3.26.1
pyparsing==2.4.7
python-apt==1.6.4
python-casacore==3.3.1
python-dateutil==2.8.1
pytz==2020.1
pyxdg==0.25
PyYAML==5.3
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
scipy==1.4.1
SecretStorage==2.3.1
SharedArray @ git+https://gitlab.com/bennahugo/shared-array.git@dc90bd2855ddcb7c1bbc473d24a1f42c60436be0
six==1.14.0
tabulate==0.8.7
tensorboard==1.8.0
tensorflow==1.8.0
termcolor==1.1.0
unattended-upgrades==0.1
Werkzeug==1.0.1
zipp==3.1.0
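
The two `pip freeze` listings in this thread can be compared mechanically rather than by eye. The following is a minimal sketch, not part of CubiCal or pip: `parse_freeze` and `diff_freezes` are hypothetical helper names, and the parsing assumes only the three line shapes that appear in the listings above (`name==version`, `name @ source`, and `-e source#egg=name`), skipping warning lines.

```python
import re


def parse_freeze(text):
    """Map lowercased package name -> version (or source URL) from `pip freeze` text."""
    packages = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and pip's own warning lines.
        if not line or line.startswith(("#", "WARNING")):
            continue
        if line.startswith("-e "):
            # Editable install: the name is carried in the #egg= fragment.
            match = re.search(r"#egg=([^&\s]+)", line)
            if match:
                packages[match.group(1).lower()] = line[3:].split("#egg=")[0]
        elif " @ " in line:
            # Direct reference: "name @ git+https://...@commit".
            name, source = line.split(" @ ", 1)
            packages[name.lower()] = source
        elif "==" in line:
            name, version = line.split("==", 1)
            packages[name.lower()] = version
    return packages


def diff_freezes(a_text, b_text):
    """Return (changed, only_in_a, only_in_b) for two `pip freeze` outputs."""
    a, b = parse_freeze(a_text), parse_freeze(b_text)
    changed = {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys()) if a[n] != b[n]}
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return changed, only_a, only_b
```

Saving each listing to a file (for example the hypothetical `node_a.txt` and `node_b.txt`) and passing the file contents to `diff_freezes` gives the packages whose versions differ and the packages present in only one environment. Packages pinned identically in both listings, such as `tensorflow==1.8.0`, do not appear in the result.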

This is still mysterious, as it was in #238. The fact that it works on one node and not another suggests something system-related, but I have no instinct for the cause.
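
One system-level difference that can be checked directly is the set of CPU instruction-set extensions each node advertises, since an 'illegal instruction' crash occurs when a binary executes an instruction the CPU does not support. The sketch below assumes x86 Linux nodes that expose `/proc/cpuinfo`. The helper names and the list of flags to look for are illustrative choices, not anything taken from CubiCal or Montblanc. The TensorFlow release notes state that prebuilt binaries from 1.6 onward use AVX instructions, so `avx` is one flag worth comparing between nodes. Whether that is the cause here is an assumption, not a diagnosis.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx", "avx2", "fma", "sse4_1", "sse4_2")):
    """List the wanted flags (an illustrative selection) that the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("missing flags:", missing_flags(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this sketch only applies to Linux")
```

Running this on both the working and the failing node and comparing the output would show whether the older node lacks an extension that the newer one has.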