ratt-ru/meqtrees-timba

Unexpected thread state when solving des


376.37 25.0Gb gainopts(StefCal.py:905:get_result): checking flagging
376.40 25.0Gb gainopts(StefCal.py:918:get_result): 0.00% (0/10654800) data points were flagged in the stefcal process. Can take.
376.40 25.0Gb gainopts(StefCal.py:1009:get_result): computing result
378.32 25.7Gb gainopts(StefCal.py:1081:get_result): computing result: done
378.32 25.7Gb gainopts(StefCal.py:1117:get_result): ev.0.0.0.2.1 elapsed time 0m4.25s
terminate called after throwing an instance of 'LOFAR::Exception'
  what():  unexpected thread state in getWorkOrder()
Traceback (most recent call last):
  File "/usr/bin/meqtree-pipeliner.py", line 176, in <module>
    res = func(mqs,None,wait=True);
  File "/usr/lib/python2.7/dist-packages/Cattery/Calico/calico-stefcal.py", line 381, in _run_stefcal
    mqs.execute('VisDataMux',mssel.create_io_request(),wait=wait);
  File "/usr/lib/python2.7/dist-packages/Timba/Apps/meqserver.py", line 173, in execute
    return self.meq('Node.Execute',rec,wait=wait);
  File "/usr/lib/python2.7/dist-packages/Timba/Apps/meqserver.py", line 126, in meq
    msg = self.await(replyname,resume=True,timeout=wait);
  File "/usr/lib/python2.7/dist-packages/Timba/Apps/multiapp_proxy.py", line 524, in await
    raise RuntimeError,"lost all connections while waiting for event "+str(what);
RuntimeError: lost all connections while waiting for event Result.Node.execute.1
/home/hugo/output/COMBINED.J1638.2-6420.1GC-J1638.2-6420.diffgain.cp does not exist, so not trying to remove

Parset options:

def decalibrate(incol="SUBTRACTED_DATA",
                calincol="CORRECTED_DATA",
                outcol="SUBTRACTED_DATA",
                model="MODEL_WITHOUT_DES",
                lsmfilepostfix="decal1",
                des="{0:s}-catalog.lsm.html.de_tagged.lsm.html",
                label='decal',
                freq_int=[16, 64],
                masksig=[45, 45, 45],
                solvemode='Gain2x2',
                corrtype='CORR_DATA_SUB',  # 'sr'
                interval=[40, 80, 80],
                restore=None):
....

        recipe.add("cab/calibrator", "calibrate_target_%d" % ti, {
            'msname': "%s.%s.1GC.ms" % (PREFIX, t),
            'column': calincol,
            'tile-size': 120,
            'make-plots': True,
            'skymodel': "{0:s}:output".format(des).format(f),
            ##'model-column': 'MODEL_WITHOUT_DES',
            'Ejones': True,
            'beam-files-pattern': "MeerKAT_VBeam_10MHz_53Chans_$(xy)_$(reim).fits",
            'beam-l-axis': "X",
            'beam-m-axis': "Y",
            'parallactic-angle-rotation': True,
            'write-flagset': "cubical",
            'read-legacy-flags': True,
            'fill-legacy-flags': False,
            'save-config': "{0:s}.tdl".format(t),
            'label': t,
            'prefix': t,
            'make-plots': True,
            'output-data': corrtype,
            'output-column': outcol,
            'DDjones': True,
            'DDjones-tag': 'dE',
            'DDjones-solution-intervals': [interval[ti], freq_int[ti]],
            'DDjones-smoothing-intervals': [interval[ti] * 5, freq_int[ti] * 5],
            'DDjones-matrix-type': solvemode,
            'DDjones-niter': 1000,
            'DDjones-chisq-clipping': True,
            'threads': 64,
            'DDjones-ampl-clipping': True,
            'DDjones-ampl-clipping-high': 1.2,
            'DDjones-ampl-clipping-low': 0,
            'DDjones-niter': 1000,
            'save-config': "{0:s}.tdl".format(t)
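
Side note on the options above: 'make-plots', 'DDjones-niter' and 'save-config' each appear twice. In a Python dict literal the last occurrence silently wins, and since the repeated values are identical here this is probably unrelated to the crash. A minimal sketch for spotting such repeats is below; the helper name `duplicate_dict_keys` and the shortened `snippet` are made up for illustration, and it uses Python 3's `ast` module rather than the Python 2.7 the pipeline runs under.

```python
import ast
from collections import Counter


def duplicate_dict_keys(source):
    """Return the sorted constant keys that repeat within any dict literal."""
    dupes = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # Only literal keys are compared; computed keys are skipped.
            keys = [k.value for k in node.keys if isinstance(k, ast.Constant)]
            dupes.update(k for k, n in Counter(keys).items() if n > 1)
    return sorted(dupes)


# Hypothetical, shortened stand-in for the options dict quoted above.
snippet = """
opts = {
    'make-plots': True,
    'save-config': "a.tdl",
    'DDjones-niter': 1000,
    'threads': 64,
    'make-plots': True,
    'DDjones-niter': 1000,
    'save-config': "a.tdl",
}
"""
print(duplicate_dict_keys(snippet))
```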

@o-smirnov any ideas?

I think some tiles are already fully flagged
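
One way to test that hypothesis, as a rough sketch only: split the flags into tiles of 'tile-size' timeslots and list the tiles in which every point is flagged. The function name `fully_flagged_tiles` and the plain list-of-lists flag layout are assumptions for illustration; this is not how Calico or the calibrator cab read flags, and real flags would first have to be pulled from the MS.

```python
def fully_flagged_tiles(flags, tile_size=120):
    """Return indices of tiles in which every data point is flagged.

    flags: list of per-timeslot lists of booleans (True = flagged).
    tile_size: number of timeslots per tile (assumed layout).
    """
    bad = []
    for tile, start in enumerate(range(0, len(flags), tile_size)):
        rows = flags[start:start + tile_size]
        if all(all(row) for row in rows):
            bad.append(tile)
    return bad


# Toy data: three tiles of two timeslots with four channels each.
# Only the middle tile is flagged everywhere.
toy = [[False] * 4, [True] * 4,
       [True] * 4, [True] * 4,
       [True, False, True, True], [True] * 4]
print(fully_flagged_tiles(toy, tile_size=2))
```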

Hmm, no, it looks like something more insidious:
running in serial, it works...

956.77 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.77 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.77 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.78 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.78 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.78 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.78 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.79 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.79 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.80 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.80 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.80 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.80 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.81 6.1Gb gainopts(GainOpts.py:287:resolve_tilings): datashape (31, 1366) expanded datashape is (31, 1366)
956.81 6.1Gb gainopts(GainOpts.py:295:resolve_tilings): based on an LCM tiling of [31, 1366]
956.82 6.1Gb gainopts(StefCal.py:484:get_result): constructed internal arrays, trying to release array memory
956.84 6.1Gb gainopts(StefCal.py:487:get_result): released memory
956.84 6.1Gb gainopts(StefCal.py:492:get_result): no valid data found for solvable IFRs  -- nothing to stefcal!
/home/hugo/output/COMBINED.J1638.2-6420.1GC-J1638.2-6420.diffgain.cp does not exist, so not trying to remove
### Job result: None
### No more commands
### Stopping the meqserver
### All your batch are belong to us. Bye!