casangi/xradio

processing_set summary fails when xds has empty data_groups attribute

Closed this issue · 6 comments

xradio version: 0.0.38

# import paths inferred from the tracebacks below (xradio 0.0.38 layout)
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set
from xradio.vis.read_processing_set import read_processing_set

in_file = '/lustre/aoc/sciops/pford/3C129.ms'
out_file = '/lustre/aoc/sciops/pford/3C129.vis.zarr'
convert_msv2_to_processing_set(in_file=in_file, out_file=out_file)
ps = read_processing_set(out_file)
ps.summary()
Traceback (most recent call last):
  File "/home/groot/casa/github/casagui/3C129.py", line 8, in <module>
    ps.summary()
    ^^^^^^^^^^^
  File "/home/groot/casa/github/xradio/src/xradio/vis/_processing_set.py", line 41, in summary
    self.meta["summary"][data_group] = self._summary(data_group).sort_values(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/groot/casa/github/xradio/src/xradio/vis/_processing_set.py", line 106, in _summary
    if "visibility" in value.attrs["data_groups"][data_group]:
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'base'

ps['3C129_35']

<xarray.Dataset> Size: 4GB
Dimensions:                     (time: 511, baseline_id: 378, frequency: 1024,
                                 polarization: 4, uvw_label: 3)
Coordinates:
    baseline_antenna1_name      (baseline_id) <U8 12kB dask.array<chunksize=(378,), meta=np.ndarray>
    baseline_antenna2_name      (baseline_id) <U8 12kB dask.array<chunksize=(378,), meta=np.ndarray>
  * baseline_id                 (baseline_id) int64 3kB 0 1 2 3 ... 375 376 377
  * frequency                   (frequency) float64 8kB 3.1e+08 ... 3.739e+08
  * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
    scan_number                 (time) int64 4kB dask.array<chunksize=(511,), meta=np.ndarray>
  * time                        (time) float64 4kB 1.425e+09 ... 1.425e+09
  * uvw_label                   (uvw_label) <U1 12B 'u' 'v' 'w'
Data variables:
    EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 2MB dask.array<chunksize=(511, 378), meta=np.ndarray>
    FLAG                        (time, baseline_id, frequency, polarization) bool 791MB dask.array<chunksize=(511, 378, 1024, 4), meta=np.ndarray>
    TIME_CENTROID               (time, baseline_id) float64 2MB dask.array<chunksize=(511, 378), meta=np.ndarray>
    UVW                         (time, baseline_id, uvw_label) float64 5MB dask.array<chunksize=(511, 378, 3), meta=np.ndarray>
    WEIGHT                      (time, baseline_id, frequency, polarization) float32 3GB dask.array<chunksize=(511, 378, 1024, 4), meta=np.ndarray>
Attributes:
    data_groups:     {}
    partition_info:  {'field_name': ['3C129_1'], 'line_name': [], 'num_lines'...
    type:            visibility
    antenna_xds:     <xarray.Dataset> Size: 5kB\nDimensions:                (...
    weather_xds:     <xarray.Dataset> Size: 1kB\nDimensions:         (station...
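The `data_groups: {}` attribute above is exactly what `_summary` trips over: `value.attrs["data_groups"][data_group]` assumes a `base` group always exists. A tolerant lookup (a sketch, not the actual xradio fix; `has_visibility` and `xds_attrs` are illustrative names) would report no match instead of raising:

```python
def has_visibility(xds_attrs, data_group):
    """Tolerant version of the failing membership test: an absent or
    empty data_groups attribute yields False instead of a KeyError."""
    groups = xds_attrs.get("data_groups", {})
    return "visibility" in groups.get(data_group, {})

# The dataset above has data_groups == {}, so the 'base' group is absent:
print(has_visibility({"data_groups": {}}, "base"))  # False, no KeyError
print(has_visibility(
    {"data_groups": {"base": {"visibility": "VISIBILITY"}}},
    "base"))                                        # True
```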

Interesting. The original MS does not have any DATA column? I see a 3C129 image in casatestdata, which may or may not be related, but not this MS.

Perhaps we can accommodate this type of dataset (even though it complies with neither MSv2 nor the XRADIO constraints) just by skipping the field_and_source metadata and the other attributes of the (missing) VISIBILITY/SPECTRUM variables. But it would be interesting to find out how, and by whom, MSs without any data are being generated.
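A minimal sketch of that conversion-side guard (function and constant names here are illustrative, not xradio's actual API): decide up front whether the partition carries any data column, and skip the data-dependent metadata when it does not.

```python
# Recognized MSv2 data columns (illustrative list)
DATA_COLUMNS = ("DATA", "FLOAT_DATA", "SPECTRUM")

def partition_has_data(colnames):
    """True if the MS partition has at least one recognized data column."""
    return any(c in colnames for c in DATA_COLUMNS)

# The MS in this issue only carries auxiliary columns:
colnames = ["FLAG", "WEIGHT", "UVW", "TIME_CENTROID"]
if not partition_has_data(colnames):
    # Skip field_and_source and the per-variable attributes that would
    # describe the missing VISIBILITY/SPECTRUM variables.
    pass
```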

There are other places in the code that look for either "DATA" or "SPECTRUM" and will probably fail or misbehave when the input MS doesn't have any data.

I also just realized that conversion sets the type to visibility, but single-dish datasets should probably have their own type. The MSs in this issue would probably need a third, distinct type, if they are accepted at all.

@pford This is a bit strange. I just ran it using XRADIO 0.0.39 and it worked. Can you please try again, first deleting 3C129.vis.zarr and then using 0.0.39?

[Screenshot, 2024-09-05: output of the successful run]

It also failed on my office workstation with 0.0.39. I logged into a cluster node and had no issues with 0.0.39. Maybe I need a new workstation!

@pford That is strange. Maybe try creating a new Python 3.11 environment.

Actually, I used my workstation environment on the cluster node. I created a new environment and it failed the same way.

I added code in xradio.conversion.create_data_variables to re-raise the exception, and the traceback when it was loading DATA for the failing partition was:

 Traceback (most recent call last):
  File "/home/groot/casa/github/casagui/convert_3C129.py", line 6, in <module>
    convert_msv2_to_processing_set(in_file=in_file, out_file=out_file)
  File "/home/groot/casa/github/xradio/src/xradio/vis/convert_msv2_to_processing_set.py", line 104, in convert_msv2_to_processing_set
    convert_and_write_partition(
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/conversion.py", line 820, in convert_and_write_partition
    create_data_variables(
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/conversion.py", line 605, in create_data_variables
    raise(e)
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/conversion.py", line 583, in create_data_variables
    read_col_conversion(
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/_tables/read.py", line 1284, in read_col_conversion
    data[tidxs, bidxs] = tb_tool.getcol(col)
                         ^^^^^^^^^^^^^^^^^^^
  File "/home/groot/anaconda3/envs/xradio_env/lib/python3.11/site-packages/casacore/tables/table.py", line 1034, in getcol
    return self._getcol(columnname, startrow, nrow, rowincr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError
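The MemoryError comes from getcol materializing the entire column in one NumPy array. python-casacore's getcol accepts startrow/nrow arguments, so one possible mitigation (a sketch; iter_col_blocks is a hypothetical helper, not xradio code) is to stream the column in row blocks, writing each block out before reading the next:

```python
def iter_col_blocks(tb_tool, col, nrows_total, block_rows=100_000):
    """Yield (startrow, block) pairs for a casacore table column so the
    full column never has to reside in memory at once; each block can be
    written to the zarr store before the next is read."""
    for start in range(0, nrows_total, block_rows):
        nrow = min(block_rows, nrows_total - start)
        yield start, tb_tool.getcol(col, startrow=start, nrow=nrow)
```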

I reverted the code, then tried convert_msv2_to_processing_set(use_table_iter=True). DATA still failed to load, and I then got this memory error when loading WEIGHT:

Traceback (most recent call last):
  File "/home/groot/casa/github/casagui/convert_3C129.py", line 6, in <module>
    convert_msv2_to_processing_set(in_file=in_file, out_file=out_file, use_table_iter=True)
  File "/home/groot/casa/github/xradio/src/xradio/vis/convert_msv2_to_processing_set.py", line 104, in convert_msv2_to_processing_set
    convert_and_write_partition(
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/conversion.py", line 816, in convert_and_write_partition
    create_data_variables(
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/conversion.py", line 606, in create_data_variables
    xds = get_weight(
          ^^^^^^^^^^^
  File "/home/groot/casa/github/xradio/src/xradio/vis/_vis_utils/_ms/conversion.py", line 641, in get_weight
    np.tile(
  File "/home/groot/anaconda3/envs/xradio_env/lib/python3.11/site-packages/numpy/lib/_shape_base_impl.py", line 1284, in tile
    c = c.reshape(-1, n).repeat(nrep, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 2.95 GiB for an array with shape (197793792, 4) and data type float32
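The requested allocation checks out against the shape in the message, and it also points at an avoidable copy: np.tile materializes the per-polarization replication, whereas broadcasting yields a zero-copy view (a general NumPy observation, not a claim about what xradio's get_weight must do):

```python
import numpy as np

# Shape and dtype from the error message: (197793792, 4) float32
rows, pols = 197_793_792, 4
nbytes = rows * pols * np.dtype(np.float32).itemsize
print(f"{nbytes / 2**30:.2f} GiB")  # 2.95 GiB, matching the report

# np.tile allocates the full replicated array; np.broadcast_to returns
# a read-only view with the same shape and no new buffer.
w = np.ones((8, 1), dtype=np.float32)
view = np.broadcast_to(w, (8, pols))
assert view.base is not None  # a view over w, not a fresh allocation
```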

So, two issues:

  • With the logger in debug mode you get the message Could not load column (with no column name!) from this line: logger.debug("Could not load column", col). It should be ("Could not load column " + col), or maybe str(col) as in this other logger message (though I think col is already a str)?
  • Whether or not you get the log message, should conversion just fail silently and carry on when DATA cannot be loaded for a partition?
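On the first bullet: if the logger forwards to Python's standard logging module (an assumption about xradio's logger), extra positional arguments are %-format args, so the broken call never interpolates col into the message. The idiomatic fix is a lazy %s placeholder:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("xradio")
col = "DATA"

# Broken: col is treated as a %-format argument, but the message has no
# placeholder, so the column name never makes it into the output:
#   logger.debug("Could not load column", col)

# Idiomatic fix: lazy %s formatting, interpolated only if the record
# is actually emitted:
logger.debug("Could not load column %s", col)

# Equivalent eager form, as suggested in the bullet above:
logger.debug("Could not load column " + col)
```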