casangi/xradio

read_generic_table not working on columns with empty cells.


An example dataset can be obtained using:

import graphviper
graphviper.utils.data.download(file="ALMA_uid___A002_X1003af4_X75a3.split.avg.ms")

The loaded source_xds does not contain the transition and rest frequency information.

from xradio.vis._vis_utils._ms._tables.read import (
    read_generic_table,
)
source_xds = read_generic_table('ALMA_uid___A002_X1003af4_X75a3.split.avg.ms','SOURCE')
source_xds

I think the problem is that the filter we've traditionally used against unloadable and/or unfilled columns is both too strict and too weak:

and (tb_tool.iscelldefined(col, 0))

If the first cell is empty, the whole column is assumed to be empty, or not worth/safe to load.

Also, we do not really need the value of that or any other particular cell, just the column type.
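To make the difference concrete, here is a minimal sketch of the two filtering strategies. It is not the code from the fix branch: `FakeTable` is a hypothetical stand-in that only mimics the table-tool calls used here (`colnames`, `iscelldefined`, `coldatatype`), the set of unsupported types is an assumption, and the column contents are made-up toy values.

```python
# Hypothetical stand-in for a casacore table tool. It only mimics the three
# calls used below; a cell stored as None plays the role of an undefined cell.
class FakeTable:
    def __init__(self, columns):
        # columns maps name -> (value type, list of cells)
        self._columns = columns

    def colnames(self):
        return list(self._columns)

    def iscelldefined(self, col, row):
        return self._columns[col][1][row] is not None

    def coldatatype(self, col):
        return self._columns[col][0]


UNSUPPORTED_TYPES = {"record"}  # assumed set of types to skip


def columns_old_filter(tb_tool):
    """Old behaviour: drop any column whose first cell is undefined."""
    return [col for col in tb_tool.colnames() if tb_tool.iscelldefined(col, 0)]


def columns_new_filter(tb_tool):
    """Sketch of the relaxed check: decide from the column type alone."""
    return [
        col
        for col in tb_tool.colnames()
        if tb_tool.coldatatype(col) not in UNSUPPORTED_TYPES
    ]


# Toy table, values are illustrative only.
source = FakeTable(
    {
        "SOURCE_ID": ("int", [0, 1]),
        "REST_FREQUENCY": ("double", [None, [1.0e11]]),  # first cell undefined
        "POSITION": ("double", [None, None]),  # all cells undefined
        "SOURCE_MODEL": ("record", [None, None]),  # unsupported type
    }
)

print(columns_old_filter(source))
print(columns_new_filter(source))
```

With the old check, REST_FREQUENCY is dropped even though its second row has data; with the type-based check, only the 'record' column is skipped.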

The branch for this issue has a fix that should remove this limitation. With this fix, as long as a column is defined (and is not of an unsupported/troublesome type such as 'record'), it will be loaded.
The values for "empty cells" (in the casacore sense) will be empty. In other words, we now treat "casacore empty cell" as "cell holding an empty array", whereas previously an empty first cell was interpreted as "not a column that is safe to load".

The fix will load columns like the example ones, regardless of whether the cells are empty (in the sense of "iscelldefined() == False").
That should prevent missing columns and let other work continue. But there might be additional nuances to discuss.
In the ALMA example given in the description of this issue, the variables SYSVEL, DIRECTION and POSITION were also missing.
Of these, POSITION is an example of the extreme case where the column is defined but all of its cells are left empty (iscelldefined(...) == False). Such cases will produce data variables that do not hold any actual values.
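As a rough illustration of the "empty cell = empty array" convention, the sketch below uses plain Python lists in place of arrays. The helper names `read_column_cells` and `has_effective_values` are hypothetical, `None` stands in for a cell where iscelldefined() is False, and the numbers are toy values, not taken from the ALMA dataset.

```python
def read_column_cells(cells):
    """Return one list per row, using [] for undefined (None) cells."""
    return [[] if cell is None else list(cell) for cell in cells]


def has_effective_values(loaded_column):
    """True if at least one row of the loaded column holds any value."""
    return any(len(row) > 0 for row in loaded_column)


# Partially filled column: it is loaded, and the undefined row becomes empty.
rest_frequency = read_column_cells([None, [1.0e11]])

# Extreme case (like POSITION): the column is defined but every cell is empty.
position = read_column_cells([None, None])

print(rest_frequency, has_effective_values(rest_frequency))
print(position, has_effective_values(position))
```

A check along the lines of `has_effective_values` could help downstream code decide whether a loaded variable is worth keeping, though whether such columns should be dropped is one of the nuances still open for discussion.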