Schema Checker field_and_source_xds
Closed this issue · 3 comments
Schema: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit?usp=sharing
- Merge in #200
- Update to SOURCE_LOCATION
- Fix schema checker for field_and_source_xds (will have to iterate through all visibility data variables using data_group attribute).
- Add documentation for field_and_source_xds.
- Get stakeholder tests to pass.
- #226
While doing this, figure out ways to check constraints like:
- Alternative data variables (one or the other must be present), such as FIELD_PHASE_CENTER or FIELD_REFERENCE_CENTER
- When optional dimensions are present, they must be present in all the relevant data variables (for example, in the field_and_source_xds, time is present in FIELD_PHASE_CENTER and SOURCE_LOCATION and LINE_REST_FREQUENCY and...
This branch now passes the stakeholder tests with the checker re-enabled, with caveats (a couple of downloaded datasets that would need an update/discussion)...
@Jan-Willem : I have disabled the test_alma_pre-converted (Antennae_North.cal.lsrk.split.vis.zarr
) temporarily in this branch, as it is outdated. It has schema validation issues, starting from:
* Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['type']: Required attribute type is missing! (expected: <class 'str'> found: None)
* Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['units']: Required attribute units is missing! (expected: list[str] found: None)
* Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['scale']: Required attribute scale is missing! (expected: <class 'str'> found: None)
* Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['format']: Required attribute format is missing! (expected: <class 'str'> found: None)
This is one of several issues with missing metadata that are now fixed in this branch.
This Antennae_North.cal.lsrk.split.vis.zarr
would need to be re-converted using this branch before we can re-enable test_preconverted_alma
.
In addition to this, the s3 dataset would also need to be updated to this (xds.attrs["type"] = "visibility") and more changes that have been merged in since the dataset was generated and uploaded:
https://github.com/casangi/xradio/pull/200/files#diff-fcb3d24497bd2d070aa1622753c4663726de6147b87de9fa77e76ab90c0fdb68R756
I've disabled that check in the schema checker in this branch for now.
This branch now has merged the latest main and the last few commits from #200. I'm still doing some checks, as the merge was complicated but so far so good.
The schema docs produced has an updated version of field_and_source_xds, and now also some placeholders for xds's that were missing.
The stk test that were disabled temporarily are now re-enabled. On a second thought, rather than disabling some tests that depend on pre-generated data, or re-generating and re-uploading them, I've introduced an option to base_test
, do_check_schema
. That option is enabled by default but for now disabled in test_s3
and test_preconverted_alma
. This should be enough to avoid having to re-upload datasets every time a PR updates the schema and/or the converter introducing some mismatch wrt pre-generated datasets.
@FedeMPouzols, disabling the schema checker for test_s3 and test_preconverted_alma is fine. After you have done your checks please open a pull request. Thanks.