Option --meta to adjust metadata fails
Closed this issue · 21 comments
While trying the --meta option, which failed, I wondered whether
metadata = None
at line 120 of cmor-fixer.py
allows the activation of the option.
I run:
./cmor-fixer.py --verbose --forceid --meta test-meta.json --olist --npp 1 --dry ~/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hus/gr/v20191121/
which gave the error:
Traceback (most recent call last):
File "./cmor-fixer.py", line 163, in <module>
main()
File "./cmor-fixer.py", line 139, in main
modified = worker(ncfile)
File "./cmor-fixer.py", line 62, in fix_file
for key, val in metadata:
ValueError: too many values to unpack (expected 2)
When I apply the following changes to the test-meta.json
Previous:
< "branch_time_in_child": "0.0D",
< "branch_time_in_parent": "0.0D",
---
New:
> "branch_time_in_child": "109870.0D",
> "branch_time_in_parent": "104320.0D",
Then the --dry
run goes fine. But when really applying the changes it get the error:
Traceback (most recent call last):
File "./cmor-fixer.py", line 164, in <module>
main()
File "./cmor-fixer.py", line 140, in main
modified = worker(ncfile)
File "./cmor-fixer.py", line 66, in fix_file
setattr(ds, key, val)
File "netCDF4/_netCDF4.pyx", line 2914, in netCDF4._netCDF4.Dataset.__setattr__
File "netCDF4/_netCDF4.pyx", line 2844, in netCDF4._netCDF4.Dataset.setncattr
File "netCDF4/_netCDF4.pyx", line 1606, in netCDF4._netCDF4._set_att
File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
AttributeError: NetCDF: Name contains illegal characters
./cmor-fixer.py --verbose --forceid --meta test-meta.json --olist --npp 1 ~/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hus/gr/v20191121/
2020-01-10 12:10:47 INFO:cmor-fixer.py: Setting metadata field #variant_info to uncomment and describe briefly what makes your ripf identifier unique (only needed if i/=1, p/=1 or f/=1) in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hus/gr/v20191121/hus_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
Traceback (most recent call last):
File "./cmor-fixer.py", line 164, in <module>
main()
File "./cmor-fixer.py", line 140, in main
modified = worker(ncfile)
File "./cmor-fixer.py", line 66, in fix_file
setattr(ds, key, val)
File "netCDF4/_netCDF4.pyx", line 2914, in netCDF4._netCDF4.Dataset.__setattr__
File "netCDF4/_netCDF4.pyx", line 2844, in netCDF4._netCDF4.Dataset.setncattr
File "netCDF4/_netCDF4.pyx", line 1606, in netCDF4._netCDF4._set_att
File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
AttributeError: NetCDF: Name contains illegal characters
Tested locally and on my system the branch_time modifications seem to work
Aha ok it is the #variant_info
which I need to skip because it is a 'comment'....
Done. In any case it is slightly more effcient to keep only attributes in your metadata file that actually need to be changed. But it should work now.
Yes this goes better, the tested attribute is changed now, and no error is reported.
However when I compare (with meld) the results of ncdump -h
of the pre-corrected and the corrected files I see some suspicious differences: Could it be that an attribute with content which has new lines in it is not correctly handled. And maybe the same for fields with integer values? Because all those lines seem to miss in the header of the corrected file.
You mean those attributes have just disappeared?
Yes
That means for the first category: the rest of the attribute content has disappeared after the first line, so the first line is there.
Ok could you test this once more Thomas?
Tested: little improvement, the fields with integer values behave correct now.
However, the fields with new lines like source
contain still only the first line, the rest vanished.
Further I see some extra fields popping up like comment
, maybe they were already there in the previous test and I missed them at that time.
I include a screenshot of a meld diff (left: new, right the previous non-fixed):
Of course not all is visible, but the problem is visible. The branch_time_in_child
and branch_time_in_parent
were (in green) higher up in the right previous one.
The log shows it as well:
INFO:cmor-fixer.py: Setting metadata field source to EC-Earth3 (2019) in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field branch_time_in_child to 109870.0D in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field branch_time_in_parent to 104320.0D in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field comment to in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field _control_vocabulary_file to CMIP6_CV.json in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field _cmip6_option to CMIP6 in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field tracking_prefix to hdl:21.14100 in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting tracking_id to hdl:21.14100/9d3e9330-3837-47ed-9742-0998e25fb8a9 for /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Appending message about modification to the history attribute.
So the additional ones are: comment,
_control_vocabulary_file,
_cmip6_option,
tracking_prefix
I will test a case in which I only include the corrected branch_time_in_childand
branch_time_in_parent
which I here want to change, and omit all other fields from the test-meta.json
.
See update of #14 (comment).
Also when I take test-meta-minimal.json
which looks like:
{
"branch_time_in_child": "109870.0D",
"branch_time_in_parent": "104320.0D"
}
so only contains the attributes to be changed, I get the same issues as above.
I tested with your minimal json file above and for me it works, the source attribute is not modified and neither are the integer-valued attributes.
The problem here is that our metadata is not a valid json file! We need to use \\n
for newline characters in the source field instead of \n
. Now, the source is read to be EC-Earth3 (2019) \n
and then the python file reader moves to the next field because it encounters a true newline character.
This is very hard to fix in the cmor-fixer script, shall I just skip the source field alltogether? Or do we instruct users to minimize their json files and not use \n
in the attribute values?
Ok you are right for this example:
{
"branch_time_in_child": "109870.0D",
"branch_time_in_parent": "104320.0D"
}
in which all other fields are left out it works properly. (Earlier, I probably made the wrong test by applying it on the same file as the failed test with all the fields in the json, and then the deviating stuff is already in).
So indeed this allows the possibility that the --meta
is used with a json file in which only the attributes are included which have to be changed.
However, for the case all fields are included in the json: What about these added ones: comment,
_control_vocabulary_file,
_cmip6_option,
tracking_prefix
? When this is solved and we just skip in any case the source
one as you suggested because of the \n
issue, then also this will work (which is maybe safe if achievable). The source
field won't be changed/corrected in practice I think.
Is it easy/feasible to prevent writing these comment,
_control_vocabulary_file,
_cmip6_option,
tracking_prefix
fields in the corrected file?
Sure, I can adopt a list of attributes that shouldn't be added. I can also prevent the tool from adding new attributes at all.
Hi Thomas I pushed a commit that should solve the issue above, I added the option --addattrs
to be able to add extra attributes or not. comment
and source
are skipped in all cases.
Yes, we are there. Thanks.
Though I don't expect for source
anyone to correct, for comment
this could be different. Anyway, if that happens, we have to see. For now I think it is good enough.