EC-Earth/cmor-fixer

Option --meta to adjust metadata fails

Closed this issue · 21 comments

While trying the --meta option, which failed, I wondered whether

 metadata = None

at line 120 of cmor-fixer.py allows the activation of the option.

I run:

./cmor-fixer.py --verbose --forceid --meta test-meta.json --olist --npp 1 --dry ~/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hus/gr/v20191121/

which gave the error:

Traceback (most recent call last):
  File "./cmor-fixer.py", line 163, in <module>
    main()
  File "./cmor-fixer.py", line 139, in main
    modified = worker(ncfile)
  File "./cmor-fixer.py", line 62, in fix_file
    for key, val in metadata:
ValueError: too many values to unpack (expected 2)

I will test the fix 7545eb5 to this problem now.

When I apply the following changes to the test-meta.json

Previous:
<     "branch_time_in_child":         "0.0D",
<     "branch_time_in_parent":        "0.0D",
---
New:
>     "branch_time_in_child":         "109870.0D",
>     "branch_time_in_parent":        "104320.0D",

Then the --dry run goes fine. But when really applying the changes it get the error:

Traceback (most recent call last):
  File "./cmor-fixer.py", line 164, in <module>
    main()
  File "./cmor-fixer.py", line 140, in main
    modified = worker(ncfile)
  File "./cmor-fixer.py", line 66, in fix_file
    setattr(ds, key, val)
  File "netCDF4/_netCDF4.pyx", line 2914, in netCDF4._netCDF4.Dataset.__setattr__
  File "netCDF4/_netCDF4.pyx", line 2844, in netCDF4._netCDF4.Dataset.setncattr
  File "netCDF4/_netCDF4.pyx", line 1606, in netCDF4._netCDF4._set_att
  File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
AttributeError: NetCDF: Name contains illegal characters
goord commented

Hi @treerink could you also post the last log message before that error?

./cmor-fixer.py --verbose --forceid --meta test-meta.json --olist --npp 1 ~/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hus/gr/v20191121/
2020-01-10 12:10:47 INFO:cmor-fixer.py: Setting metadata field #variant_info to uncomment and describe briefly what makes your ripf identifier unique (only needed if i/=1, p/=1 or f/=1) in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hus/gr/v20191121/hus_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
Traceback (most recent call last):
  File "./cmor-fixer.py", line 164, in <module>
    main()
  File "./cmor-fixer.py", line 140, in main
    modified = worker(ncfile)
  File "./cmor-fixer.py", line 66, in fix_file
    setattr(ds, key, val)
  File "netCDF4/_netCDF4.pyx", line 2914, in netCDF4._netCDF4.Dataset.__setattr__
  File "netCDF4/_netCDF4.pyx", line 2844, in netCDF4._netCDF4.Dataset.setncattr
  File "netCDF4/_netCDF4.pyx", line 1606, in netCDF4._netCDF4._set_att
  File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
AttributeError: NetCDF: Name contains illegal characters
goord commented

Tested locally and on my system the branch_time modifications seem to work

goord commented

Aha ok it is the #variant_info which I need to skip because it is a 'comment'....

goord commented

Done. In any case it is slightly more effcient to keep only attributes in your metadata file that actually need to be changed. But it should work now.

Yes this goes better, the tested attribute is changed now, and no error is reported.

However when I compare (with meld) the results of ncdump -h of the pre-corrected and the corrected files I see some suspicious differences: Could it be that an attribute with content which has new lines in it is not correctly handled. And maybe the same for fields with integer values? Because all those lines seem to miss in the header of the corrected file.

goord commented

You mean those attributes have just disappeared?

Yes

That means for the first category: the rest of the attribute content has disappeared after the first line, so the first line is there.

goord commented

Ok could you test this once more Thomas?

Tested: little improvement, the fields with integer values behave correct now.

However, the fields with new lines like source contain still only the first line, the rest vanished.

Further I see some extra fields popping up like comment, maybe they were already there in the previous test and I missed them at that time.

I include a screenshot of a meld diff (left: new, right the previous non-fixed):

meta-option-debug-meld-screenshot

Of course not all is visible, but the problem is visible. The branch_time_in_childand branch_time_in_parent were (in green) higher up in the right previous one.

The log shows it as well:

INFO:cmor-fixer.py: Setting metadata field source to EC-Earth3 (2019) in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field branch_time_in_child to 109870.0D in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field branch_time_in_parent to 104320.0D in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field comment to  in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field _control_vocabulary_file to CMIP6_CV.json in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field _cmip6_option to CMIP6 in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting metadata field tracking_prefix to hdl:21.14100 in /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Setting tracking_id to hdl:21.14100/9d3e9330-3837-47ed-9742-0998e25fb8a9 for /nfs/home/users/reerink/l2/cmorised-results/cmor-cmip-test-all-11/t002/ifs/001/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/piControl/r1i1p1f1/Amon/hur/gr/v20191121/hur_Amon_EC-Earth3_piControl_r1i1p1f1_gr_199001-199012.nc
INFO:cmor-fixer.py: Appending message about modification to the history attribute.

So the additional ones are: comment, _control_vocabulary_file, _cmip6_option, tracking_prefix

I will test a case in which I only include the corrected branch_time_in_childand branch_time_in_parent which I here want to change, and omit all other fields from the test-meta.json.

See update of #14 (comment).

Also when I take test-meta-minimal.json which looks like:

{
    "branch_time_in_child":         "109870.0D",
    "branch_time_in_parent":        "104320.0D"
}

so only contains the attributes to be changed, I get the same issues as above.

goord commented

I tested with your minimal json file above and for me it works, the source attribute is not modified and neither are the integer-valued attributes.

The problem here is that our metadata is not a valid json file! We need to use \\n for newline characters in the source field instead of \n. Now, the source is read to be EC-Earth3 (2019) \n and then the python file reader moves to the next field because it encounters a true newline character.

This is very hard to fix in the cmor-fixer script, shall I just skip the source field alltogether? Or do we instruct users to minimize their json files and not use \n in the attribute values?

Ok you are right for this example:

{
    "branch_time_in_child":         "109870.0D",
    "branch_time_in_parent":        "104320.0D"
}

in which all other fields are left out it works properly. (Earlier, I probably made the wrong test by applying it on the same file as the failed test with all the fields in the json, and then the deviating stuff is already in).

So indeed this allows the possibility that the --meta is used with a json file in which only the attributes are included which have to be changed.

However, for the case all fields are included in the json: What about these added ones: comment, _control_vocabulary_file, _cmip6_option, tracking_prefix? When this is solved and we just skip in any case the source one as you suggested because of the \n issue, then also this will work (which is maybe safe if achievable). The source field won't be changed/corrected in practice I think.

Is it easy/feasible to prevent writing these comment, _control_vocabulary_file, _cmip6_option, tracking_prefix fields in the corrected file?

goord commented

Sure, I can adopt a list of attributes that shouldn't be added. I can also prevent the tool from adding new attributes at all.

goord commented

Hi Thomas I pushed a commit that should solve the issue above, I added the option --addattrs to be able to add extra attributes or not. comment and source are skipped in all cases.

Yes, we are there. Thanks.

Though I don't expect for source anyone to correct, for comment this could be different. Anyway, if that happens, we have to see. For now I think it is good enough.