ESGF/esg-search

download script: problems using version in the download_structure


First, thanks for your great work on the download script. It has saved me a lot of time.

I am trying to use the download scripts to reproduce the DKRZ file structure for use with ESMValTool.
Example URL:
https://esgf-data.dkrz.de/esg-search/wget?mip_era=CMIP6&variable=od440aer&experiment_id=historical&variant_label=r1i1p1f1&realm=aerosol,atmos,atmosChem,ocean,seaIce&frequency=mon,fx&source_id=IPSL-CM6A-LR&limit=9999&download_structure=project,institution_id,experiment_id,source_id,variant_label,table_id,grid_label,version
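For reference, the same request can be assembled programmatically. The sketch below is one assumed way to do it in Python: it rebuilds the example URL from a parameter dictionary, joining multi-valued facets with commas exactly as the URL above does. build_wget_url is a hypothetical helper, not part of esg-search.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
WGET_ENDPOINT = "https://esgf-data.dkrz.de/esg-search/wget"

# Facets copied from the example URL; lists become comma-separated values.
params = {
    "mip_era": "CMIP6",
    "variable": "od440aer",
    "experiment_id": "historical",
    "variant_label": "r1i1p1f1",
    "realm": ["aerosol", "atmos", "atmosChem", "ocean", "seaIce"],
    "frequency": ["mon", "fx"],
    "source_id": "IPSL-CM6A-LR",
    "limit": 9999,
    "download_structure": ["project", "institution_id", "experiment_id",
                           "source_id", "variant_label", "table_id",
                           "grid_label", "version"],
}


def build_wget_url(endpoint: str, query: dict) -> str:
    """Hypothetical helper: return the wget-script URL for a facet query."""
    flat = {key: ",".join(value) if isinstance(value, list) else str(value)
            for key, value in query.items()}
    # safe="," keeps the comma separators unescaped, as in the example URL.
    return endpoint + "?" + urlencode(flat, safe=",")


print(build_wget_url(WGET_ENDPOINT, params))
```

Keeping the commas unescaped is only a cosmetic choice here, so the generated URL can be compared character by character with the one above.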

The same query as a search request:
https://esgf-data.dkrz.de/esg-search/search?project=CMIP6&variable=od440aer&experiment_id=historical&variant_label=r1i1p1f1&realm=aerosol,atmos,atmosChem,ocean,seaIce&frequency=mon,fx&source_id=IPSL-CM6A-LR&format=application%2Fsolr%2Bjson&fields=master_id,version,timestamp
gives the correct version information (truncated):

      {
        "version":"20180803",
        "master_id":"CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.AERmon.od440aer.gr",
        "score":1.0},
      {
        "version":"20180803",
        "master_id":"CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.AERmon.od440aer.gr",
        "score":1.0}]
  },

Unfortunately, the download script uses just '1' as the version instead of '20180803':

Saving to: ‘CMIP6/IPSL/historical/IPSL-CM6A-LR/r1i1p1f1/AERmon/gr/1/od440aer_AERmon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc’
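For comparison, the directory that download_structure should produce can be derived from the master_id and version fields returned by the search. The sketch below assumes that master_id follows the CMIP6 dataset ID order (mip_era, activity_id, institution_id, source_id, experiment_id, variant label, table_id, variable_id, grid_label), that "project" equals mip_era for these datasets, and that the response uses the standard Solr layout (response, then docs). expected_dir is a hypothetical helper; its optional prefix argument produces the 'v' form asked about further down.

```python
import json

# Assumed order of the dot-separated components of a CMIP6 master_id.
MASTER_ID_FIELDS = ["mip_era", "activity_id", "institution_id", "source_id",
                    "experiment_id", "variant_label", "table_id",
                    "variable_id", "grid_label"]

# download_structure from the example wget URL.
STRUCTURE = ["project", "institution_id", "experiment_id", "source_id",
             "variant_label", "table_id", "grid_label", "version"]


def expected_dir(master_id: str, version: str, prefix: str = "") -> str:
    """Hypothetical helper: the directory download_structure should yield."""
    facets = dict(zip(MASTER_ID_FIELDS, master_id.split(".")))
    facets["project"] = facets["mip_era"]  # assumption: project == mip_era
    facets["version"] = prefix + version
    return "/".join(facets[name] for name in STRUCTURE)


# Abbreviated response, assumed to follow the Solr JSON layout shown above.
response = json.loads("""{"response": {"docs": [
  {"version": "20180803",
   "master_id": "CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.AERmon.od440aer.gr",
   "score": 1.0}]}}""")

for doc in response["response"]["docs"]:
    print(expected_dir(doc["master_id"], doc["version"]))
    print(expected_dir(doc["master_id"], doc["version"], prefix="v"))
```

For the IPSL record this gives CMIP6/IPSL/historical/IPSL-CM6A-LR/r1i1p1f1/AERmon/gr/20180803 (or .../gr/v20180803 with the prefix), whereas the wget script saved the file under .../gr/1/.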

Interestingly, it is not always wrong:

https://esgf-data.dkrz.de/esg-search/wget?mip_era=CMIP6&variable=od440aer&experiment_id=historical&variant_label=r1i1p1f1&realm=aerosol,atmos,atmosChem,ocean,seaIce&frequency=mon,fx&source_id=TaiESM1&limit=9999&download_structure=project,institution_id,experiment_id,source_id,variant_label,table_id,grid_label,version

For this query, the https download uses the correct version, but the http download still uses the wrong one. (Downloading via https requires the -i switch.)

Is this a bug? And is there a way to prefix the version with a 'v' (to fully reproduce the DKRZ directory naming scheme)?

Thanks for your help.

I'm currently using the following workaround:
https://gist.github.com/bjoernbroetz/e55430a5f4c529ebe2fc0ad87870a487
It relies on the fact that the version ID is contained in the original path of each file, so it can be extracted from the wget script.
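The extraction step can be sketched as a regular expression over the original file URL. This is a minimal sketch, not the code from the gist: it assumes the original path contains a version directory of the form "v" followed by eight digits (for example /v20180803/), and the host in the usage example is hypothetical.

```python
import re
from typing import Optional

# Assumption: the original path contains a version directory such as
# "/v20180803/", i.e. a "v" followed by eight digits.
VERSION_SEGMENT = re.compile(r"/v(\d{8})/")


def version_from_url(url: str) -> Optional[str]:
    """Return the 8-digit version found in a file URL, or None if absent."""
    match = VERSION_SEGMENT.search(url)
    return match.group(1) if match else None


# Hypothetical URL (data.example.org is a placeholder host).
url = ("http://data.example.org/thredds/fileServer/CMIP6/CMIP/IPSL/"
       "IPSL-CM6A-LR/historical/r1i1p1f1/AERmon/od440aer/gr/v20180803/"
       "od440aer_AERmon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc")
print(version_from_url(url))
```

Returning None rather than a default keeps files whose URL lacks a recognizable version directory visible, instead of silently filing them under a guessed version.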

However, this breaks the update mechanism. The wget script checks whether a file is already present and skips it if so. Once the version directory has been "fixed" after the download, this check no longer finds the files on subsequent runs.

In the next version of this workaround, I plan to modify the wget script to replace the "1" with the correct version number before running it.
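That pre-processing step might look roughly like the sketch below. It is hedged on several assumptions that should be checked against an actual script: each file entry is one line of single-quoted fields starting with the local path and then the URL (for example 'local/path/file.nc' 'http://.../v20180803/file.nc' 'SHA256' '...'), the version directory is the one directly above the file name (because download_structure ends with version), and the URL carries a /vYYYYMMDD/ segment. fix_entry and fix_script are hypothetical names.

```python
import re

# Assumed entry format: optional indent, quoted local path, quoted URL, rest.
ENTRY = re.compile(r"^(?P<indent>\s*)'(?P<path>[^']+)' '(?P<url>[^']+)'(?P<rest>.*)$")
# Assumed version directory in the original URL, e.g. "/v20180803/".
VERSION_SEGMENT = re.compile(r"/v(\d{8})/")


def fix_entry(line: str, prefix: str = "v") -> str:
    """Replace the version directory of the local path with the URL's version."""
    m = ENTRY.match(line)
    if not m:
        return line  # not a file entry: leave the line untouched
    version = VERSION_SEGMENT.search(m.group("url"))
    parts = m.group("path").split("/")
    if not version or len(parts) < 2:
        return line  # no version found or no directory part: leave as is
    # download_structure ends with "version", so it is the parent directory.
    parts[-2] = prefix + version.group(1)
    return "{}'{}' '{}'{}".format(m.group("indent"), "/".join(parts),
                                  m.group("url"), m.group("rest"))


def fix_script(text: str) -> str:
    """Apply fix_entry to every line of a wget script's text."""
    return "\n".join(fix_entry(line) for line in text.split("\n"))
```

The rewritten text would be written to a new file and that file run instead of the original. Because the path is corrected before the script runs, the script's own check for already-downloaded files should then look in the corrected directory on every run.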

However, I agree with @jgriesfeller that a real fix is needed. Or perhaps someone can tell us how to use the API correctly, in case we have missed something.