ecmwf/anemoi-datasets

Problem building dataset from NetCDF files

jbanomedina opened this issue · 2 comments

What happened?

My goal is to build a dataset from NetCDF files using the anemoi-datasets library. However, I get an error when using NetCDF files as the source. I have tried both version 0.4.0 (installed using pip) and the develop branch (installed by cloning the repository). I was able to build a dataset from a GRIB file successfully, but for my project the data is in NetCDF format.

What are the steps to reproduce the bug?

The following steps reproduce the error.
1) First, I download a sample NetCDF file from the CDS using a Python script.

import cdsapi

# Request parameters
variables = ['10m_u_component_of_wind', '10m_v_component_of_wind']
year = 2013

# Download two days of 6-hourly ERA5 single-level data as NetCDF
c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': variables,
        'year': year,
        'month': [
            '01',
        ],
        'day': [
            '01', '02',
        ],
        'time': [
            '00:00', '06:00', '12:00', '18:00',
        ],
    },
    './sample.nc')

2) Second, I point to this sample in the recipe.yaml file.

dates:
  start: 2013-01-01T00:00:00
  end: 2013-01-01T06:00:00
  frequency: 6h
input:
  netcdf:
    path: ./sample.nc
    param: [u10,v10] # I also tried [10u,10v]
    levtype: sfc
3) Third, I run this on the command line:
anemoi-datasets create recipe.yaml dataset.zarr
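
As a sanity check on the recipe, the `dates` block can be expanded by hand. The sketch below assumes that `end` is inclusive and that `frequency: 6h` means a fixed 6-hour step; `expand_dates` is a hypothetical helper written for this illustration and is not part of anemoi-datasets. Under those assumptions the recipe yields two timestamps, which is consistent with the "Found 2 datetimes" line in the log below.

```python
from datetime import datetime, timedelta


def expand_dates(start, end, frequency_hours):
    """Return every timestamp from start to end (inclusive) at a fixed step.

    Hypothetical helper for illustration only; not an anemoi-datasets function.
    """
    step = timedelta(hours=frequency_hours)
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step
    return dates


# Values copied from the `dates` block of recipe.yaml
dates = expand_dates(datetime(2013, 1, 1, 0), datetime(2013, 1, 1, 6), 6)
print([d.isoformat() for d in dates])
```

Both timestamps fall inside the period requested from the CDS (1 and 2 January 2013 at 00, 06, 12 and 18 UTC), so the date range itself should not be the reason no fields are found.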

Version

v0.4.0

Platform (OS and architecture)

Linux exp-18-17 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Apr 4 18:13:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Relevant log output

Setting flatten_grid=True in config
Setting ensemble_dimension=2 in config
Setting flatten_grid=True in config
Setting ensemble_dimension=2 in config
2024-07-16 14:42:59 INFO {'start': datetime.datetime(2013, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2013, 1, 1, 6, 0, tzinfo=datetime.timezone.utc), 'frequency': '6h', 'group_by': 'monthly'}
2024-07-16 14:42:59 INFO <anemoi.datasets.dates.groups.Groups object at 0x155147fbcee0>
2024-07-16 14:42:59 INFO ✅ INPUT_BUILDER
2024-07-16 14:42:59 INFO FunctionAction: path=./sample.nc param=['u10', 'v10'] levtype=sfc 
2024-07-16 14:42:59 INFO FunctionAction: path=./sample.nc param=['u10', 'v10'] levtype=sfc 
2024-07-16 14:42:59 INFO Minimal input (using only the first date) :
2024-07-16 14:42:59 INFO netcdf(['2013-01-01T00:00:00'])
Config loaded ok:
2024-07-16 14:42:59 INFO {'config_path': '/expanse/nfs/cw3e/cwp167/projects/test-attribution/recipe.yaml', 'dates': {'start': datetime.datetime(2013, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2013, 1, 1, 6, 0, tzinfo=datetime.timezone.utc), 'frequency': '6h', 'group_by': 'monthly'}, 'input': {'netcdf': {'path': './sample.nc', 'param': ['u10', 'v10'], 'levtype': 'sfc'}}, 'dataset_status': 'experimental', 'description': 'No description provided.', 'licence': 'unknown', 'attribution': 'unknown', 'build': {'group_by': 'monthly', 'use_grib_paramid': False, 'variable_naming': 'default'}, 'output': {'order_by': {'valid_datetime': 'ascending', 'param_level': 'ascending', 'number': 'ascending'}, 'remapping': {'param_level': '{param}_{levelist}'}, 'statistics': 'param_level', 'chunking': {'dates': 1, 'ensembles': 1}, 'dtype': 'float32', 'flatten_grid': True, 'ensemble_dimension': 2}, 'statistics': {}, 'reading_chunks': None}
Found 2 datetimes.
2024-07-16 14:42:59 INFO Dates: Found 2 datetimes, in 1 groups: 
2024-07-16 14:42:59 INFO Missing dates: 0
Found 2 datetimes 2.
2024-07-16 14:43:00 INFO Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-07-16 14:43:00 INFO Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-07-16 14:43:00 INFO NumExpr defaulting to 8 threads.
2024-07-16 14:43:00 ERROR Error in execute
Traceback (most recent call last):
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 433, in datasource
    return self.action.function(FunctionContext(self), self.dates, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 72, in execute
    return load_netcdfs("📁", "path", context, dates, path, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 66, in load_netcdfs
    check(what, ds, given_paths, valid_datetime=dates, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 40, in check
    raise ValueError(f"Expected {count} fields, got {len(ds)} (kwargs={kwargs}, {what}s={paths})")
ValueError: Expected 2 fields, got 0 (kwargs={'valid_datetime': ['2013-01-01T00:00:00'], 'param': ['u10', 'v10'], 'levtype': 'sfc'}, paths=['./sample.nc'])
Traceback (most recent call last):
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/utils/cli.py", line 128, in cli_main
    cmd.run(args)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/commands/create.py", line 30, in run
    c.create()
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/__init__.py", line 153, in create
    self.init()
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/__init__.py", line 50, in init
    obj.initialise(check_name=check_name)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/loaders.py", line 271, in initialise
    variables = self.minimal_input.variables
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 227, in variables
    return self._coords.variables
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 190, in variables
    self._build_coords
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 143, in _build_coords
    from_data = self.owner.get_cube().user_coords
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 350, in get_cube
    ds = self.datasource
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 81, in wrapper
    result = method(self, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/template.py", line 82, in wrapper
    result = method(self, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/template.py", line 42, in wrapper
    result = method(self, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 433, in datasource
    return self.action.function(FunctionContext(self), self.dates, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 72, in execute
    return load_netcdfs("📁", "path", context, dates, path, *args, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 66, in load_netcdfs
    check(what, ds, given_paths, valid_datetime=dates, **kwargs)
  File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 40, in check
    raise ValueError(f"Expected {count} fields, got {len(ds)} (kwargs={kwargs}, {what}s={paths})")
ValueError: Expected 2 fields, got 0 (kwargs={'valid_datetime': ['2013-01-01T00:00:00'], 'param': ['u10', 'v10'], 'levtype': 'sfc'}, paths=['./sample.nc'])
2024-07-16 14:43:00 ERROR 
💣 Expected 2 fields, got 0 (kwargs={'valid_datetime': ['2013-01-01T00:00:00'], 'param': ['u10', 'v10'], 'levtype': 'sfc'}, paths=['./sample.nc'])
2024-07-16 14:43:00 ERROR 💣 Exiting
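
One possible reading of "Expected 2 fields, got 0", given only as a sketch: the expected count looks like the product of the lengths of the list-valued selection keys (1 date × 2 params), while the selection itself matched nothing in the file. This is an assumption inferred from the error message and not taken from the anemoi-datasets source; `expected_field_count` is a hypothetical name used only for this illustration.

```python
from math import prod


def expected_field_count(**selection):
    """Multiply the lengths of all list- or tuple-valued selection keys.

    Hypothetical reconstruction based on the error message; scalar values
    such as levtype='sfc' are assumed not to affect the count.
    """
    return prod(
        len(value)
        for value in selection.values()
        if isinstance(value, (list, tuple))
    )


# kwargs copied from the ValueError above
selection = {
    "valid_datetime": ["2013-01-01T00:00:00"],
    "param": ["u10", "v10"],
    "levtype": "sfc",
}
print(expected_field_count(**selection))
```

If that reading is right, the "got 0" part would mean that none of the requested `param`, `levtype` and `valid_datetime` values matched what the NetCDF source reports for `./sample.nc`.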

Accompanying data

No response

Organisation

No response