Problem building dataset from NetCDF files
jbanomedina opened this issue · 2 comments
What happened?
My goal is to build a dataset from NetCDF files using the anemoi-datasets library. However, I get an error when using NetCDF files as the source. I have tried both version 0.4.0 (installed using pip) and the develop branch (installed by cloning the repository). I was able to build a dataset from a GRIB file successfully, but for my project the data are in NetCDF format.
What are the steps to reproduce the bug?
The following steps reproduce the error.
1) First, I download a sample NetCDF file from the CDS using a Python script.
import cdsapi
## Define parameters
vars = ['10m_u_component_of_wind', '10m_v_component_of_wind']
year = 2013
###
c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': vars,
        'year': year,
        'month': [
            '01',
        ],
        'day': [
            '01', '02',
        ],
        'time': [
            '00:00', '06:00', '12:00', '18:00',
        ],
    },
    './sample.nc')
2) Second, I point to this sample in the recipe.yaml file.
dates:
  start: 2013-01-01T00:00:00
  end: 2013-01-01T06:00:00
  frequency: 6h
input:
  netcdf:
    path: ./sample.nc
    param: [u10,v10] # I tried also [10u,10v]
    levtype: sfc
3) Third, I run this on the command line:
anemoi-datasets create recipe.yaml dataset.zarr
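To rule out a naming mismatch between the recipe and the file, the variable names stored in sample.nc can be listed before running the command. This is a minimal sketch, assuming xarray and a NetCDF backend (for example netCDF4) are installed; `summarize_variables` and `print_variables` are hypothetical helper names and not part of anemoi-datasets.

```python
# Sketch only: list the data variables stored in a NetCDF file.
# Assumption: xarray and a NetCDF backend are available in the environment.
# summarize_variables / print_variables are hypothetical helpers.


def summarize_variables(ds):
    """Return {name: (dims, long_name)} for each data variable in ds."""
    summary = {}
    for name, var in ds.data_vars.items():
        # long_name is optional metadata, so fall back to an empty string.
        summary[str(name)] = (tuple(var.dims), var.attrs.get("long_name", ""))
    return summary


def print_variables(path):
    """Open the file at path and print its variables and coordinates."""
    import xarray as xr  # imported here so the helper above has no dependency

    with xr.open_dataset(path) as ds:
        for name, (dims, long_name) in summarize_variables(ds).items():
            print(name, dims, long_name)
        print("coordinates:", list(ds.coords))
```

Calling `print_variables("./sample.nc")` shows the names and dimensions that the file actually exposes, which can then be compared with the `param` list in recipe.yaml.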
Version
v0.4.0
Platform (OS and architecture)
Linux exp-18-17 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Apr 4 18:13:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Relevant log output
Setting flatten_grid=True in config
Setting ensemble_dimension=2 in config
Setting flatten_grid=True in config
Setting ensemble_dimension=2 in config
2024-07-16 14:42:59 INFO {'start': datetime.datetime(2013, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2013, 1, 1, 6, 0, tzinfo=datetime.timezone.utc), 'frequency': '6h', 'group_by': 'monthly'}
2024-07-16 14:42:59 INFO <anemoi.datasets.dates.groups.Groups object at 0x155147fbcee0>
2024-07-16 14:42:59 INFO ✅ INPUT_BUILDER
2024-07-16 14:42:59 INFO FunctionAction: path=./sample.nc param=['u10', 'v10'] levtype=sfc
2024-07-16 14:42:59 INFO FunctionAction: path=./sample.nc param=['u10', 'v10'] levtype=sfc
2024-07-16 14:42:59 INFO Minimal input (using only the first date) :
2024-07-16 14:42:59 INFO netcdf(['2013-01-01T00:00:00'])
Config loaded ok:
2024-07-16 14:42:59 INFO {'config_path': '/expanse/nfs/cw3e/cwp167/projects/test-attribution/recipe.yaml', 'dates': {'start': datetime.datetime(2013, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2013, 1, 1, 6, 0, tzinfo=datetime.timezone.utc), 'frequency': '6h', 'group_by': 'monthly'}, 'input': {'netcdf': {'path': './sample.nc', 'param': ['u10', 'v10'], 'levtype': 'sfc'}}, 'dataset_status': 'experimental', 'description': 'No description provided.', 'licence': 'unknown', 'attribution': 'unknown', 'build': {'group_by': 'monthly', 'use_grib_paramid': False, 'variable_naming': 'default'}, 'output': {'order_by': {'valid_datetime': 'ascending', 'param_level': 'ascending', 'number': 'ascending'}, 'remapping': {'param_level': '{param}_{levelist}'}, 'statistics': 'param_level', 'chunking': {'dates': 1, 'ensembles': 1}, 'dtype': 'float32', 'flatten_grid': True, 'ensemble_dimension': 2}, 'statistics': {}, 'reading_chunks': None}
Found 2 datetimes.
2024-07-16 14:42:59 INFO Dates: Found 2 datetimes, in 1 groups:
2024-07-16 14:42:59 INFO Missing dates: 0
Found 2 datetimes 2.
2024-07-16 14:43:00 INFO Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-07-16 14:43:00 INFO Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-07-16 14:43:00 INFO NumExpr defaulting to 8 threads.
2024-07-16 14:43:00 ERROR Error in execute
Traceback (most recent call last):
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 433, in datasource
return self.action.function(FunctionContext(self), self.dates, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 72, in execute
return load_netcdfs("๐", "path", context, dates, path, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 66, in load_netcdfs
check(what, ds, given_paths, valid_datetime=dates, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 40, in check
raise ValueError(f"Expected {count} fields, got {len(ds)} (kwargs={kwargs}, {what}s={paths})")
ValueError: Expected 2 fields, got 0 (kwargs={'valid_datetime': ['2013-01-01T00:00:00'], 'param': ['u10', 'v10'], 'levtype': 'sfc'}, paths=['./sample.nc'])
Traceback (most recent call last):
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/utils/cli.py", line 128, in cli_main
cmd.run(args)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/commands/create.py", line 30, in run
c.create()
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/__init__.py", line 153, in create
self.init()
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/__init__.py", line 50, in init
obj.initialise(check_name=check_name)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/loaders.py", line 271, in initialise
variables = self.minimal_input.variables
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 227, in variables
return self._coords.variables
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 190, in variables
self._build_coords
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 143, in _build_coords
from_data = self.owner.get_cube().user_coords
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 350, in get_cube
ds = self.datasource
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 81, in wrapper
result = method(self, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/template.py", line 82, in wrapper
result = method(self, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/template.py", line 42, in wrapper
result = method(self, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 433, in datasource
return self.action.function(FunctionContext(self), self.dates, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 72, in execute
return load_netcdfs("๐", "path", context, dates, path, *args, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 66, in load_netcdfs
check(what, ds, given_paths, valid_datetime=dates, **kwargs)
File "/expanse/nfs/cw3e/cwp167/envs/nwm-anemoi/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/netcdf.py", line 40, in check
raise ValueError(f"Expected {count} fields, got {len(ds)} (kwargs={kwargs}, {what}s={paths})")
ValueError: Expected 2 fields, got 0 (kwargs={'valid_datetime': ['2013-01-01T00:00:00'], 'param': ['u10', 'v10'], 'levtype': 'sfc'}, paths=['./sample.nc'])
2024-07-16 14:43:00 ERROR
💣 Expected 2 fields, got 0 (kwargs={'valid_datetime': ['2013-01-01T00:00:00'], 'param': ['u10', 'v10'], 'levtype': 'sfc'}, paths=['./sample.nc'])
2024-07-16 14:43:00 ERROR 💣 Exiting
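The numbers in the final error can be read as follows: the selection asks for one date and two parameters, so two fields are expected, and the source returned none for that selection. The sketch below reproduces only that arithmetic. It assumes the expected count is the product of the lengths of the list-valued selection keys; `expected_field_count` is a hypothetical helper written for illustration, not the library's own `check` function.

```python
# Sketch only: reproduce the arithmetic behind "Expected 2 fields".
# Assumption: the expected count is the product of the lengths of the
# list-valued selection keys. expected_field_count is a hypothetical helper.


def expected_field_count(**selection):
    """Multiply the lengths of all list or tuple values in the selection."""
    count = 1
    for value in selection.values():
        if isinstance(value, (list, tuple)):
            count *= len(value)
    return count


# The kwargs reported in the ValueError above.
selection = {
    "valid_datetime": ["2013-01-01T00:00:00"],
    "param": ["u10", "v10"],
    "levtype": "sfc",
}

print(expected_field_count(**selection))  # prints 2 (1 date x 2 params)
```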
Accompanying data
No response
Organisation
No response