google-research/weatherbench2

graphcast missing in dataset

Closed this issue · 8 comments

Hi. Thanks for the great initiative

I see the Graphcast dataset folder is missing in the bucket "weatherbench2/datasets"? @shoyer @raspstephan

Hey, we currently don't have GraphCast forecasts on the public cloud bucket. Hopefully this will change soon.

Okay, thanks for the information. I tried to get predictions from the GraphCast model, but there seem to be some key mismatches that I have been trying to resolve. If you have any resource that produces the exact format WeatherBench requires, do let me know. Thanks

Could you be more specific about those mismatches? Maybe I can help then.

KeyError: "'init_time' is not a valid dimension or coordinate"

The predictions I got did not have a time coordinate, only time deltas. So I had to insert a coordinate to keep the evaluation running, but this error still appeared after I inserted it.

Complete log:

Traceback (most recent call last):
File "////miniconda3/envs/weatherv2/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "////miniconda3/envs/weatherv2/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "////.vscode/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/main.py", line 39, in
cli.main()
File "////.vscode/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "////.vscode/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="main")
File "////.vscode/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "////.vscode/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "////.vscode/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "////fcfa0911-ffd4-49d3-9003-efe40cfa8f8b/wbench/weatherbench_eval.py", line 312, in
evaluate_in_memory(data_config, eval_configs) # Takes around 5 minutes
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/weatherbench2/evaluation.py", line 496, in evaluate_in_memory
_evaluate_all_metrics(eval_name, eval_config, data_config)
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/weatherbench2/evaluation.py", line 430, in _evaluate_all_metrics
forecast, truth, climatology = open_forecast_and_truth_datasets(
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/weatherbench2/evaluation.py", line 328, in open_forecast_and_truth_datasets
forecast = _impose_data_selection(
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/weatherbench2/evaluation.py", line 152, in _impose_data_selection
dataset = dataset.sel({time_dim: selection.time_slice})
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/xarray/core/dataset.py", line 2794, in sel
query_results = map_index_queries(
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/xarray/core/indexing.py", line 186, in map_index_queries
grouped_indexers = group_indexers_by_index(obj, indexers, options)
File "////miniconda3/envs/weatherv2/lib/python3.10/site-packages/xarray/core/indexing.py", line 150, in group_indexers_by_index
raise KeyError(f"{key!r} is not a valid dimension or coordinate")
KeyError: "'init_time' is not a valid dimension or coordinate"

Can you give some pointers for getting predictions from GraphCast or any other weather prediction model and converting them into the exact format required for WeatherBench 2? A quick reply would be great. @raspstephan

Can you share what the dataset looks like and what command you ran that gave you the error?

[Image: screenshot of the forecast dataset]

And this command gives the error: "evaluate_in_memory(data_config, eval_configs) # Takes around 5 minutes"

The issue seems to be the exact format, i.e. the one you use for all the other methods in the bucket.
Having forecasts in that required format would let the evaluation run. I have seen that you have some GraphCast results (in the results folder), but they cannot be restricted to particular regions the way the selection config in the WeatherBench code allows.

Hey, your forecast is missing a dimension. The dataset only has a timedelta, but it doesn't specify when the forecast was initialized. Check the datasets here https://weatherbench2.readthedocs.io/en/latest/data-guide.html to see what the standard format for forecast files is.
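
As a rough illustration of the missing dimension, here is a minimal xarray sketch. It assumes the forecast should carry a `time` dimension holding the initialization time next to `prediction_timedelta`, as described in the data guide. The toy dataset, the variable name `2m_temperature`, the grid, and the initialization date are all made up for the example; check the data guide for the exact names your evaluation config expects.

```python
import numpy as np
import xarray as xr

# Toy forecast that only has lead times (no initialization time),
# standing in for the dataset described in this thread. All values are made up.
lead_times = (np.arange(1, 4) * 6).astype("timedelta64[h]").astype("timedelta64[ns]")
forecast = xr.Dataset(
    {
        "2m_temperature": (
            ("prediction_timedelta", "latitude", "longitude"),
            np.zeros((3, 2, 4), dtype=np.float32),
        )
    },
    coords={
        "prediction_timedelta": lead_times,
        "latitude": [-45.0, 45.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

# Hypothetical initialization time of this forecast run (an assumption).
init_times = np.array(["2020-01-01T00:00"], dtype="datetime64[ns]")

# Add a length-1 `time` dimension with the initialization time as its coordinate.
forecast_with_init = forecast.expand_dims(time=init_times)

# Label-based selection along the initialization time now works.
selected = forecast_with_init.sel(time=slice("2020-01-01", "2020-01-02"))
print(selected)
```

If there are several forecast runs, each one can be given its own initialization time this way and the results combined with `xr.concat(runs, dim="time")`.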

Regarding out of memory: Depending on how much RAM you have, evaluation can be hard to do in memory. In this case, you probably want to look at distributed evaluation: https://weatherbench2.readthedocs.io/en/latest/beam-in-the-cloud.html or choose a smaller time slice.
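
To give a feel for how much a smaller time slice helps, here is a small sketch in plain xarray (not the WeatherBench API). The traceback above shows the evaluation applying `dataset.sel({time_dim: selection.time_slice})`, so the same kind of slice is applied by hand here. The dataset, variable name, and dates are invented for the example.

```python
import numpy as np
import xarray as xr

# One year of 12-hourly initialization times on a tiny made-up grid.
times = np.arange(
    np.datetime64("2020-01-01T00", "h"),
    np.datetime64("2021-01-01T00", "h"),
    np.timedelta64(12, "h"),
).astype("datetime64[ns]")

full = xr.Dataset(
    {
        "2m_temperature": (
            ("time", "latitude", "longitude"),
            np.zeros((times.size, 2, 4), dtype=np.float32),
        )
    },
    coords={
        "time": times,
        "latitude": [-45.0, 45.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

# Restrict to a single month, analogous to passing a shorter time slice.
subset = full.sel(time=slice("2020-01-01", "2020-01-31"))

print(f"full:   {full.sizes['time']} init times, {full.nbytes} bytes")
print(f"subset: {subset.sizes['time']} init times, {subset.nbytes} bytes")
```

The memory needed scales roughly with the number of initialization times kept, so shortening the slice is the quickest thing to try before moving to distributed evaluation.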