google-deepmind/graphcast

Performance of GraphCast_small vs regular GraphCast vs GraphCast_operational

yeechianlow opened this issue · 1 comments

Hi,

Is there data comparing the performance of GraphCast_small vs that of the regular, high-resolution version of GraphCast? I would assume that the GraphCast_small wouldn't perform quite as well, but I am curious as to by how much.

Also, for the GraphCast_operational model, what does it mean for the model to be fine-tuned on HRES data from 2016 to 2021 after already being trained on earlier ERA5 data?

Thanks in advance for any insight!

> Is there data comparing the performance of GraphCast_small vs that of the regular, high-resolution version of GraphCast? I would assume that the GraphCast_small wouldn't perform quite as well, but I am curious as to by how much.

I don't recall the exact numbers, but resolution does matter, especially at short lead times and for variables with high-frequency components. That said, if I remember correctly, we are talking about errors that are worse by between 0% and 20%, depending on the variable and lead time (but please don't just take my word for it; I would recommend running your own comparisons on the provided test data). Of course, these differences may be large enough that the small model is not better than the operational HRES model for some variables and lead times.
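If you do run your own comparison, a minimal sketch of the kind of number I mean (percent increase in RMSE of the small model relative to the full one) could look like the following. The function names and the toy values are made up for illustration and are not part of the graphcast package. A real evaluation would load actual predictions and targets for each variable and lead time, and would typically weight the errors by latitude, which this sketch does not do.

```python
import math


def rmse(pred, target):
    """Plain (unweighted) root-mean-square error of two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def relative_rmse_increase(pred_small, pred_full, target):
    """Percent by which the small model's RMSE exceeds the full model's RMSE."""
    rmse_small = rmse(pred_small, target)
    rmse_full = rmse(pred_full, target)
    return 100.0 * (rmse_small - rmse_full) / rmse_full


# Toy, made-up values for a single variable at a single lead time.
target = [0.0, 0.0, 0.0, 0.0]
pred_full = [1.0, -1.0, 1.0, -1.0]    # hypothetical full-resolution forecast
pred_small = [1.2, -1.2, 1.2, -1.2]   # hypothetical low-resolution forecast

print(relative_rmse_increase(pred_small, pred_full, target))  # roughly 20
```

Repeating this per variable and per lead time gives a table of relative differences that you can check against the rough 0% to 20% range above.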

> Also, for the GraphCast_operational model, what does it mean for the model to be fine-tuned on HRES data from 2016 to 2021 after already being trained on earlier ERA5 data?

The ERA5 reanalysis data does not look identical to the operational analysis data that ECMWF publishes in real time and that can be given as inputs. This is because (1) the NWP models used for the assimilation are different, (2) the resolution at which the assimilation is done is different, (3) the assimilation windows are different, and (4) I think the set of observations that contribute to the assimilation is sometimes not exactly the same. So for a model to work best operationally, it needs to have been trained at least a bit on operational data.

Now, operational data at this resolution is only available from 2016 onward, whereas ERA5 covers several decades, so it is still beneficial to use ERA5 for the additional training data it provides.

So essentially, the way we approached this was to pre-train on ~40 years of ERA5 and then fine-tune the models on ~6 years of operational data.

Hope this makes sense!