GEUS-Glaciology-and-Climate/pypromice

Merging of the station records at each site including historical stations

Closed this issue · 1 comments

The goal is to have, in a level_4 folder, one merged record for each site, combining historical, v2 and v3 stations as well as relocated stations (e.g. THU_U replaced by THU_U2). The implementation is ongoing in https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/join_l4/src/pypromice/process/join_l4.py with some updates in other files (see the main...join_l4 comparison).

It uses a dictionary that maps the latest station at each site (keys) to its older stations, listed in reverse chronological order:

old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'CP1': ['CrawfordPoint1'],
    'DY2': ['DYE-2'],
    'JAR': ['JAR1'],
    'HUM': ['Humboldt'],
    'NAU': ['NASA-U'],
    'NAE': ['NASA-E'],
    'NEM': ['NEEM'],
    'NSE': ['NASA-SE'],
    'EGP': ['EastGRIP'],
    'SDL': ['Saddle'],
    'SDM': ['SouthDome'],
    'SWC': ['SwissCamp', 'SwissCamp10m'],
    'TUN': ['Tunu-N'],
    'QAS_Uv3': ['QAS_U'],
    'QAS_Mv3': ['QAS_M'],
    'QAS_Lv3': ['QAS_L'],
    'KAN_Lv3': ['KAN_L'],
    'KPC_Uv3': ['KPC_U'],
    'KPC_Lv3': ['KPC_L'],
    'NUK_Uv3': ['NUK_U'],
    'THU_U2': ['THU_U'],
}

At the moment, join_l4 is called on the same list of stations as join_l3, i.e. the sites for which new transmissions, new raw files or new flags have recently been added:
https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/b0d52ecf9427b204460f21f110ef0e049d0c49c4/l3_processor.sh#L173-L185

If a station is listed in old_name.values() (the names in brackets in old_name), it is not processed by join_l4, because its data is appended to another station's record. If a station is not in old_name.keys(), there is no historical data to append and its file is copied as-is to the level_4 folder.
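The three cases above can be sketched as a small stdlib-only function. The function name classify_station and the labels 'skip', 'merge' and 'copy' are hypothetical and only serve the illustration; they are not taken from join_l4.py, and the mapping is a subset of old_name.

```python
# Subset of the old_name mapping from this issue, for illustration only.
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'THU_U2': ['THU_U'],
    'QAS_Uv3': ['QAS_U'],
}


def classify_station(name, old_name):
    """Return 'skip', 'merge' or 'copy' (hypothetical labels) for a station."""
    # Every station that appears in a value list has been superseded.
    superseded = {old for olds in old_name.values() for old in olds}
    if name in superseded:
        return 'skip'   # its data is appended to the latest station's record
    if name in old_name:
        return 'merge'  # latest station with historical data to prepend
    return 'copy'       # no historical data: the file is copied as-is


for name in ['CEN1', 'CEN2', 'KAN_M']:
    print(name, classify_station(name, old_name))
```

Here KAN_M stands for any station that appears neither as a key nor as a value of old_name.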

For the historical GC-Net stations, the variable aliases are defined in an external file, src/pypromice/process/variable_aliases_GC-Net.csv, which is also declared as package data.

The merging is done by time slices:

ds1 = xr.concat(
    (
        ds2.sel(time=slice(ds2.time.isel(time=0), ds1.time.isel(time=0))),
        ds1,
    ),
    dim='time',
)

where ds1 is the current AWS data and ds2 is the historical AWS data being appended before the start of ds1.
Gap-filling during the overlapping period is currently not implemented.
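To make the slicing logic easier to follow without xarray, here is a stdlib-only sketch that works on lists of (time, value) tuples instead of datasets. The function name prepend_history and the sample values are made up for the illustration. One deliberate difference, stated as an assumption: label-based slices in xarray include both endpoints, so the snippet above would keep a historical record that falls exactly on the first timestamp of ds1, whereas this sketch uses a strict comparison so that such a timestamp is taken from the current station only.

```python
from datetime import datetime


def prepend_history(current, historical):
    """Prepend the historical records that precede the first current record.

    Both arguments are lists of (time, value) tuples sorted by time.
    """
    if not current:
        return list(historical)
    start = current[0][0]
    # Strict '<': a historical record at exactly `start` is dropped,
    # so the merged series has no duplicated timestamp.
    older = [rec for rec in historical if rec[0] < start]
    return older + list(current)


historical = [
    (datetime(2020, 1, 1), 1.0),
    (datetime(2020, 1, 2), 2.0),
    (datetime(2020, 1, 3), 3.0),
]
current = [
    (datetime(2020, 1, 2), 20.0),
    (datetime(2020, 1, 3), 30.0),
    (datetime(2020, 1, 4), 40.0),
]
merged = prepend_history(current, historical)
print([value for _, value in merged])
```

During the overlap (2 and 3 January in this example) only the current station's values are kept, which matches the statement that gap-filling is not implemented.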

The resulting files have the same format and the same variables as the level_3 files.

Instead of stid, there are now site_id and list_station_id attributes, defined as:

site_id = n1.replace('v3','').replace('CEN2','CEN')
for l in [l3_h, l3_d, l3_m]:
    l.attrs['site_id'] = site_id
    l.attrs['station_id'] = site_id
    if n1 in old_name.keys():
        l.attrs['list_station_id'] = '('+n1+', '+', '.join(old_name[n1])+')'
    else:
        l.attrs['list_station_id'] = '('+n1+')'

meaning that we drop the v3 and the 2 in CEN2 (and potentially other stations).
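The attribute rules can be checked in isolation with a small helper that returns the three attributes as a plain dictionary. The helper name site_attributes is hypothetical; the string operations are the same as in the snippet above, and the mapping is a subset of old_name.

```python
# Subset of the old_name mapping from this issue, for illustration only.
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'QAS_Uv3': ['QAS_U'],
    'THU_U2': ['THU_U'],
}


def site_attributes(n1, old_name):
    """Build the site-level attributes for the latest station name n1."""
    site_id = n1.replace('v3', '').replace('CEN2', 'CEN')
    # Latest station first, then the older ones in reverse chronological order.
    stations = [n1] + old_name.get(n1, [])
    return {
        'site_id': site_id,
        'station_id': site_id,
        'list_station_id': '(' + ', '.join(stations) + ')',
    }


for name in ['CEN2', 'QAS_Uv3', 'THU_U2', 'KAN_M']:
    print(name, site_attributes(name, old_name))
```

With these rules THU_U2 keeps its trailing 2 in site_id, since only the v3 suffix and the CEN2 name are rewritten.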

Right now, because join_l4 is called in parallel on the list of updated stations, it cannot know that it needs to re-append a given site (e.g. CEN) when the older station data (e.g. CEN1) is updated but the latest station (e.g. CEN2) is not.
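One possible way around this limitation, given here only as a sketch and not as what the join_l4 branch does, would be to translate the list of updated stations into the list of latest stations before calling join_l4, using a reverse lookup built from old_name. The function name stations_to_join is hypothetical.

```python
# Subset of the old_name mapping from this issue, for illustration only.
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'THU_U2': ['THU_U'],
}


def stations_to_join(updated, old_name):
    """Map every updated station to the latest station of its site."""
    # Reverse lookup: old station name -> latest station name.
    latest_of = {old: new for new, olds in old_name.items() for old in olds}
    # A set removes duplicates when both CEN1 and CEN2 were updated.
    return sorted({latest_of.get(name, name) for name in updated})


print(stations_to_join(['CEN1', 'KAN_M'], old_name))
```

With this lookup, an update to CEN1 alone would still put CEN2 on the list, so the CEN site would be re-merged.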