Merging of the station records at each site including historical stations

The goal is a level_4 folder containing one merged record per site, combining historical, v2 and v3 stations as well as moved stations (e.g. THU_U replaced by THU_U2). Implementation is ongoing in https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/join_l4/src/pypromice/process/join_l4.py with some updates in other files (main...join_l4).

It uses a dictionary that maps the latest station at each site (the keys) to the older stations at that site, listed in reverse chronological order:

pypromice/src/pypromice/process/join_l4.py

Lines 12 to 35 in 97eaedb

    
    old_name = {
        'CEN2': ['CEN1', 'GITS'],
        'CP1': ['CrawfordPoint1'],
        'DY2': ['DYE-2'],
        'JAR': ['JAR1'],
        'HUM': ['Humboldt'],
        'NAU': ['NASA-U'],
        'NAE': ['NASA-E'],
        'NEM': ['NEEM'],
        'NSE': ['NASA-SE'],
        'EGP': ['EastGRIP'],
        'SDL': ['Saddle'],
        'SDM': ['SouthDome'],
        'SWC': ['SwissCamp', 'SwissCamp10m'],
        'TUN': ['Tunu-N'],
        'QAS_Uv3': ['QAS_U'],
        'QAS_Mv3': ['QAS_M'],
        'QAS_Lv3': ['QAS_L'],
        'KAN_Lv3': ['KAN_L'],
        'KPC_Uv3': ['KPC_U'],
        'KPC_Lv3': ['KPC_L'],
        'NUK_Uv3': ['NUK_U'],
        'THU_U2': ['THU_U'],
        }

At the moment, join_l4 is called on the same list of stations as join_l3, i.e. the sites for which new transmissions, new raw files or new flags have recently been added:
https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/b0d52ecf9427b204460f21f110ef0e049d0c49c4/l3_processor.sh#L173-L185

If a station is listed in old_name.values() (the names in brackets in old_name), it is not processed by join_l4, because its data is appended to another station's record. If a station is not in old_name.keys(), there is no historical data to append and its file is copied as-is to the level_4 folder.
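The selection rule above can be sketched as a small helper. This is only an illustration: the function name `classify_station` and the labels it returns are made up here and are not part of join_l4.py, and the dictionary is a shortened copy of old_name.

```python
# Shortened copy of old_name, for illustration only.
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'QAS_Uv3': ['QAS_U'],
    'THU_U2': ['THU_U'],
}


def classify_station(stid, old_name):
    """Return 'skip', 'merge' or 'copy' (hypothetical labels)."""
    # Every name that appears in brackets, i.e. in old_name.values().
    historical = {old for olds in old_name.values() for old in olds}
    if stid in historical:
        return 'skip'   # appended to the record of a newer station
    if stid in old_name:
        return 'merge'  # latest station with older data to prepend
    return 'copy'       # no historical data: copied as-is to level_4


for stid in ['CEN1', 'CEN2', 'KAN_M']:
    print(stid, classify_station(stid, old_name))
```

Here CEN1 is skipped, CEN2 is merged with CEN1 and GITS, and KAN_M, which has no entry in old_name, is simply copied.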

For the historical GC-Net stations, the variable aliases are defined in an external file, src/pypromice/process/variable_aliases_GC-Net.csv, which is also declared as package data.

The merging is done by time slices:

pypromice/src/pypromice/process/join_l4.py

Lines 229 to 232 in 97eaedb

    
    ds1 = xr.concat((ds2.sel(
                time=slice(ds2.time.isel(time=0),
                           ds1.time.isel(time=0))
                ), ds1), dim='time')

where ds1 is the current AWS data and ds2 is the historical AWS data being appended before the start of ds1.
Gap-filling during the overlapping period is currently not implemented.
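A toy example of this time-slice merge, assuming two short daily datasets that overlap by two days. The helper `make_ds`, the variable name `t_u` and the dates are placeholders chosen for the sketch. It also shows one thing to keep in mind: label-based slices in xarray include both end points, so if ds2 has a value exactly at the first timestamp of ds1, that timestamp ends up in the result twice. The `strict` variant is one possible way to avoid that, not something taken from join_l4.py.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_ds(start, periods, value):
    """Toy daily dataset with a single variable (placeholder name 't_u')."""
    time = pd.date_range(start, periods=periods, freq='D')
    return xr.Dataset({'t_u': ('time', np.full(periods, value))},
                      coords={'time': time})


ds2 = make_ds('2020-01-01', 5, 1.0)  # historical station: 1 to 5 Jan
ds1 = make_ds('2020-01-04', 4, 2.0)  # current station: 4 to 7 Jan

t0 = ds1.time.values[0]  # first timestamp of the current station

# Same pattern as in join_l4: keep ds2 up to the first timestamp of ds1.
# The slice is inclusive, so 4 Jan is taken from both datasets.
merged = xr.concat((ds2.sel(time=slice(ds2.time.values[0], t0)), ds1),
                   dim='time')

# Variant with a strict cut, so that each timestamp appears only once.
strict = xr.concat((ds2.isel(time=ds2.time.values < t0), ds1), dim='time')

print(merged.sizes['time'], merged.indexes['time'].is_unique)
print(strict.sizes['time'], strict.indexes['time'].is_unique)
```

In both cases the values of ds1 are kept untouched over the overlapping period; only the part of ds2 that precedes ds1 is used.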

The resulting files have the same format and the same variables as the level_3 files.

Instead of stid, there are now site_id and list_station_id attributes, defined as:

pypromice/src/pypromice/process/join_l4.py

Lines 271 to 278 in 97eaedb

    
    site_id = n1.replace('v3','').replace('CEN2','CEN')
    for l in [l3_h, l3_d, l3_m]:
        l.attrs['site_id'] = site_id
        l.attrs['station_id'] = site_id
        if n1 in old_name.keys():
            l.attrs['list_station_id'] = '('+n1+', '+', '.join(old_name[n1])+')'
        else:
            l.attrs['list_station_id'] = '('+n1+')'

meaning that we drop the v3 suffix and the 2 in CEN2 (and potentially in other stations).
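To make the resulting attributes concrete, here is the same logic wrapped in a standalone function and applied to a few station names. The function name `site_attributes` and the returned dictionary are only for this sketch; in join_l4.py the values are written directly into the attrs of the hourly, daily and monthly datasets.

```python
# Shortened copy of old_name, for illustration only.
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'QAS_Uv3': ['QAS_U'],
    'THU_U2': ['THU_U'],
}


def site_attributes(n1, old_name):
    """Build the site-level attributes for the latest station n1."""
    # Drop the 'v3' suffix and turn 'CEN2' into 'CEN'.
    site_id = n1.replace('v3', '').replace('CEN2', 'CEN')
    if n1 in old_name:
        list_station_id = '(' + n1 + ', ' + ', '.join(old_name[n1]) + ')'
    else:
        list_station_id = '(' + n1 + ')'
    return {'site_id': site_id,
            'station_id': site_id,
            'list_station_id': list_station_id}


for n1 in ['CEN2', 'QAS_Uv3', 'THU_U2', 'KAN_M']:
    print(n1, site_attributes(n1, old_name))
```

With the two replace calls as written, CEN2 becomes site CEN and QAS_Uv3 becomes site QAS_U, while THU_U2 keeps its 2 and gets the site_id THU_U2.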

Right now, because of the parallel call to join_l4, join_l4 cannot know that it needs to re-append a given site (e.g. CEN) if the older station data (e.g. CEN1) is updated but not the latest station (e.g. CEN2).

fixed in #294
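For illustration only, and not necessarily how #294 addresses it: one way to picture the missing piece is a reverse lookup from each older station to the latest station at its site, so that an update to CEN1 alone still selects CEN2 for re-joining. The function name `stations_to_rejoin` is hypothetical.

```python
# Shortened copy of old_name, for illustration only.
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'QAS_Uv3': ['QAS_U'],
    'THU_U2': ['THU_U'],
}


def stations_to_rejoin(updated, old_name):
    """Map each updated station to the latest station at its site."""
    # Reverse lookup: older station -> latest station at the same site.
    latest_of = {old: new for new, olds in old_name.items() for old in olds}
    # Stations that are not listed as "old" stand for themselves.
    return sorted({latest_of.get(stid, stid) for stid in updated})


print(stations_to_rejoin(['CEN1', 'KAN_M'], old_name))
```

With this lookup, an update to CEN1 and KAN_M would trigger the join for CEN2 and KAN_M, and updating both CEN1 and CEN2 would trigger it only once.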
