ec-jrc/Thalassa

Station and time series plots

saeed-moghimi-noaa opened this issue · 3 comments

To get this discussion started, here is a possible format for the stations file.

The idea is to have a NetCDF file that stores both the data and the metadata of each station as data variables. For example:

import numpy as np
import pandas as pd
import xarray as xr

from numpy.random import default_rng

rng = default_rng(1234)

np.set_printoptions(linewidth=200)

# User input
# NOTE: lons, lats, station_names and station_ids below are hardcoded for 15 nodes;
# update them if you change no_nodes.
no_nodes = 15
no_periods = 72

nodes = list(range(no_nodes))
timestamps = pd.date_range("2001-01-01", freq="h", periods=no_periods)
lons = np.array([ 0, -0.21650635, -0.4330127 , -0.64951905, -0.8660254 , 0.21650635,  0, -0.21650635, -0.4330127 , 0.4330127 , 0.21650635, 0, 0.64951905, 0.4330127 , 0.8660254 ])
lats = np.array([ 1, 0.625, 0.25 , -0.125, -0.5 , 0.625, 0.25, -0.125, -0.5, 0.25 , -0.125, -0.5, -0.125, -0.5, -0.5])

elevation = rng.random((no_nodes, no_periods))

# Additional station metadata
station_names = ['Jonathan', 'Rick', 'Bryan', 'Gregory', 'Michael', 'Rebecca', 'Bobby', 'Jacob', 'Brian', 'Kelly', 'Carrie', 'Richard', 'Sherri', 'Ryan', 'Sabrina']
station_ids = [13, 84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43, 76, 0, 44]

ds = xr.Dataset(
    coords=dict(
        node=nodes,
        time=timestamps,
    ),
    data_vars=dict(
        lon=("node", lons),
        lat=("node", lats),
        elevation=(("node", "time"), elevation),
        # Additional node metadata can be added as data variables
        station_name=("node", station_names),
        station_id=("node", station_ids),
    ),
)
ds

which results in something like this:

<xarray.Dataset>
Dimensions:       (node: 15, time: 72)
Coordinates:
  * node          (node) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
  * time          (time) datetime64[ns] 2001-01-01 ... 2001-01-03T23:00:00
Data variables:
    lon           (node) float64 0.0 -0.2165 -0.433 ... 0.6495 0.433 0.866
    lat           (node) float64 1.0 0.625 0.25 -0.125 ... -0.5 -0.125 -0.5 -0.5
    elevation     (node, time) float64 0.9112 0.691 0.168 ... 0.3844 0.06538
    station_name  (node) <U8 'Jonathan' 'Rick' 'Bryan' ... 'Ryan' 'Sabrina'
    station_id    (node) int64 13 84 76 25 49 44 65 78 9 2 83 43 76 0 44
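Since the goal is station time series plots, it may help to see how one station's series can be pulled out of a dataset with this layout. The following is only a sketch: `station_series` is a hypothetical helper (not part of Thalassa), and the small stand-in dataset exists only to keep the example self-contained.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in dataset with the same layout as above (3 stations, 4 hours).
ds = xr.Dataset(
    coords=dict(
        node=[0, 1, 2],
        time=pd.date_range("2001-01-01", freq="h", periods=4),
    ),
    data_vars=dict(
        lon=("node", [0.0, -0.2, 0.2]),
        lat=("node", [1.0, 0.6, 0.6]),
        elevation=(("node", "time"), np.arange(12, dtype=float).reshape(3, 4)),
        station_name=("node", ["Jonathan", "Rick", "Bryan"]),
    ),
)


def station_series(ds, name):
    """Hypothetical helper: elevation series of the first station called `name`."""
    # station_name is a data variable, not an index, so match on its values.
    matches = np.flatnonzero(ds["station_name"].values == name)
    if matches.size == 0:
        raise KeyError(f"no station named {name!r}")
    # Positional selection along `node` leaves a 1-D array over `time`.
    return ds["elevation"].isel(node=int(matches[0])).to_series()


series = station_series(ds, "Rick")
print(series)
```

The result is a `pandas.Series` indexed by time, so it can be handed to whatever plotting library is in use. Matching on the first occurrence is a deliberate simplification, because nothing in the format guarantees that station names are unique.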

Thalassa should be able to handle any NetCDF file that has this structure and these coordinates, and that contains at least the lon, lat, elevation and station_name variables. Additional variables should be allowed, but I am not sure there is a meaningful way to handle them.
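To make the requirement concrete, here is a minimal sketch of what a structure check could look like. It is an assumption about how such a check might be written, not Thalassa's actual implementation; `check_stations_dataset` and `REQUIRED_VARS` are hypothetical names.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Variables the proposal lists as mandatory.
REQUIRED_VARS = ("lon", "lat", "elevation", "station_name")


def check_stations_dataset(ds):
    """Hypothetical check: return a list of problems (empty list means OK)."""
    problems = []
    for dim in ("node", "time"):
        if dim not in ds.sizes:
            problems.append(f"missing dimension: {dim}")
    for name in REQUIRED_VARS:
        if name not in ds.variables:
            problems.append(f"missing variable: {name}")
    # elevation is expected per station and per timestamp.
    if "elevation" in ds.variables and set(ds["elevation"].dims) != {"node", "time"}:
        problems.append("elevation must have dimensions (node, time)")
    # Per-station metadata is expected along `node` only.
    for name in ("lon", "lat", "station_name"):
        if name in ds.variables and ds[name].dims != ("node",):
            problems.append(f"{name} must have dimension (node,)")
    return problems


good = xr.Dataset(
    coords=dict(node=[0, 1], time=pd.date_range("2001-01-01", freq="h", periods=3)),
    data_vars=dict(
        lon=("node", [0.0, 0.5]),
        lat=("node", [1.0, 0.25]),
        elevation=(("node", "time"), np.zeros((2, 3))),
        station_name=("node", ["Jonathan", "Rick"]),
        station_id=("node", [13, 84]),  # extra variables are simply ignored
    ),
)

print(check_stations_dataset(good))
print(check_stations_dataset(good.drop_vars("station_name")))
```

Returning a list of problems rather than raising on the first one lets a user fix a file in a single pass. Additional variables such as `station_id` pass through untouched, which matches the "allowed but not interpreted" position above.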

pmav99 commented

Current master supports an initial implementation of this. I am keeping the ticket open because we need to document it.

@pmav99 @brey
Please add to the README, or point to an example in Thalassa, showing how to use https://github.com/oceanmodeling/searvey to prepare the NetCDF input file that Thalassa requires for model/data comparison. Thanks a lot.
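Until that is documented, a rough sketch of the reshaping step might help. It deliberately calls no searvey functions. It assumes the observations have already been downloaded into a long-form `pandas.DataFrame` with one row per station and timestamp, plus a separate metadata table. The column names and the `to_stations_dataset` helper are hypothetical and would need adapting to whatever the data source actually returns.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical long-form observations: one row per (station, timestamp).
obs = pd.DataFrame({
    "station_name": ["A", "A", "B", "B"],
    "time": pd.to_datetime(["2001-01-01 00:00", "2001-01-01 01:00"] * 2),
    "elevation": [0.1, 0.2, 0.3, 0.4],
})
# Hypothetical station metadata: one row per station.
meta = pd.DataFrame({
    "station_name": ["A", "B"],
    "lon": [0.0, 0.5],
    "lat": [1.0, 0.25],
})


def to_stations_dataset(obs, meta):
    """Hypothetical helper: reshape long-form data into the proposed layout."""
    # Pivot to a (station, time) table; missing observations become NaN.
    wide = obs.pivot(index="station_name", columns="time", values="elevation")
    # Make the rows follow the order of the metadata table.
    wide = wide.reindex(meta["station_name"].to_numpy())
    return xr.Dataset(
        coords=dict(
            node=np.arange(len(meta)),
            time=wide.columns.to_numpy(),
        ),
        data_vars=dict(
            lon=("node", meta["lon"].to_numpy()),
            lat=("node", meta["lat"].to_numpy()),
            elevation=(("node", "time"), wide.to_numpy()),
            station_name=("node", meta["station_name"].to_numpy().astype(str)),
        ),
    )


ds = to_stations_dataset(obs, meta)
print(ds)
```

The resulting dataset could then be written with `ds.to_netcdf(...)`, provided a NetCDF backend is installed.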