hytest-org/training-2023-model-evaluation

Notes - how to change to class format

sfoks opened this issue · 0 comments

sfoks commented

Concepts that should be introduced before the notebooks:

- rechunking, the compute-near-data concept, Pangeo concepts, HoloViews/HoloViz?, other viz tools (hydroinspector). Check with Rich on slides and content.


01_data_prep.ipynb (de-emphasize this one to avoid confusion).

  1. @gzt5142: possible requester-pays issues when reading from the nhgf bucket? @sfoks was having problems with this.
  2. @sfoks : not sure why there are roughly 350 gage IDs in the modeled dataset with letters embedded in the string of digits after the 'USGS-'... (These will be rejected by the API when we try to call NWIS data service to obtain streamflow history for that location).
  3. @sfoks to determine whether students can write to pangeo.chs.usgs.gov, or whether they should have access to the Open Storage Network to write to. @sfoks to ask @amsnyder @rsignell-usgs
  4. assign HPC inserts, explain briefly "how things would change if on HPC" (can be in comments).
  5. @sfoks adding nhm-prms streamflow to intake and access on cloud, double check with @amsnyder @rsignell-usgs
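
For item 2, a minimal sketch of screening the gage IDs before calling the NWIS service. It assumes a usable ID is 'USGS-' followed only by 8 to 15 digits; that length range, the helper names `is_valid_gage_id` and `split_gage_ids`, and the sample IDs are illustrative assumptions, not taken from the notebook.

```python
import re

# Assumption: a usable ID is 'USGS-' followed by 8 to 15 ASCII digits, nothing else.
GAGE_ID_PATTERN = re.compile(r"USGS-[0-9]{8,15}")


def is_valid_gage_id(gage_id):
    """Return True if the whole string matches the assumed ID pattern."""
    return GAGE_ID_PATTERN.fullmatch(gage_id) is not None


def split_gage_ids(gage_ids):
    """Split IDs into (valid, rejected) lists, preserving input order."""
    valid = [g for g in gage_ids if is_valid_gage_id(g)]
    rejected = [g for g in gage_ids if not is_valid_gage_id(g)]
    return valid, rejected


# Made-up IDs for illustration only.
sample_ids = ["USGS-01013500", "USGS-0101A500", "USGS-09380000", "USGS-1234"]
valid, rejected = split_gage_ids(sample_ids)
print("keep:", valid)
print("skip:", rejected)
```

Keeping the rejected list around (rather than dropping those IDs silently) would make it easier to figure out where the embedded letters come from.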

Lab options:

  • Change first 100 gages to 200 gages
  • Only look at gages west of 100th meridian
  • Subset the gages to their state or favorite HUC -- or deeper homework (?)
  • Make sure to change number of gages back to 100 for fast export of file.
  • In step 6, examine obs data, plot over CONUS, what are the number of gages, xarray dataset show-off.
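
A rough sketch of the "west of the 100th meridian" and "first N gages" options in plain Python. It assumes each gage record carries a decimal-degree longitude with west negative; the record layout, the `west_of_100th` helper, and the gage names are hypothetical, and in the notebook the same filter would more likely be applied to the existing dataset.

```python
# Assumption: longitudes are decimal degrees, negative west of Greenwich.
MERIDIAN_100W = -100.0


def west_of_100th(gages):
    """Keep gages strictly west of 100 degrees W."""
    return [g for g in gages if g["lon"] < MERIDIAN_100W]


# Hypothetical records for illustration only.
gages = [
    {"id": "gage_a", "lat": 40.0, "lon": -105.3},
    {"id": "gage_b", "lat": 44.5, "lon": -70.1},
    {"id": "gage_c", "lat": 36.9, "lon": -111.6},
    {"id": "gage_d", "lat": 38.0, "lon": -100.0},  # on the meridian, not west of it
]

western = west_of_100th(gages)

# Limit the count for a fast export (100 in the notebook; 2 here for the toy data).
n_gages = 2
subset = western[:n_gages]
print([g["id"] for g in subset])
```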

Add exploratory notebook

@sfoks @towlere

  • option: use the same dataset that you pull in data_prep 1
  • option: use the pre-organized data from intake, talk about intake, how to change intake datasets.
  • data structure, chunk size, time series data vs spatial analysis (Rich has a good slide on this; take some language from https://github.com/hytest-org/hytest/blob/main/dataset_preprocessing/ReChunkingData.ipynb , Rich's slide from the latest rechunk overview, and that 2013 presentation).
  • map; point & explore.
  • explore hydrographs, compare obs vs mod raw data with 1:1 plots, plot station locations
  • xarray show-off
  • baby overview of dask (just how to read data, point to dask tutorial if people want more)
  • resample by time show-off?

Lab options:

  • create map with hvplot of gage locations, hover to a gage, then plot hydrograph for that gage?
  • change gage location, change time period of analysis in hydrograph?
  • change modeling application?

analysis_stdsuite.ipynb

@thodson-usgs

  1. note during lecture: reacquaint everyone to what the data are and where they live.
  2. can we use %%R magic in Jupyter?
    # let's activate R magic
    %load_ext rpy2.ipython
  3. note during lecture: make a point in teaching of closing down clusters, only starting when needed.

Lab options:

  • add two functions: AvgKGE, MedKGE functions - could be a canned example/homework assignment at end of notebook; hidden cells.
  • add a statistic to the standard suite, like some GOF function, rerun on all sites.
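
A possible canned example for the AvgKGE / MedKGE lab option, assuming those names mean the mean and median Kling-Gupta efficiency across sites. The KGE here is the usual 2009 form, 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2); the function names `kge`, `avg_kge`, `med_kge` and the toy series are assumptions, and the standard suite in the notebook may define KGE differently.

```python
import math
import statistics


def kge(obs, sim):
    """Kling-Gupta efficiency for paired observed and simulated values."""
    mean_obs = statistics.fmean(obs)
    mean_sim = statistics.fmean(sim)
    sd_obs = statistics.pstdev(obs)
    sd_sim = statistics.pstdev(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def avg_kge(pairs_by_site):
    """Mean KGE across sites; pairs_by_site maps site -> (obs, sim)."""
    return statistics.fmean(kge(o, s) for o, s in pairs_by_site.values())


def med_kge(pairs_by_site):
    """Median KGE across sites."""
    return statistics.median(kge(o, s) for o, s in pairs_by_site.values())


# Toy data for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sites = {
    "site_a": (obs, obs),                     # perfect match
    "site_b": (obs, [2.0 * o for o in obs]),  # doubled flows
    "site_c": (obs, obs),
}
print(avg_kge(sites), med_kge(sites))
```

The sketch does not guard against constant series (zero standard deviation) or a zero observed mean; a real version would need to decide how to handle those sites.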

viz notebook

@gzt5142

  1. note during lecture: emphasize the connection between the csv we just exported from the analysis notebooks and the input of the viz notebook.
  2. notebook refilters to cobalt gages (keep in case folks need this again, also possible option to further filter gages).
  3. potentially add scorecard from usage example to viz notebook?

Canned example/demo:

  • (optional) select all gages with percent bias greater than 50% or less than -50% (visualize on map)
  • (optional) select west of 100th meridian and visualize performance?
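
A sketch of the percent-bias selection, assuming percent bias is 100 * sum(sim - obs) / sum(obs), so positive values mean the model overestimates (some references flip the sign). The helper names, the 50% default, and the toy series are assumptions; in the viz notebook the selection would more likely be a filter on the percent-bias column of the exported csv.

```python
def percent_bias(obs, sim):
    """Percent bias; positive means sim overestimates obs (assumed convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def large_bias_sites(pbias_by_site, threshold=50.0):
    """Keep sites whose percent bias is above +threshold or below -threshold."""
    return {site: pb for site, pb in pbias_by_site.items() if abs(pb) > threshold}


# Toy data for illustration only.
obs = [10.0, 20.0, 30.0]
sims = {
    "site_high": [20.0, 35.0, 45.0],  # strong overestimate
    "site_low": [4.0, 8.0, 12.0],     # strong underestimate
    "site_ok": [11.0, 19.0, 31.0],    # close to observed
}

pbias = {site: percent_bias(obs, sim) for site, sim in sims.items()}
selected = large_bias_sites(pbias)
print(sorted(selected))
```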

Lab options:

  • change the colorbar in the map? (low priority); maybe just add a link to HoloViz or Panel customizations?
  • [potential] deeper homework/lab we could add: attach gagesII attributes to sites from sciencebase, group gages by a new grouping (join gages to the new dataset, do boxplots or something easy).

analysis_dscore.ipynb

@thodson-usgs

  1. potentially add dscore scorecard (from usage notebook Gene put together).