This HyTEST model evaluation course teaches students the foundations of scientific model evaluation, as well as how to apply those foundations to evaluate big datasets in the cloud using Pangeo, an open-source Python platform for distributed computing.
Using a combination of lectures and hands-on exercises in Jupyter Notebooks, we will cover data access and preparation, identification and computation of statistical evaluation metrics, and visualization of results. We will also consider how these workflows can be adapted to other use cases.
On the technical side, students will become familiar with fundamental components of the Pangeo architecture, including S3, Dask, intake, xarray, and hvplot.
On the theory side, students will learn the theory of model evaluation, including how to compute and interpret different metrics for benchmarking model performance.
By the end, the student should be able to take the output of a geoscientific model, choose a meaningful suite of performance metrics, then use Pangeo to compute them and visualize the result.
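To give a feel for what "computing a suite of performance metrics" means in practice, here is a minimal sketch in plain Python. It is not course material: the two metrics (Nash-Sutcliffe efficiency and percent bias), the function names, the sign convention for bias, and the sample values are all assumptions chosen for illustration, and plain lists stand in for real model output.

```python
"""Toy example: score a simulated series against observations."""


def nash_sutcliffe_efficiency(obs, sim):
    """Return NSE: 1 is a perfect match; 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for o, s in zip(obs, sim))
    variance_term = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / variance_term


def percent_bias(obs, sim):
    """Return percent bias.

    Sign convention assumed here: positive means the model overestimates.
    """
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical values, e.g. five time steps of observed and simulated flow.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.1, 1.9, 3.2, 3.8, 5.1]

# Collect the metric suite in one place so it is easy to extend.
metrics = {
    "NSE": nash_sutcliffe_efficiency(obs, sim),
    "PBIAS (%)": percent_bias(obs, sim),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Keeping each metric as a small function of `(obs, sim)` makes it straightforward to add or swap metrics once the evaluation objective is chosen.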
May 10th and May 11th of 2023, 10am to 2pm MT
- Welcome/Icebreaker - 20 minutes
- Lesson 0: HyTEST, Pangeo Concepts, Tools - 20 minutes
- Lesson 1: Training Setup - 20 minutes
- Break - 15 minutes
- Lab 1: Data exploration - 30 minutes
- Break - 10 minutes
- Lab 2: Hvplot demonstration - 35 minutes
- Q & A: Day 1 - 20 minutes
- Lesson 3: Standard benchmarks (theory) - 30 minutes
- Lab 3: Standard benchmarks (application) - 30 minutes
- Break - 10 minutes
- Lesson 4: Selecting your objective - 30 minutes
- Break - 30 minutes
- Lab 4: Selecting your objective - 30 minutes
- Lesson 5: d-score benchmark (theory) - 30 minutes
- Break - 10 minutes
- Lab 5: d-score benchmark (application) - 20 minutes
- Wrap-up and Q&A - 20 minutes
Tim Hodson (USGS), Erin Towler (NCAR), Gene Trantham (USGS), & Sydney Foks (USGS)
Prior knowledge of Jupyter Notebooks and Python is beneficial but not required. Please review the links below for background on the Pangeo package suite and its concepts.
- Pangeo Concepts: Pangeo Start Guide
- Parallelization: Dask
- Data organization: Intake
- Multidimensional Arrays: Xarray
- Data Visualization: Panel & Holoviz
- Panel example
- Intro to GitHub (YouTube)
- Adding Pull Requests in GitHub (YouTube)
- HyTEST repository, additional demos
- Pangeo Discourse
- Project Pythia, workflows recipes and examples
- MSTeam (internal to network): GS-Pangeo
- MSTeam (internal to network): GS-HPCUsers
- Page (internal to network): ARC HPC Training
- MSTeam (internal to network): GS-CHS-UserTraining
- Sharepoint (internal to network): HyTEST Project
- Sharepoint (internal to network): HyTEST Evaluation Task
Please feel free to reach out to Sydney Foks (sfoks at usgs.gov) with any questions.