noaa-ocs-modeling/EnsemblePerturbation

Notes and next steps from meeting on 10/26/2021 with Khachik


Saeed notes from meeting on 10/26/2021 with Khachik

@WPringle and @zacharyburnettNOAA see if this note is useful.

Notes for William

  • using a polynomial for the surrogate might not be the best method for maximum elevations
    • because it might be “jumpy” (discontinuous) - perhaps look into time series for PC?
    • a 3rd order polynomial surrogate might be overfitting because it has too many degrees of freedom
    • if we can’t use polynomials, we will have to move to a more flexible scheme
      • neural network (don’t get sensitivities for free)
  • plot map of sensitivities using function provided in document
  • still need to figure out how to do percentiles (?)
    • in high-dimensional (more than 4) random variables, "there is no such thing as quantile"

Notes for Zach

  • quadrature can be more accurate with fewer samples, but could fail for discontinuous parameters
  • take out the original run from the quadrature fit - otherwise the quadrature will break
  • repeat the same sample with regression
  • try KL or PCA
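As a rough illustration of the quadrature versus regression comparison above, here is a minimal one-dimensional sketch using Hermite polynomials for a standard normal input. The `model` function is a hypothetical stand-in for the real model response, and the order and sample counts are arbitrary assumptions. It is not the EnsemblePerturbation implementation.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He


def model(xi):
    # Hypothetical stand-in for the expensive model response (not from the notes)
    return 1.0 + 2.0 * xi + 0.5 * xi**2


order = 3
norms = np.array([factorial(k) for k in range(order + 1)])  # E[He_k^2] = k!

# Quadrature (projection): only order + 1 model runs, taken at the Gauss nodes
nodes, weights = He.hermegauss(order + 1)
weights = weights / weights.sum()  # normalize to the standard normal density
coef_quad = He.hermevander(nodes, order).T @ (weights * model(nodes)) / norms

# Regression: least squares on arbitrary (here random) samples
rng = np.random.default_rng(0)
xi_train = rng.standard_normal(20)
coef_reg, *_ = np.linalg.lstsq(
    He.hermevander(xi_train, order), model(xi_train), rcond=None
)

print("quadrature coefficients:", coef_quad)
print("regression coefficients:", coef_reg)
```

Because the stand-in is itself a low-order polynomial, both fits should recover approximately the same coefficients. The projection weights are only valid at the quadrature nodes, which is one way to see why an extra run that is not a node does not belong in the quadrature fit but can be used in the regression.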

Notes for Both

  • try validating the surrogate with 50 or so reference samples separate from the training set
    • plot the fit against the training data
    • if the training set fits well but validation set does NOT fit well, then the surrogate model is overfit
  • use Equation 6 from doc to go from math space to physical space - "everything should go through Equation 6"
  • try testing a single mesh node
    • pick a single node
    • gather maximum elevations
    • build surrogate
    • compare percentiles
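The validation and single-node steps above could be sketched as follows. Everything here is an assumption for illustration: `max_elevation` is a hypothetical stand-in for the maximum elevation at one mesh node as a function of two perturbed parameters, the polynomial basis is a plain total-degree monomial basis fitted by least squares, and the sample sizes are arbitrary apart from the 50 validation samples.

```python
import numpy as np

rng = np.random.default_rng(42)


def max_elevation(x):
    # Hypothetical response at a single node for two perturbed parameters
    return 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + 0.2 * x[:, 0] * x[:, 1]


def design_matrix(x, order):
    # Total-degree monomial basis in two variables
    cols = [
        x[:, 0] ** i * x[:, 1] ** j
        for i in range(order + 1)
        for j in range(order + 1 - i)
    ]
    return np.column_stack(cols)


def rmse(x, y, coef, order):
    return float(np.sqrt(np.mean((design_matrix(x, order) @ coef - y) ** 2)))


# Training set and a separate validation set of 50 reference samples
x_train = rng.standard_normal((20, 2))
y_train = max_elevation(x_train) + 0.05 * rng.standard_normal(20)
x_val = rng.standard_normal((50, 2))
y_val = max_elevation(x_val) + 0.05 * rng.standard_normal(50)

scores = {}
coefs = {}
for order in (1, 2, 3):
    coef, *_ = np.linalg.lstsq(design_matrix(x_train, order), y_train, rcond=None)
    coefs[order] = coef
    scores[order] = (rmse(x_train, y_train, coef, order), rmse(x_val, y_val, coef, order))
    print(f"order {order}: train RMSE {scores[order][0]:.3f}, validation RMSE {scores[order][1]:.3f}")

# Compare percentiles: sample the surrogate densely, compare to the reference samples
x_mc = rng.standard_normal((10000, 2))
surrogate_pct = np.percentile(design_matrix(x_mc, 2) @ coefs[2], [10, 50, 90])
reference_pct = np.percentile(y_val, [10, 50, 90])
print("surrogate percentiles:", surrogate_pct)
print("reference percentiles:", reference_pct)
```

The training error can only go down as the order increases, so the number to watch is the validation error: a training error that keeps shrinking while the validation error grows is the overfitting signature described above.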

Things we can Try

  • perhaps we can ravel time into the node space
    • compress time axis with KL eigenvalues
    • either way we will have to sparsify nodes
  • perhaps we can decompose the matrix
    • what we need is the covariance of the entire matrix
    • aggregate with dask?
  • sparsify time series
    • get 12 hour leadup time to storm landfall
    • every 2 hours

Fold time, lat and lon into a single axis and pass it to KL or PCA to compress it. Perhaps instead of 5 or 6 modes we may get tens of modes.

  • take every m-th spatial point and every n-th time point
  • use MPI or some sort of parallel implementation
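A minimal serial sketch of the fold-and-compress idea, assuming the ensemble output is available as an array of shape (runs, times, nodes). The array here is synthetic, built from four hypothetical latent modes plus noise, and PCA is done with a plain SVD. The strides `m` and `n`, the array sizes, and the 99% variance threshold are assumptions; a real data set would likely need the parallel or out-of-core approach mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ensemble output: (runs, times, nodes)
n_runs, n_times, n_nodes, n_latent = 30, 48, 500, 4
latent = rng.standard_normal((n_runs, n_latent))
patterns = rng.standard_normal((n_latent, n_times, n_nodes))
elev = np.einsum("rk,ktn->rtn", latent, patterns)
elev += 0.01 * rng.standard_normal(elev.shape)

# Take every n-th time point and every m-th spatial point
m, n = 5, 2
sub = elev[:, ::n, ::m]

# Fold time and space into one feature axis: (runs, times * nodes)
X = sub.reshape(n_runs, -1)
Xc = X - X.mean(axis=0)

# PCA via SVD of the centered matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
n_modes = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)

# Reduced coordinates of each run, to be used as surrogate targets
reduced = U[:, :n_modes] * s[:n_modes]

# Map reduced coordinates back to the folded (time, node) space
reconstructed = reduced @ Vt[:n_modes] + X.mean(axis=0)

print("folded shape:", X.shape)
print("modes kept for 99% variance:", n_modes)
```

One surrogate is then needed per retained mode rather than per node and time step, and `Vt` carries the mapping back to the subsampled time and node grid.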