Notes and next steps from meeting on 10/26/2021 with Khachik
saeed-moghimi-noaa commented
Saeed's notes from the meeting on 10/26/2021 with Khachik
@WPringle and @zacharyburnettNOAA, please see if these notes are useful.
Notes for William
- using a polynomial for the surrogate might not be the best method for maximum elevations
- because it might be “jumpy” (discontinuous) - perhaps look into time series for PC?
- 3rd order polynomial might be overfitting because of too many degrees of freedom
- surrogate: if we can’t use polynomials, we will have to move to a more flexible scheme
- neural network (don’t get sensitivities for free)
- plot map of sensitivities using function provided in document
- still need to figure out how to compute percentiles
- for high-dimensional random variables (more than 4 dimensions), "there is no such thing as quantile"
Notes for Zach
- quadrature can be more accurate with fewer samples, but could fail for discontinuous parameters
- take out the original run from the quadrature fit - otherwise the quadrature will break
- repeat the same sample with regression
- try KL or PCA
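A minimal sketch of the quadrature vs. regression comparison, assuming a single standard-normal input and a probabilists' Hermite basis. The `model` function is a made-up smooth stand-in, not the storm surge model. Note that the quadrature sum only has weights for the runs at the quadrature nodes, so any extra run has no place in it, while regression can take any sample set.

```python
from math import factorial

import numpy as np
from numpy.polynomial import hermite_e as He


def model(xi):
    # Hypothetical smooth response standing in for one model output.
    return np.exp(0.3 * xi) + 0.1 * xi**2


order = 3

# Quadrature (projection): c_k = E[f(xi) * He_k(xi)] / k!
nodes, weights = He.hermegauss(order + 1)      # only 4 model runs
weights = weights / np.sqrt(2.0 * np.pi)       # weights now sum to 1
basis_at_nodes = He.hermevander(nodes, order)  # column k holds He_k(nodes)
norms = np.array([factorial(k) for k in range(order + 1)], dtype=float)
coef_quad = (weights * model(nodes)) @ basis_at_nodes / norms

# Regression: least squares on random samples of the same input.
rng = np.random.default_rng(0)
xi = rng.standard_normal(200)                  # 200 model runs
coef_reg, *_ = np.linalg.lstsq(He.hermevander(xi, order), model(xi), rcond=None)

print("quadrature:", np.round(coef_quad, 4))
print("regression:", np.round(coef_reg, 4))
```

For a smooth response like this one the two coefficient sets should agree closely. A discontinuous response is where the few-point quadrature would be expected to struggle.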
Notes for Both
- try validating the surrogate with 50 or so reference samples separate from the training set
- plot the fit against the training data
- if the training set fits well but validation set does NOT fit well, then the surrogate model is overfit
- use Equation 6 from doc to go from math space to physical space - "everything should go through Equation 6"
- try testing a single mesh node
- pick a single node
- gather maximum elevations
- build surrogate
- compare percentiles
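The single-node test above could be sketched roughly like this. Everything here is a placeholder: `max_elevation` is a made-up two-parameter function standing in for the model's maximum elevation at one node, and the surrogate is a plain monomial least-squares fit rather than the PC expansion from the document.

```python
import numpy as np

rng = np.random.default_rng(1)


def max_elevation(x):
    # Hypothetical stand-in for maximum elevation at a single mesh node.
    return 1.5 + 0.8 * x[:, 0] - 0.4 * x[:, 1] ** 2 + 0.2 * x[:, 0] * x[:, 1]


def design_matrix(x, degree):
    # All monomials x0^i * x1^j with i + j <= degree.
    cols = [x[:, 0] ** i * x[:, 1] ** j
            for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Training set plus 50 separate reference samples for validation.
x_train = rng.standard_normal((30, 2))
x_valid = rng.standard_normal((50, 2))
y_train = max_elevation(x_train) + 0.05 * rng.standard_normal(30)
y_valid = max_elevation(x_valid) + 0.05 * rng.standard_normal(50)

quantiles = [10, 50, 90]
errors, percentiles = {}, {}
for degree in (1, 2, 3):
    coef, *_ = np.linalg.lstsq(design_matrix(x_train, degree), y_train, rcond=None)
    fit_train = design_matrix(x_train, degree) @ coef
    fit_valid = design_matrix(x_valid, degree) @ coef
    errors[degree] = (rmse(fit_train, y_train), rmse(fit_valid, y_valid))
    percentiles[degree] = np.percentile(fit_valid, quantiles)
    print(degree, errors[degree], np.round(percentiles[degree], 3))

print("reference percentiles:", np.round(np.percentile(y_valid, quantiles), 3))
```

A training error that keeps dropping with degree while the validation error stalls or grows is the overfitting signature described above.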
Things we can Try
- perhaps we can ravel time into the node space
- compress time axis with KL eigenvalues
- either way we will have to sparsify nodes
- perhaps we can decompose the matrix
- what we need is the covariance of the entire matrix
- aggregate with dask?
- sparsify time series
- get 12 hour leadup time to storm landfall
- every 2 hours
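A rough sketch of ravelling time into the node space and compressing with KL/PCA, using synthetic data. The array sizes, the 7 snapshots (assumed here to be hours -12 to 0 every 2 hours), and the 99% variance cutoff are all assumptions for illustration. The SVD of the centered matrix gives the covariance eigenvectors without forming the full covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n_runs, n_times, n_nodes = 40, 7, 500  # assumed sizes, not from the real ensemble

# Synthetic elevations driven by 3 latent modes plus small noise.
latent = rng.standard_normal((n_runs, 3))
patterns = rng.standard_normal((3, n_times * n_nodes))
elev = (latent @ patterns).reshape(n_runs, n_times, n_nodes)
elev += 0.01 * rng.standard_normal(elev.shape)

# Fold (time, node) into one axis: one row per ensemble run.
folded = elev.reshape(n_runs, n_times * n_nodes)
mean = folded.mean(axis=0)

# Rows of vt are eigenvectors of the sample covariance of the folded data.
u, s, vt = np.linalg.svd(folded - mean, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)
n_modes = int(np.searchsorted(explained, 0.99) + 1)

# KL coefficients per run; these are what a surrogate would be trained on.
scores = u[:, :n_modes] * s[:n_modes]
reconstructed = scores @ vt[:n_modes] + mean

print("modes kept:", n_modes)
print("relative error:", np.linalg.norm(reconstructed - folded) / np.linalg.norm(folded))
```

For the full mesh the same decomposition would likely need an out-of-core or distributed SVD (the dask idea above); this version only shows the reshaping and truncation.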
saeed-moghimi-noaa commented
Fold time, lat and lon together and pass the result to KL or PCA to compress it. Perhaps instead of 5 or 6 modes we may get tens of modes.
- take every m-th spatial point and every n-th time point
- use MPI or some sort of parallel implementation
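A small sketch of the striding step before folding, assuming the output is already in a `(runs, times, nodes)` array. The function name `subsample_and_fold` is hypothetical. Striding by node index is only an approximation of spatial thinning, since node order on an unstructured mesh is not necessarily spatially uniform.

```python
import numpy as np


def subsample_and_fold(elev, m, n):
    """Keep every n-th time step and every m-th node, then fold into 2D."""
    thinned = elev[:, ::n, ::m]
    return thinned.reshape(thinned.shape[0], -1)


# Dummy array: 4 runs, 12 time steps, 1000 nodes.
elev = np.arange(4 * 12 * 1000, dtype=float).reshape(4, 12, 1000)
folded = subsample_and_fold(elev, m=10, n=2)
print(folded.shape)  # (4, 600)
```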