
Data Assimilation Package in Python for Experimental Research (DAPPER)

DAPPER is a set of templates for benchmarking the performance of data assimilation (DA) methods. The tests provide experimental support and guidance for new developments in DA. Example diagnostics:

[Figure: EnKF on Lorenz'63]

The typical set-up is a "twin experiment", where you

  • specify a
    • dynamic model*
    • observational model*
  • use these to generate a synthetic
    • "truth"
    • and observations thereof*
  • assess how different DA methods perform in estimating the truth, given the above starred (*) items.
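
The twin-experiment steps above can be sketched in plain NumPy. This is a minimal illustration and not DAPPER's API: the scalar linear model, the noise levels, and the names `step`, `obs`, and `twin_experiment` are all assumptions chosen for brevity, and a scalar stochastic EnKF stands in for "a DA method".

```python
import numpy as np

def step(x, rng, q_std=0.1):
    """Hypothetical dynamic model: damped linear map plus Gaussian noise."""
    return 0.9 * x + q_std * rng.standard_normal(np.shape(x))

def obs(x):
    """Hypothetical observation model: observe the state directly."""
    return x

def twin_experiment(K=200, N=50, r_std=0.5, seed=0):
    rng = np.random.default_rng(seed)

    # 1. Generate a synthetic truth and noisy observations thereof.
    truth = np.zeros(K)
    x = 1.0
    for k in range(K):
        x = step(x, rng)
        truth[k] = x
    y = obs(truth) + r_std * rng.standard_normal(K)

    # 2. Run a DA method (a scalar stochastic EnKF) that sees only
    #    the models and the observations, never the truth.
    E = rng.standard_normal(N)  # initial ensemble
    estimate = np.zeros(K)
    for k in range(K):
        E = step(E, rng)                        # forecast
        P = np.var(E, ddof=1)                   # forecast variance
        gain = P / (P + r_std ** 2)             # scalar Kalman gain
        y_pert = y[k] + r_std * rng.standard_normal(N)
        E = E + gain * (y_pert - obs(E))        # analysis
        estimate[k] = E.mean()

    # 3. Assess: RMSE of the DA estimate and of the raw observations.
    def rmse(a):
        return np.sqrt(np.mean((a - truth) ** 2))

    return rmse(estimate), rmse(y)

rmse_da, rmse_obs = twin_experiment()
print(f"RMSE with DA: {rmse_da:.3f}, observations only: {rmse_obs:.3f}")
```

Comparing the two RMSE values is the simplest form of the assessment step; a benchmark would repeat this for several methods and settings.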

DAPPER enables the numerical investigation of DA methods through its variety of typical test cases and statistics. It reproduces numerical results (benchmarks) reported in the literature, and facilitates comparative studies, thus promoting the reliability and relevance of the results. DAPPER is open source, written in Python, and focuses on readability; this promotes the reproduction and dissemination of the underlying science, and makes it easy to adapt and extend. In summary, it is well suited for teaching and fundamental DA research.

In exchange for these advantages, DAPPER sacrifices some efficiency and flexibility (generality): it is not designed for the assimilation of real data in operational models.

A good place to start is with the scripts example_1.py and example_2.py. Alternatively, see the tutorials folder for an intro to DA.

Installation

Prerequisites: Python 3.5+ with scipy, matplotlib, and pandas. These all come with Anaconda by default.

Download, extract the DAPPER folder, and cd into it. To test it, run:

python -i example_1.py

For the tutorials, you will also need jupyter and the markdown package.

It is also recommended to install tqdm (e.g. pip install tqdm).
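
Before running the examples, it may help to check which of the packages mentioned above are importable. The snippet below is a small sketch using only the standard library; `missing_packages` is a hypothetical helper, and it assumes that each package's import name matches the name listed here.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

required = ["scipy", "matplotlib", "pandas"]
optional = ["tqdm", "jupyter", "markdown"]

print("Missing required:", missing_packages(required) or "none")
print("Missing optional:", missing_packages(optional) or "none")
```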

Methods

References are provided at the bottom.

Method name                           | Literature RMSE results reproduced
--------------------------------------|-------------------------------------
EnKF (1)                              | Sakov and Oke (2008)
EnKF-N                                | Bocquet (2012), (2015)
EnKS, EnRTS                           | Raanes (2016a)
iEnKS (and -N)                        | Sakov (2012), Bocquet (2012), (2014)
LETKF, local & serial EAKF            | Bocquet (2011)
Sqrt. model noise methods             | Raanes (2015)
Particle filter (bootstrap) (2)       | Bocquet (2010)
Optimal/implicit Particle filter (2)  | "
NETF                                  | Tödter (2015), Wiljes (2017)
Rank histogram filter (RHF)           | Anderson (2010)
Extended KF                           | Raanes (2016b)
Optimal interpolation                 | "
Climatology                           | "
3D-Var                                |

1: Stochastic, DEnKF (i.e. half-update), ETKF (i.e. sym. sqrt.).
Tuned with inflation and "random, orthogonal rotations".
2: Resampling: multinomial (including systematic/universal and residual).
The particle filter is tuned with "effective-N monitoring", "regularization/jittering" strength, and more.
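
Two of the particle-filter ingredients named in note 2, effective-N monitoring and systematic resampling, can be sketched as follows. This is an illustration under stated assumptions rather than DAPPER's implementation: the function names and the `threshold=0.5` trigger are hypothetical, and the weights are assumed to be normalized.

```python
import numpy as np

def effective_N(w):
    """Effective sample size 1 / sum(w_i^2) of normalized weights."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng):
    """Return N particle indices drawn by systematic resampling."""
    N = len(w)
    # One uniform offset, then N evenly spaced points in [0, 1).
    points = (rng.random() + np.arange(N)) / N
    cumsum = np.cumsum(w)
    cumsum[-1] = 1.0  # guard against round-off in the last bin
    return np.searchsorted(cumsum, points, side="right")

def maybe_resample(particles, w, rng, threshold=0.5):
    """Resample only if N_eff falls below threshold * N (assumed rule)."""
    N = len(w)
    if effective_N(w) < threshold * N:
        idx = systematic_resample(w, rng)
        return particles[idx], np.full(N, 1.0 / N)
    return particles, w

rng = np.random.default_rng(0)
particles = rng.standard_normal(100)
w = np.exp(-0.5 * (particles - 1.0) ** 2)  # toy likelihood weights
w /= w.sum()
particles, w = maybe_resample(particles, w, rng)
```

Monitoring the effective sample size avoids resampling at every step, which would otherwise add sampling noise even when the weights are still well balanced.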

Models

Model        | Linear? | Phys.dim. | State len | # Lyap≥0 | Thanks to
-------------|---------|-----------|-----------|----------|------------------
Lin. Advect. | Yes     | 1D        | 1000 *    | 51       | Evensen/Raanes
Lorenz63     | No      | 0D        | 3         | 2        | Lorenz/Sakov
Lorenz84     | No      | 0D        | 3         | 2        | Lorenz/Raanes
Lorenz95     | No      | 1D        | 40 *      | 13       | Lorenz/Raanes
LorenzUV     | No      | 2x 1D     | 256 + 8 * | ≈60      | Lorenz/Raanes
MAOOAM       | No      | 2x 1D     | 36        | ?        | Vannitsem/Tondeur
Quasi-Geost  | No      | 2D        | 129²≈17k  | ?        | Sakov

*: flexible; set as necessary

Additional features

  • Progressbar
  • Many visualizations, including
    • liveplotting (during assimilation)
    • intelligent defaults (axis limits, ...)
  • Many diagnostics and statistics
    • Confidence intervals on time series (e.g. RMSE) averages, with
      • automatic correction for autocorrelation
      • significant digits printing
  • Tools to manage and display experimental settings and stats
  • Parallelisation options
    • (Independent) experiments can run in parallel; see example_3.py
    • Forecast parallelisation is possible since the (user-implemented) model has access to the full ensemble (see mods/QG/core.py)
    • A light-weight alternative (see e.g. mods/Lorenz95/core.py): native numpy vectorization (again by having access to full ensemble).
  • Gentle failure system to allow execution to continue if an experiment fails.
  • Classes that simplify treating:
    • time sequences (Chronology/Ticker), with consistency checks
    • random variables (RandVar): Gaussian, Student-t, Laplace, Uniform, ..., as well as support for custom sampling functions.
    • covariance matrices (CovMat): provides input flexibility/overloading and lazy evaluation, which facilitate the use of non-diagonal covariance matrices (whether sparse or full).
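
One common way to correct a confidence interval on a time-series average for autocorrelation is to replace the sample size with an effective one. The sketch below assumes an AR(1)-like series and uses the lag-1 autocorrelation. This is an illustrative assumption, not necessarily the correction DAPPER applies, and `mean_with_confidence` is a hypothetical name.

```python
import numpy as np

def mean_with_confidence(series, z=1.96):
    """Mean and half-width of an approximate 95% CI, AR(1)-corrected."""
    x = np.asarray(series, dtype=float)
    N = len(x)
    mu = x.mean()
    d = x - mu
    # Lag-1 autocorrelation estimate, clipped to [0, 0.99] so that the
    # correction never narrows the interval and never divides by zero.
    rho = np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)
    rho = min(max(rho, 0.0), 0.99)
    # Effective number of independent samples under an AR(1) assumption.
    N_eff = N * (1 - rho) / (1 + rho)
    half_width = z * x.std(ddof=1) / np.sqrt(N_eff)
    return mu, half_width

rng = np.random.default_rng(1)
rmse_series = 0.3 + 0.05 * rng.standard_normal(2000)  # toy RMSE series
mu, hw = mean_with_confidence(rmse_series)
print(f"average RMSE: {mu:.3f} ± {hw:.3f}")
```

For a strongly autocorrelated series the effective sample size is much smaller than the series length, so the reported interval widens accordingly.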

What it can't do

  • Highly efficient DA on very big models (see discussion in introduction).
  • Time-dependent error covariances and changes in lengths of state/obs (but models f and h may otherwise be time-dependent).
  • Fully support non-uniform time sequences.

How to

DAPPER is like a set of templates (not a framework); do not hesitate to make your own scripts and functions (instead of squeezing everything into standardized configuration files).

Add a new method

Just add it to da_methods.py, using the others in there as templates.

Add a new model

  • Make a new dir: DAPPER/mods/your_mod
  • Add the empty file __init__.py
  • See other examples, e.g. DAPPER/mods/Lorenz63/sak12.py
  • Make sure that the model (and obs operator) supports both 2D-array (i.e. ensemble) and 1D-array (single realization) input. See Lorenz63 and Lorenz95 for typical implementations.
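
The requirement in the last item, that a model accept both a single state and an ensemble, can be met with NumPy indexing on the last axis. The sketch below uses the Lorenz-63 equations with their standard parameters. It is not copied from DAPPER's mods folder: the function names, the RK4 integrator, the time step, and the convention that each row of a 2D array is one ensemble member are assumptions.

```python
import numpy as np

def dxdt(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz-63 tendency for a state (3,) or an ensemble (N, 3)."""
    x = np.asarray(x, dtype=float)
    # Indexing the last axis works for both 1D and 2D input.
    X, Y, Z = x[..., 0], x[..., 1], x[..., 2]
    return np.stack(
        [sigma * (Y - X), X * (rho - Z) - Y, X * Y - beta * Z], axis=-1
    )

def rk4_step(x, dt=0.01):
    """Advance by one fourth-order Runge-Kutta step; shape is preserved."""
    x = np.asarray(x, dtype=float)
    k1 = dxdt(x)
    k2 = dxdt(x + 0.5 * dt * k1)
    k3 = dxdt(x + 0.5 * dt * k2)
    k4 = dxdt(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def obs(x):
    """Hypothetical observation operator: observe the first component."""
    return np.asarray(x, dtype=float)[..., :1]

x0 = np.array([1.0, 1.0, 1.0])   # single realization, shape (3,)
E0 = np.tile(x0, (5, 1))         # ensemble of 5 members, shape (5, 3)
print(rk4_step(x0).shape, rk4_step(E0).shape)
```

Because the whole ensemble is advanced in one vectorized call, no Python loop over members is needed.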

Alternative projects

Sorted by approximate project size. DAPPER may be situated somewhere in the middle.

Name         | Developers         | Purpose (vs. DAPPER)
-------------|--------------------|---------------------------------------------
DART         | NCAR               | Operational and real-world DA
ERT*         | Statoil            | Operational (petroleum) history matching
OpenDA       | TU Delft           | Operational and real-world DA
EMPIRE       | Reading (Met)      | Operational and real-world DA
SANGOMA      | Conglomerate**     | Unified code repository for researchers
Verdandi     | INRIA              | Real-world biophysical DA
PDAF         | Nerger             | Real-world and example DA
PyOSSE       | Edinburgh, Reading | Real-world earth-observation DA
MIKE         | DHI                | Real-world oceanographic DA. Commercial?
OAK          | Liège              | Real-world oceanographic DA
Siroco       | OMP                | Real-world oceanographic DA
FilterPy     | R. Labbe           | Engineering, general intro to Kalman filter
DASoftware   | Yue Li, Stanford   | Matlab, large-scale
Pomp         | U of Michigan      | R, general state-estimation
PyIT         | CIPR               | Real-world petroleum DA (?)
Datum*       | Raanes             | Matlab, personal publications
EnKF-Matlab* | Sakov              | Matlab, personal publications and intro
EnKF-C       | Sakov              | C, light-weight EnKF, off-line
IEnKS code*  | Bocquet            | Python, personal publications
pyda         | Hickman            | Python, personal publications

*: Has been inspirational in the development of DAPPER.

**: Liège/CNRS/NERSC/Reading/Delft

TODO

  • Reorg file structure
  • Turn into package?
  • Simplify time management?
  • Use pandas for stats time series?
  • Complete QG

References

  • Sakov (2008) : Sakov and Oke. "A deterministic formulation of the ensemble Kalman filter: an alternative to ensemble square root filters".
  • Anderson (2010) : Anderson. "A Non-Gaussian Ensemble Filter Update for Data Assimilation".
  • Bocquet (2010) : Bocquet, Pires, and Wu. "Beyond Gaussian statistical modeling in geophysical data assimilation".
  • Bocquet (2011) : Bocquet. "Ensemble Kalman filtering without the intrinsic need for inflation".
  • Sakov (2012) : Sakov, Oliver, and Bertino. "An iterative EnKF for strongly nonlinear systems".
  • Bocquet (2012) : Bocquet and Sakov. "Combining inflation-free and iterative ensemble Kalman filters for strongly nonlinear systems".
  • Bocquet (2014) : Bocquet and Sakov. "An iterative ensemble Kalman smoother".
  • Bocquet (2015) : Bocquet, Raanes, and Hannart. "Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation".
  • Tödter (2015) : Tödter and Ahrens. "A second-order exact ensemble square root filter for nonlinear data assimilation".
  • Raanes (2015) : Raanes, Carrassi, and Bertino. "Extending the square root method to account for model noise in the ensemble Kalman filter".
  • Raanes (2016a) : Raanes. "On the ensemble Rauch-Tung-Striebel smoother and its equivalence to the ensemble Kalman smoother".
  • Raanes (2016b) : Raanes. "Improvements to Ensemble Methods for Data Assimilation in the Geosciences".
  • Wiljes (2017) : Acevedo, de Wiljes, and Reich. "Second-order accurate ensemble transform particle filters".

Further references are provided in the algorithm codes.

Contact

patrick. n. raanes AT gmail

Licence

License: MIT