DAPPER: A Python repository from Rainbow1994

DAPPER is a set of templates for benchmarking the performance of data assimilation (DA) methods. The tests provide experimental support and guidance for new developments in DA.

The typical set-up is a synthetic (twin) experiment, where you

1. specify a
   - dynamic model*
   - observational model*
2. use these to generate a synthetic
   - "truth"
   - and observations thereof*
3. assess how different DA methods perform in estimating the truth, given the above starred (*) items.
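A minimal, self-contained sketch of these three steps follows. It uses plain NumPy, not DAPPER's own API. The choice of Lorenz-63 with its standard parameters, the RK4 integrator, the observation settings and all function names are assumptions made for illustration. The two "methods" assessed are trivial baselines: the climatological mean of the truth trajectory, and the raw observations themselves.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz-63 system (standard parameters)."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def rk4_step(f, x, dt):
    """Advance x by one fourth-order Runge-Kutta step of length dt."""
    k1 = f(x)
    k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def twin_experiment(n_steps=2000, dt=0.01, obs_every=25, obs_std=2.0, seed=0):
    """Steps 1-2: generate a synthetic truth and noisy observations of it."""
    rng = np.random.default_rng(seed)
    # Dynamic model: Lorenz-63 + RK4. The truth is one model trajectory.
    truth = np.empty((n_steps + 1, 3))
    truth[0] = [1.0, 1.0, 1.0]
    for k in range(n_steps):
        truth[k + 1] = rk4_step(lorenz63, truth[k], dt)
    # Observational model: all components observed every obs_every steps,
    # with additive Gaussian noise of standard deviation obs_std.
    obs_times = np.arange(obs_every, n_steps + 1, obs_every)
    obs = truth[obs_times] + obs_std * rng.standard_normal((len(obs_times), 3))
    return truth, obs_times, obs

def rmse(estimates, truth):
    """RMSE over state components at each time, then averaged over time."""
    return np.mean(np.sqrt(np.mean((estimates - truth) ** 2, axis=-1)))

# Step 3: assess two trivial estimators at the observation times.
truth, obs_times, obs = twin_experiment()
clim = truth.mean(axis=0)  # climatological mean, broadcast to every time
print("Climatology RMSE:", rmse(clim, truth[obs_times]))
print("Raw-obs RMSE:    ", rmse(obs, truth[obs_times]))
```

A DA method combines the model with the observations, so it would be expected to achieve a lower RMSE than both of these baselines.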

Highlights

DAPPER enables the numerical investigation of DA methods through a variety of typical test cases and statistics. It (a) reproduces numerical benchmark results reported in the literature, and (b) facilitates comparative studies, thus promoting the (a) reliability and (b) relevance of the results. For example, the figure generated by example_3.py is a reproduction of a result from a book on DA.

DAPPER is (c) open source, written in Python, and (d) focuses on readability; this promotes the (c) reproduction and (d) dissemination of the underlying science, and makes it easy to adapt and extend. It also comes with a battery of diagnostics and statistics, and live plotting (on-line with the assimilation) facilities, including pause/inspect options.

In summary, it is well suited for teaching and fundamental DA research. Also see its drawbacks, listed under Similar projects below.

Installation

Works on Linux/Windows/Mac.

Prerequisite: Python>=3.7

If you're not an admin or expert:

1. Install Anaconda.
2. Open the Anaconda terminal and run the following commands:
   ```
   conda create --yes --name my-DA-env python=3.8
   conda activate my-DA-env
   python -c "import sys; print('Version:', sys.version.split()[0])"
   ```
   Ensure that the version printed at the end is 3.7 or higher.
3. Keep using the same terminal for the commands below.
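You may prefer to check the version programmatically rather than by eye. If so, compare version tuples, not strings, because as strings "3.10" sorts before "3.7". Below is a minimal sketch. The helper name is hypothetical, and the minimum of 3.7 comes from the prerequisite above.

```python
import sys

def python_is_recent_enough(minimum=(3, 7)):
    """True if the running interpreter's (major, minor) is at least `minimum`."""
    # Tuple comparison is element-wise, so (3, 10) >= (3, 7) is correctly True.
    return sys.version_info[:2] >= minimum

print("Version:", sys.version.split()[0])
print("OK" if python_is_recent_enough() else "Python is too old for DAPPER")
```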

Install

Either: Install as library

Do you simply want to run a script that requires DAPPER? Then

- If the script comes with a requirements.txt file, run pip install -r path/to/requirements.txt.
- If not, hopefully you know the version of DAPPER needed. For example, run pip install DA-DAPPER==1.0.0 to get version 1.0.0.
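To check which DAPPER version is installed in the active environment, here is a minimal sketch. It uses the standard library's importlib.metadata, which requires Python 3.8 or later; the my-DA-env environment created above provides this. The distribution name DA-DAPPER is taken from the pip command above, and installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="DA-DAPPER"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

v = installed_version()
print("DA-DAPPER version:", v if v else "not installed")
```

You can then compare the printed version against the one your script needs (e.g. 1.0.0).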

Or: Install for development

Do you want the DAPPER code readily available to look into? Then

1. Download and unzip (or git clone) DAPPER.
2. Move the resulting folder wherever you like, and cd into it (ensure you're in the folder with a setup.py file).
3. Run pip install -e . (don't forget the .).

Alternatively, if you want to develop the code, install with pip install -e ".[Dev]". The quotes stop shells such as zsh from interpreting the square brackets.

Finally: Test the installation

You should now be able to run your script with python path/to/script.py.
For example, if you are in the DAPPER dir,

python example_1.py

If you've closed the terminal (or shut down your computer), you first need to open the (anaconda) terminal and run this:

conda activate my-DA-env

Quickstart

Read, run, and understand the scripts example_{1,2,3}.py. Then, get familiar with the code.

The documentation provides the API reference, but is not very mature.

Alternatively, DA-tutorials provides a Python-based introduction to DA.

DA methods

Method	Literature reproduced
EnKF ¹	Sakov08, Hoteit15
EnKF-N	Bocquet12, Bocquet15
EnKS, EnRTS	Raanes2016
iEnKS / iEnKF / EnRML / ES-MDA ²	Sakov12, Bocquet12, Bocquet14
LETKF, local & serial EAKF	Bocquet11
Sqrt. model noise methods	Raanes2014
Particle filter (bootstrap) ³	Bocquet10
Optimal/implicit Particle filter ³	Bocquet10
NETF	Tödter15, Wiljes16
Rank histogram filter (RHF)	Anderson10
4D-Var
3D-Var
Extended KF
Optimal interpolation
Climatology

¹: Stochastic, DEnKF (i.e. half-update), ETKF (i.e. sym. sqrt.). Serial forms are also available.
Tuned with inflation and "random, orthogonal rotations".
²: Also supports the bundle version, and "EnKF-N"-type inflation.
³: Resampling: multinomial (including systematic/universal and residual).
The particle filter is tuned with "effective-N monitoring", "regularization/jittering" strength, and more.
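Footnotes ¹ and ³ name several ingredients. The following minimal NumPy sketch makes two of them concrete; it is not DAPPER's implementation. The first is the stochastic (perturbed-observation) EnKF analysis, assuming a linear observation operator given as a matrix H. The second is systematic resampling, together with the effective-N statistic used to decide when to resample. The function names and the (N, m) ensemble layout, with one member per row, are assumptions made for this example.

```python
import numpy as np

def enkf_stochastic_update(E, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis of the ensemble E.

    E: (N, m) forecast ensemble, one member per row.
    y: (p,) observation; H: (p, m) linear observation operator;
    R: (p, p) observation-error covariance; rng: numpy Generator.
    """
    N = E.shape[0]
    HE = E @ H.T                         # ensemble mapped to observation space
    A = E - E.mean(axis=0)               # state anomalies
    Y = HE - HE.mean(axis=0)             # observation-space anomalies
    C_yy = Y.T @ Y / (N - 1) + R         # (p, p) innovation covariance
    C_xy = A.T @ Y / (N - 1)             # (m, p) state-obs cross covariance
    # Each member assimilates its own randomly perturbed copy of y.
    D = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    # (D - HE) K^T with K = C_xy C_yy^{-1}; solve() avoids an explicit inverse.
    return E + (D - HE) @ np.linalg.solve(C_yy, C_xy.T)


def effective_n(w):
    """Effective ensemble size 1 / sum(w_i^2) for normalized weights w."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w ** 2)


def systematic_resample(w, rng):
    """Indices of the particles to keep, using systematic resampling."""
    w = np.asarray(w, dtype=float)
    N = len(w)
    positions = (rng.random() + np.arange(N)) / N   # one uniform draw, evenly spaced
    cumsum = np.cumsum(w)
    cumsum[-1] = 1.0                                 # guard against round-off
    # Particle i is picked for each position in [cumsum[i-1], cumsum[i]).
    return np.searchsorted(cumsum, positions, side="right")
```

In a bootstrap particle filter, one might resample only when effective_n(w) drops below some fraction of the ensemble size; 0.5 is a common illustrative choice. After resampling, keep particles[idx] for idx = systematic_resample(w, rng), and reset the weights to 1/N.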

For a list of ready-made experiments with suitable, tuned settings for a given method (e.g. the iEnKS), use GNU grep:

$ cd dapper/mods
$ grep -r "iEnKS.*("
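GNU grep is not always available, for example on Windows, which DAPPER supports. A rough pure-Python stand-in for the grep command above could look like the following. The function name find_settings is hypothetical, and unlike grep -r it only searches .py files.

```python
import re
from pathlib import Path

def find_settings(method="iEnKS", root="dapper/mods"):
    """Yield (path, line number, line) for each line matching `method.*(`.

    Mirrors grep -r "iEnKS.*(" but only looks inside .py files.
    """
    pattern = re.compile(re.escape(method) + r".*\(")
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path, lineno, line.strip()

if __name__ == "__main__":
    for path, lineno, line in find_settings():
        print(f"{path}:{lineno}: {line}")
```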

Test cases (models)

Model	Lin?	TLM+?	PDE?	Phys.dim.	State len	Lyap≥0	Implementer
Linear Advect. (LA)	Yes	Yes	Yes	1d	1000 *	51	Evensen/Raanes
DoublePendulum	No	Yes	No	0d	4	2	Matplotlib/Raanes
Ikeda	No	Yes	No	0d	2	1	Raanes
LotkaVolterra	No	Yes	No	0d	5 *	1	Wikipedia/Raanes
Lorenz63	No	Yes	"Yes"	0d	3	2	Sakov
Lorenz84	No	Yes	No	0d	3	2	Raanes
Lorenz96	No	Yes	No	1d	40 *	13	Raanes
LorenzUV	No	Yes	No	2x 1d	256 + 8 *	≈60	Raanes
Kuramoto-Sivashinsky	No	Yes	Yes	1d	128 *	11	Kassam/Raanes
Quasi-Geost (QG)	No	No	Yes	2d	129²≈17k	≈140	Sakov

*: Flexible; set as necessary
+: Tangent Linear Model included?
Lyap≥0: Number of non-negative Lyapunov exponents

The models are found as subdirectories within dapper/mods. A model should be defined in a file named __init__.py, and illustrated by a file named demo.py. Ideally, neither of these files should rely on the rest of DAPPER.
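Here is a hypothetical sketch of a model file that does not rely on the rest of DAPPER, using Lorenz96 (40 variables by default in the table above). The forcing F = 8, the RK4 scheme, and the names dxdt and step(x0, t, dt) are assumptions for this example, not a copy of DAPPER's own Lorenz96 module. The __main__ block plays the role that a demo.py might.

```python
"""Hypothetical sketch of a self-contained model file (cf. a model's __init__.py).

Only NumPy is imported; nothing from the rest of DAPPER.
"""
import numpy as np

def dxdt(x, force=8.0):
    """Lorenz-96 tendencies dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F (cyclic)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + force

def step(x0, t, dt):
    """One RK4 step of length dt; t is unused because the system is autonomous."""
    k1 = dxdt(x0)
    k2 = dxdt(x0 + dt / 2 * k1)
    k3 = dxdt(x0 + dt / 2 * k2)
    k4 = dxdt(x0 + dt * k3)
    return x0 + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

if __name__ == "__main__":
    # Demo-style check: perturb the rest state x_i = F slightly and integrate.
    x = 8.0 * np.ones(40)
    x[0] += 0.01
    for k in range(1000):
        x = step(x, k * 0.05, 0.05)
    print("First 5 components after 50 time units:", np.round(x[:5], 2))
```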

Most of the other files within a model subdirectory are usually named authorYEAR.py and define an HMM (hidden Markov model) object. This object holds the settings of a specific twin experiment using that model, as detailed in the corresponding author/year's paper. At the bottom of each such file there should be (in comments) a list of suitable, tuned settings for various DA methods, along with their expected, average rmse.a score for that experiment. The complete list of included experiment files can be obtained with GNU find:

$ cd dapper/mods
$ find . -iname "[a-z]*20[0-9][0-9].py"
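The rmse.a score mentioned above is presumably the root-mean-square error of the analysis (a) estimates, averaged over time. Below is a minimal sketch of such a time-averaged RMSE. It assumes arrays with one row per analysis time and an optional number of initial spin-up times to discard. The function name is hypothetical, and DAPPER's exact averaging conventions may differ.

```python
import numpy as np

def average_rmse(estimates, truth, burn_in=0):
    """Time-average of the instantaneous RMSE between estimates and truth.

    estimates, truth: (K, m) arrays, one row per analysis time.
    burn_in: number of initial analysis times to discard as spin-up.
    """
    err = np.asarray(estimates, dtype=float)[burn_in:] - np.asarray(truth, dtype=float)[burn_in:]
    rmse_t = np.sqrt(np.mean(err ** 2, axis=1))   # RMSE across state components, per time
    return rmse_t.mean()
```

Discarding a burn-in period keeps the initial spin-up of a filter from dominating the average.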

Some of these files contain settings that have been used in several papers. As mentioned above, DAPPER reproduces literature results. There are also results in the literature that DAPPER does not reproduce. Typically, this means that the published results are incorrect.

Similar projects

DAPPER is aimed at research and teaching (see discussion up top). Examples of limitations:

- It is not suited for very big models (>60k unknowns).
- Time-dependent error covariances and changes in the lengths of state/obs are not supported (although the Dyn and Obs models may otherwise be time-dependent).
- Non-uniform time sequences are not fully supported.

Also, DAPPER comes with no guarantees/support. Therefore, if you have an operational (real-world) application, such as WRF, you should look into one of the alternatives, sorted by approximate project size.

Name	Developers	Purpose (approximately)
DART	NCAR	Operational, general
PDAF	AWI	Operational, general
JEDI	JCSDA (NOAA, NASA, ++)	Operational, general (in development?)
ERT	Statoil	Operational, history matching (Petroleum)
OpenDA	TU Delft	Operational, general
Verdandi	INRIA	Biophysical DA
PyOSSE	Edinburgh, Reading	Earth-observation DA
SANGOMA	Conglomerate*	Unify DA research
EMPIRE	Reading (Met)	Research (high-dim)
MIKE	DHI	Oceanographic. Commercial?
OAK	Liège	Oceanographic
Siroco	OMP	Oceanographic
FilterPy	R. Labbe	Engineering, general intro to Kalman filter
DASoftware	Yue Li, Stanford	Matlab, large-scale
Pomp	U of Michigan	R, general state-estimation
PyIT	CIPR	Real-world petroleum DA (?)
Datum	Raanes	Matlab, personal publications
EnKF-Matlab	Sakov	Matlab, personal publications and intro
EnKF-C	Sakov	C, light-weight EnKF, off-line
IEnKS code	Bocquet	Python, personal publications
pyda	Hickman	Python, personal publications

The EnKF-Matlab and IEnKS codes have been inspirational in the development of DAPPER.

*: AWI/Liege/CNRS/NERSC/Reading/Delft

Contributors

Patrick N. Raanes, Colin Grudzien, Maxime Tondeur, Remy Dubois

If you use this software in a publication, please cite as follows.

@misc{raanes2018dapper,
  author = {Patrick N. Raanes and others},
  title  = {nansencenter/DAPPER: Version 0.8},
  month  = dec,
  year   = 2018,
  doi    = {10.5281/zenodo.2029296},
  url    = {https://doi.org/10.5281/zenodo.2029296}
}