Elefridge.jl - Compressing atmospheric data into its real information
This repository contains analysis and plotting scripts for
M Klöwer, M Razinger, JJ Dominguez, PD Düben, TN Palmer, 2021. Compressing atmospheric data into its real information content. Nature Computational Science, accepted. Preprint: 10.21203/rs.3.rs-590601/v1
Analysis notebooks can be found in /nb. This repository also summarises the results of ECMWF's Summer of Weather Code challenge #14, "Size, precision, speed - pick two", in summary.md. The original proposal is in proposal.md.
As part of this project, the following Julia packages have been developed
Abstract
Hundreds of petabytes of data are produced annually at weather and climate forecast centres worldwide. Compression is inevitable to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data. We define the bitwise real information content from information theory for data from the Copernicus Atmospheric Monitoring Service (CAMS). Most variables contain less than 7 bits of real information per value, which are also highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. The entire CAMS data is compressed by a factor of 17x, relative to 64-bit floats, while preserving 99% of real information. Combined with 4-dimensional compression to exploit the spatio-temporal correlation, factors beyond 60x are achieved without an increase in forecast errors. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data.
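The rounding step mentioned in the abstract (setting mantissa bits without real information to zero) can be illustrated with a minimal sketch in plain Julia. This is not the code used in the paper or in the packages developed for this project: the function name `round_keepbits`, the restriction to `Float32`, and the choice of round-to-nearest (ties to even) are assumptions made for illustration, and `keepbits = 7` in the usage line is an arbitrary example value rather than a recommendation for any variable.

```julia
# Hypothetical helper (not part of this repository): keep the first `keepbits`
# mantissa bits of a Float32 and set the remaining mantissa bits to zero,
# using round-to-nearest with ties to even. NaN inputs are not handled specially.
function round_keepbits(x::Float32, keepbits::Integer)
    0 <= keepbits < 23 || throw(ArgumentError("keepbits must be in 0:22"))
    ui = reinterpret(UInt32, x)                 # raw bit pattern of x
    shift = 23 - keepbits                       # number of mantissa bits to discard
    keepmask = ~((one(UInt32) << shift) - one(UInt32))   # ones on all bits that are kept
    half = (one(UInt32) << (shift - 1)) - one(UInt32)    # half a unit in the last kept place, minus 1
    tie = (ui >> shift) & one(UInt32)           # last kept bit, decides ties (round to even)
    ui += half + tie                            # carry into the kept bits when rounding up
    return reinterpret(Float32, ui & keepmask)  # zero the discarded bits
end

# Usage: round a single value, or broadcast over an array.
rounded_pi = round_keepbits(Float32(pi), 7)     # 3.140625f0
rounded = round_keepbits.(Float32[1.2345, 273.15, 1013.25], 7)
```

After rounding, the trailing mantissa bits of every value are zero, which produces long runs of identical bits that a general-purpose lossless compressor can exploit. How many bits to keep for a given variable would follow from the bitwise real information analysis described in the paper, which this sketch does not implement.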