/CDFpp

A modern C++ header only cdf library with Python bindings

Primary LanguageJupyter NotebookMIT LicenseMIT

License: GPL v3 Documentation Status CPP17 PyPi Coverage Discover on MyBinder

Python packages

Linux x86_64 Windows x86_64 MacOs x86_64 MacOs ARM64
linux_x86_64 windows_x86_64 macos_x86_64 macos_arm64

Unit Tests

Linux x86_64 Windows x86_64 MacOs x86_64
linux_x86_64 windows_x86_64 macos_x86_64

CDFpp (CDF++)

A NASA's CDF modern C++ library. This is not a C++ wrapper but a full C++ implementation. Why? CDF files are still used for space physics missions but few implementations are available. The main one is NASA's C implementation available here but it lacks multi-threads support (global shared state), has an old C interface and has a license which isn't compatible with most Linux distributions policy. There are also Java and Python implementations which are not usable in C++.

List of features and roadmap:

  • CDF reading
    • read files from cdf version 2.2 to 3.x
    • read uncompressed file headers
    • read uncompressed attributes
    • read uncompressed variables
    • read variable attributes
    • loads cdf files from memory (std::vector or char*)
    • handles both row and column major files
    • read variables with nested VXRs
    • read compressed files (GZip, RLE)
    • read compressed variables (GZip, RLE)
    • read UTF-8 encoded files
    • read ISO 8859-1(Latin-1) encoded files (converts to UTF-8 on the fly)
    • variables values lazy loading
    • decode DEC's floating point encoding (Itanium, ALPHA and VAX)
    • pad values
  • CDF writing
    • write uncompressed headers
    • write uncompressed attributes
    • write uncompressed variables
    • write compressed variables
    • write compressed files
    • pad values
  • General features
    • uses libdeflate for faster GZip decompression
    • highly optimized CDF reads (up to ~4GB/s read speed from disk)
    • handle leap seconds
    • Python wrappers
    • Documentation
    • Examples (see below)
    • Benchmarks

If you want to understand how it works, how to use the code or what works, you may have to read tests.

Installing

From PyPi

python3 -m pip install --user pycdfpp

From sources

meson build
cd build
ninja
sudo ninja install

Or if youl want to build a Python wheel:

python -m build . 
# resulting wheel will be located into dist folder

Basic usage

Python

Reading CDF files

Basic example from a local file:

import pycdfpp
cdf = pycdfpp.load("some_cdf.cdf")
cdf_var_data = cdf["var_name"].values #builds a numpy view or a list of strings
attribute_name_first_value = cdf.attributes['attribute_name'][0]

Note that you can also load in memory files:

import pycdfpp
import requests
import matplotlib.pyplot as plt
tha_l2_fgm = pycdfpp.load(requests.get("https://spdf.gsfc.nasa.gov/pub/data/themis/tha/l2/fgm/2016/tha_l2_fgm_20160101_v01.cdf").content)
plt.plot(tha_l2_fgm["tha_fgl_gsm"])
plt.show()

Buffer protocol support:

import pycdfpp
import requests
import xarray as xr
import matplotlib.pyplot as plt

tha_l2_fgm = pycdfpp.load(requests.get("https://spdf.gsfc.nasa.gov/pub/data/themis/tha/l2/fgm/2016/tha_l2_fgm_20160101_v01.cdf").content)
xr.DataArray(tha_l2_fgm['tha_fgl_gsm'], dims=['time', 'components'], coords={'time':tha_l2_fgm['tha_fgl_time'].values, 'components':['x', 'y', 'z']}).plot.line(x='time')
plt.show()

# Works with matplotlib directly too

plt.plot(tha_l2_fgm['tha_fgl_time'], tha_l2_fgm['tha_fgl_gsm'])
plt.show()

Datetimes handling:

import pycdfpp
import os
# Due to an issue with pybind11 you have to force your timezone to UTC for 
# datetime conversion (not necessary for numpy datetime64)
os.environ['TZ']='UTC'

mms2_fgm_srvy = pycdfpp.load("mms2_fgm_srvy_l2_20200201_v5.230.0.cdf")

# to convert any CDF variable holding any time type to python datetime:
epoch_dt = pycdfpp.to_datetime(mms2_fgm_srvy["Epoch"])

# same with numpy datetime64:
epoch_dt64 = pycdfpp.to_datetime64(mms2_fgm_srvy["Epoch"])

# note that using datetime64 is ~100x faster than datetime (~2ns/element on an average laptop)

Writing CDF files

Creating a basic CDF file:

import pycdfpp
import numpy as np
from datetime import datetime

cdf = pycdfpp.CDF()
cdf.add_attribute("some attribute", [[1,2,3], [datetime(2018,1,1), datetime(2018,1,2)], "hello\nworld"])
cdf.add_variable(f"some variable", values=np.ones((10),dtype=np.float64))
pycdfpp.save(cdf, "some_cdf.cdf")

C++

#include "cdf-io/cdf-io.hpp"
#include <iostream>

std::ostream& operator<<(std::ostream& os, const cdf::Variable::shape_t& shape)
{
    os << "(";
    for (auto i = 0; i < static_cast<int>(std::size(shape)) - 1; i++)
        os << shape[i] << ',';
    if (std::size(shape) >= 1)
        os << shape[std::size(shape) - 1];
    os << ")";
    return os;
}

int main(int argc, char** argv)
{
    auto path = std::string(DATA_PATH) + "/a_cdf.cdf";
    // cdf::io::load returns a optional<CDF>
    if (const auto my_cdf = cdf::io::load(path); my_cdf)
    {
        std::cout << "Attribute list:" << std::endl;
        for (const auto& [name, attribute] : my_cdf->attributes)
        {
            std::cout << "\t" << name << std::endl;
        }
        std::cout << "Variable list:" << std::endl;
        for (const auto& [name, variable] : my_cdf->variables)
        {
            std::cout << "\t" << name << " shape:" << variable.shape() << std::endl;
        }
        return 0;
    }
    return -1;
}

caveats

  • NRV variables shape, in order to expose a consistent shape, PyCDFpp exposes the reccord count as first dimension and thus its value will be either 0 or 1 (0 mean empty variable).