kanaval

Specification-as-code validator for the kana format.


Validating kana files

Overview

The kana file format contains an embedded HDF5 file that captures the analysis state of the kana application. This embedded state file stores the parameters and results for each step in a simple single-cell RNA-seq analysis. By storing this state, we can easily reload existing analyses into kana without recomputation. It is also straightforward to extract results from this state in other data analysis frameworks (e.g., R/Bioconductor).

The kanaval repository contains a specification of the expected structure and content of the state file. We use a specification-as-code approach that enforces the specification with a validator library, implemented as header-only C++ for portability to any system that supports a foreign function interface. It is thus possible to create kana files from other languages, validate them, and upload them to kana.

Format specification

The first 8 bytes define an unsigned 64-bit integer in little-endian, specifying the format type. This is used to denote whether the input data files are embedded (0) or linked (1); the former is used for export to a standalone file while the latter is used to save the state to the browser's cache.

The next 8 bytes define another unsigned 64-bit integer describing the format version. We use semantic versioning where each component of the version (major, minor and patch) is described by 3 decimal digits, i.e., XXXYYYZZZ.
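
As an illustration of the XXXYYYZZZ layout, the following is a minimal sketch that packs and unpacks a version number. It assumes that XXX, YYY and ZZZ are the major, minor and patch components written as decimal digits. The VersionParts struct and the pack_version() and unpack_version() functions are hypothetical helpers written for this example and are not part of kanaval.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this example; not part of kanaval.
struct VersionParts {
    std::uint64_t major_part;
    std::uint64_t minor_part;
    std::uint64_t patch_part;
};

// Packs the three components into a single XXXYYYZZZ integer,
// assuming each component occupies three decimal digits.
inline std::uint64_t pack_version(const VersionParts& v) {
    // The minor and patch components must fit into three digits each.
    assert(v.minor_part < 1000 && v.patch_part < 1000);
    return v.major_part * 1000000 + v.minor_part * 1000 + v.patch_part;
}

// Splits an XXXYYYZZZ integer back into its three components.
inline VersionParts unpack_version(std::uint64_t packed) {
    return VersionParts{ packed / 1000000, (packed / 1000) % 1000, packed % 1000 };
}
```

Under this assumption, version 2.1.0 would be stored as the integer 2001000.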

The next 8 bytes define another unsigned 64-bit integer specifying the size of the HDF5 file containing the analysis state. Let's call this value state_nbytes.

The next state_nbytes bytes contain an HDF5 state file. Each analysis step is represented by an HDF5 group that contains the parameters and results. See the next section for details on the expected groups.

The remaining bytes contain the embedded input files when dealing with an embedded format type. Each file can be excised by reading the offsets and sizes in the inputs group in the state file.
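
The layout described above can be sketched as a small header parser. This is an illustrative example only, not code from kanaval: the KanaHeader struct and the read_u64_le() and parse_kana_header() functions are hypothetical names, and the sketch assumes that all three header integers are little-endian and that the whole file has already been read into memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for the three header fields; not part of kanaval.
struct KanaHeader {
    std::uint64_t format_type;   // 0 = embedded, 1 = linked
    std::uint64_t version;       // format version as XXXYYYZZZ
    std::uint64_t state_nbytes;  // size of the HDF5 state file in bytes
};

// Decodes 8 bytes as a little-endian unsigned integer,
// independent of the byte order of the host.
inline std::uint64_t read_u64_le(const unsigned char* p) {
    std::uint64_t value = 0;
    for (int i = 7; i >= 0; --i) {
        value = (value << 8) | p[i];
    }
    return value;
}

// Reads the 24-byte header and checks that the declared state file fits
// inside the buffer.
inline KanaHeader parse_kana_header(const std::vector<unsigned char>& bytes) {
    constexpr std::size_t header_size = 24;
    if (bytes.size() < header_size) {
        throw std::runtime_error("file is too small to contain a kana header");
    }

    KanaHeader header;
    header.format_type = read_u64_le(bytes.data());
    header.version = read_u64_le(bytes.data() + 8);
    header.state_nbytes = read_u64_le(bytes.data() + 16);

    if (header.format_type > 1) {
        throw std::runtime_error("format type should be 0 (embedded) or 1 (linked)");
    }
    if (header.state_nbytes > bytes.size() - header_size) {
        throw std::runtime_error("state file extends past the end of the file");
    }
    return header;
}
```

With this header, the HDF5 state file occupies the bytes from offset 24 up to (but not including) offset 24 + state_nbytes, and any embedded input files follow after that.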

Structure of the HDF5 state file

Inside the HDF5 state file, each analysis step is represented by an HDF5 group.

Version 3.0:

File metadata is stored in its own group.

Version 2.1:

Version 1.2:

Running the validator

Call the validate() function to check the state file. It throws a reasonably informative error if there are any problems.

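#include "H5Cpp.h"
#include "kanaval/validate.hpp"

// path: location of the HDF5 state file on disk.
// embedded: whether the input files are embedded; version: the format version.
H5::H5File handle(path, H5F_ACC_RDONLY);
kanaval::validate(handle, embedded, version);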