takane

Validators for ArtifactDB file formats.

File validators for Bioconductor objects

Overview

This repository contains C++ libraries to validate on-disk representations of Bioconductor objects used in ArtifactDB instances. The idea is to provide a cross-language method for validating the files. This is not quite as useful as a library for reading the files, but it's better than nothing.

Specifications

See general comments for all objects' on-disk representations.

Currently, takane provides validators for the following objects:

  • atomic_vector_list: 1.0.
  • atomic_vector: 1.0.
  • bam_file: 1.0.
  • bcf_file: 1.0.
  • bed_file: 1.0.
  • bigbed_file: 1.0.
  • bigwig_file: 1.0.
  • bumpy_atomic_array: 1.0.
  • bumpy_data_frame_array: 1.0.
  • compressed_sparse_matrix: 1.0.
  • data_frame_factor: 1.0.
  • data_frame_list: 1.0.
  • data_frame: 1.0.
  • dense_array: 1.0.
  • fasta_file: 1.0.
  • fastq_file: 1.0.
  • genomic_ranges_list: 1.0.
  • genomic_ranges: 1.0.
  • gff_file: 1.0.
  • gmt_file: 1.0.
  • multi_sample_dataset: 1.0.
  • ranged_summarized_experiment: 1.0.
  • sequence_information: 1.0.
  • sequence_string_set: 1.0.
  • simple_list: 1.0, 1.1.
  • single_cell_experiment: 1.0.
  • spatial_experiment: 1.0, 1.1, 1.2.
  • string_factor: 1.0.
  • summarized_experiment: 1.0.
  • vcf_experiment: 1.0.

Validation

The takane::validate() function inspects the object's directory and validates its contents, throwing an error if the contents are not valid.

#include "takane/takane.hpp"

takane::validate(dir);
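
Since validation failures are reported by throwing, callers that prefer a status value may want a thin wrapper. The following is only a sketch: try_validate and Validator are hypothetical names that are not part of takane, the validator is passed in as a callable so the snippet compiles on its own (C++17), and it assumes the thrown errors derive from std::exception.

```cpp
#include <exception>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>

// Hypothetical alias (not part of takane): any callable that validates a
// directory and throws on failure.
using Validator = std::function<void(const std::filesystem::path&)>;

// Hypothetical helper (not part of takane): runs the validator on 'dir' and
// returns the error message if it throws, or an empty optional on success.
// Assumes that failures are reported as exceptions derived from std::exception.
inline std::optional<std::string> try_validate(
    const std::filesystem::path& dir,
    const Validator& validator
) {
    try {
        validator(dir);
    } catch (const std::exception& e) {
        return std::string(e.what());
    }
    return std::nullopt;
}
```

With takane on the include path, the validator argument could be a lambda that forwards to the real function, e.g., [](const std::filesystem::path& p) { takane::validate(p); }.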

The idea is to bind to the takane library in application-specific frameworks, e.g., via R/Python's foreign function interfaces. This consistently enforces the format expectations for each object, regardless of how the saving was performed by each application. For example, we might use the alabaster framework to save Bioconductor objects to disk:

library(alabaster.base)
tmp <- tempfile()
df <- DataFrame(X=1:10, Y=letters[1:10])
saveObject(df, tmp)
validateObject(tmp) # calls takane::validate()

If the validation passes, we can be confident that the same object can be reconstructed in different frameworks, e.g., with dolomite packages in Python.

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)

FetchContent_Declare(
  takane 
  GIT_REPOSITORY https://github.com/ArtifactDB/takane
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(takane)

Then you can link to takane to make the headers available during compilation:

# For executables:
target_link_libraries(myexe takane)

# For libraries:
target_link_libraries(mylib INTERFACE takane)

CMake with find_package()

You can install the library by cloning a suitable version of this repository and running the following commands:

mkdir build && cd build
cmake .. -DTAKANE_TESTS=OFF
cmake --build . --target install

Then you can use find_package() as usual:

find_package(artifactdb_takane CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE artifactdb::takane)

Manual

If you're not using CMake, the simplest approach is to copy the files in the include/ subdirectory - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. You will also need to link to the dependencies listed in extern/CMakeLists.txt, along with the HDF5 library.

Further comments

This library is named after Takane Shijou, continuing my trend of naming C++ libraries after iDOLM@STER characters. Not really sure why I picked Takane but she's nice enough.
