A framework for predicting compression ratios and other key metrics for lossy compressors in a generic way using LibPressio.
You can use the tool pressio_predict_bench to evaluate your prediction method on a dataset of your choice. Here is an example of running the tool on the 13 fields of the Hurricane dataset from SDRBench with the method average_sampled:
./build/pressio_predict_bench \
-a dataset \
-a settings \
-a run \
-a score \
-L pressio:loader=folder \
-l folder:base_dir=$HOME/git/datasets/hurricane/100x500x500/ \
-l io_loader:dims=500 \
-l io_loader:dims=500 \
-l io_loader:dims=100 \
-l io_loader:dtype=float \
-l io_loader:use_template=true \
-l folder:regex='.+/([A-Z]+)f(\d+).bin.f32' \
-l folder:groups=field \
-l folder:groups=timestep \
-b pressio:compressor=sz3 \
-b pressio:metric=composite \
-b composite:plugins=size \
-b composite:plugins=time \
-b composite:plugins=error_stat \
-o pressio:abs=1e-5
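To illustrate what the folder:regex pattern in the command above captures, here is a minimal C++17 sketch that applies the same pattern with std::regex. The struct FieldTimestep and the function parse_hurricane_path are hypothetical names used only for illustration; they are not part of LibPressio. The sketch also assumes that the whole path must match the pattern, which may differ from how the folder loader applies it.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical container for the two groups named by folder:groups
// (field, then timestep). Not a LibPressio type.
struct FieldTimestep {
  std::string field;
  int timestep;
};

// Applies the pattern from the command above to one file path.
// Assumption: the entire path has to match (std::regex_match).
// Note that the unescaped dots in ".bin.f32" match any character.
std::optional<FieldTimestep> parse_hurricane_path(const std::string& path) {
  static const std::regex pattern(R"(.+/([A-Z]+)f(\d+).bin.f32)");
  std::smatch m;
  if (!std::regex_match(path, m, pattern)) {
    return std::nullopt;  // path does not follow the naming scheme
  }
  assert(m.size() == 3);  // whole match plus the two capture groups
  return FieldTimestep{m[1].str(), std::stoi(m[2].str())};
}
```

With this sketch, a path ending in CLOUDf48.bin.f32 yields the field CLOUD and the timestep 48, while a path that does not follow the naming scheme yields an empty optional.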
The pressio_predict_bench command above computes the median absolute percentage error. If built with MPI support, this command will run in parallel on a cluster. If your metrics are subject to interference, run in isolation mode with the -Z isolate flag.
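For readers unfamiliar with the score, the following C++ sketch computes a median absolute percentage error using its standard definition: the median over all samples of 100 * |actual - predicted| / |actual|. The function name median_absolute_percentage_error is hypothetical, and the sketch is not taken from pressio_predict_bench, whose exact computation may differ. It assumes that no actual value is zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of the per-sample absolute percentage errors.
// Assumption: both vectors have the same non-zero length and no entry of
// `actual` is zero.
double median_absolute_percentage_error(const std::vector<double>& actual,
                                        const std::vector<double>& predicted) {
  assert(actual.size() == predicted.size() && !actual.empty());
  std::vector<double> ape;
  ape.reserve(actual.size());
  for (std::size_t i = 0; i < actual.size(); ++i) {
    ape.push_back(100.0 * std::abs(actual[i] - predicted[i]) / std::abs(actual[i]));
  }
  // Sort so the median can be read from the middle of the vector.
  std::sort(ape.begin(), ape.end());
  const std::size_t mid = ape.size() / 2;
  // Odd count: middle element. Even count: mean of the two middle elements.
  return ape.size() % 2 == 1 ? ape[mid] : 0.5 * (ape[mid - 1] + ape[mid]);
}
```

Because the median is used instead of the mean, a single badly mispredicted sample has limited influence on the score.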
You can use the prediction schemes from C/C++ using the following API pattern
TODO EXAMPLE
- Build the code, then run the provided tool pressio_new metric $name > ./src/plugins/predictors/$name.cc from LibPressioTools to add a new predictor from a metrics template.
- After the code is generated, edit the resulting file in src/plugins/predictors/$name.cc to add your prediction method. You likely just need to edit the begin_compress_impl function to gather the metrics you need. See src/plugins/predictors/tiled_samples.cc for an example that invokes the compressor to produce an estimate.
- Next, in get_configuration, add an entry predictors:invalidate with a std::vector<std::string> containing the list of compressor options that would invalidate this predictor's calculation.
  - A special value of predictors:nondeterministic indicates that this metric is non-deterministic (e.g. Randomized SVD) even if the underlying compressor is deterministic.
  - A special value of predictors:runtime indicates that this metric is dependent on performance-related characteristics (e.g. time:compress).
  - A special value of predictors:error_dependent indicates that this metric is invalidated whenever any setting that affects the quality of the data after compression is changed (e.g. the quantized entropy).
  - A special value of predictors:error_agnostic indicates that this metric is independent of the configuration of any compressor (e.g. the standard deviation of the input data).
  - If any other compressor setting (e.g. pressio:abs or sz:quantization_intervals) is provided in this list, the compressor for which predictions are being made MUST implement this metric for this predictor to be used. If the compressor does not implement the metric, an appropriate error MAY be returned.
- After you have the predictors used to produce the estimates, run pressio_new estimator to generate an estimator class from a template that combines these estimates into an estimation of the metric of interest.
TODO EXAMPLE
- Edit the CMakeLists.txt code to add dependencies for your new predictor/estimator.
- Build your new predictor and/or estimator modules, and ensure that the automated unit tests pass.
- You can test a metric as follows (in this case tao2019 with sz3):
pressio -i ~/git/datasets/hurricane/100x500x500/CLOUDf48.bin.f32 -d 500 -d 500 -d 100 -t float pressio -m time -b time:metric=tao2019 -M all -D path/to/liblibpressio_predict.so -b pressio:compressor=sz3
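To make the predictors:invalidate list from the steps above more concrete, here is a minimal C++ sketch of one way a caller could decide whether a cached metric has to be recomputed. Everything in it is an assumption made for illustration: the function needs_recompute, its parameters, and the policy it encodes are hypothetical and are not the libpressio-predict implementation. In particular, it assumes that the caller already knows whether a changed option affects the quality of the decompressed data.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of libpressio-predict.
// invalidate:            the list stored under predictors:invalidate
// changed:               compressor options that changed since the last run
// changed_affects_error: true if any changed option alters data quality
//                        (assumed to be known by the caller)
bool needs_recompute(const std::vector<std::string>& invalidate,
                     const std::vector<std::string>& changed,
                     bool changed_affects_error) {
  auto has = [&](const std::string& key) {
    return std::find(invalidate.begin(), invalidate.end(), key) != invalidate.end();
  };
  // Assumed sanity check: a metric cannot be both error dependent and agnostic.
  assert(!(has("predictors:error_dependent") && has("predictors:error_agnostic")));
  // Assumed policy: non-deterministic or runtime-dependent metrics are never reused.
  if (has("predictors:nondeterministic") || has("predictors:runtime")) return true;
  // Error-dependent metrics are stale once any quality-affecting option changes.
  if (has("predictors:error_dependent") && changed_affects_error) return true;
  // Otherwise recompute only if an explicitly listed option changed.
  return std::any_of(changed.begin(), changed.end(), has);
}
```

Under these assumptions, a metric that lists only predictors:error_agnostic is never recomputed when compressor options change, while a metric that lists pressio:abs is recomputed whenever pressio:abs changes.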
libpressio-predict is best installed via spack.
git clone https://github.com/spack/spack
git clone https://github.com/robertu94/spack_packages robertu94_packages
source ./spack/share/spack/setup-env.sh
spack repo add ./robertu94_packages
spack install libpressio-predict+bin
The easiest way to do a development build of libpressio-predict is to use Spack environments.
# one time setup: create an environment
spack env create -d mydevenvironment
spack env activate mydevenvironment
# one time setup: tell spack to set LD_LIBRARY_PATH with the spack environment's library paths
spack config add modules:prefix_inspections:lib64:[LD_LIBRARY_PATH]
spack config add modules:prefix_inspections:lib:[LD_LIBRARY_PATH]
# one time setup: add libpressio-predict and check it out
# for development
spack add libpressio-predict+bin
spack develop libpressio-predict@git.master
# compile and install (repeat as needed)
spack install
Libpressio-Predict unconditionally requires:
cmake
pkg-config
libpressio
Dependency versions and optional dependencies are documented in the spack package.
Please refer to docs/stability.md.
Please refer to CONTRIBUTORS.md for a list of contributors, sponsors, and contribution guidelines.
Please file bugs on the GitHub Issues page of the robertu94 libpressio-predict repository.
Please read this post on how to file a good bug report. After reading this post, please provide the following information specific to libpressio-predict:
- Your OS version and distribution information, usually this can be found in
/etc/os-release
- the output of
cmake -L $BUILD_DIR
- the version of each of libpressio-predict's dependencies listed in the README that you have installed. Where possible, please provide the commit hashes.
We hope to publish a paper on LibPressioPredict soon. Until then, please cite LibPressio and the underlying prediction method, which should have a citation in the pressio:description.
@inproceedings{underwood2021productive,
title={Productive and Performant Generic Lossy Data Compression with LibPressio},
author={Underwood, Robert and Malvoso, Victoriana and Calhoun, Jon C and Di, Sheng and Cappello, Franck},
booktitle={2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7)},
pages={1--10},
year={2021},
organization={IEEE}
}
We've archived reproducibility materials for our prior efforts in the reproduceability folder, organized by first author name and year.