Falsa is a tool for generating H2O db-like-benchmark datasets. This implementation is unofficial! For the official implementation, please check the DuckDB fork of the original H2O project.
Falsa is built with maturin and PyO3 and works with Python 3.9+. To install maturin, please follow its official documentation.
To build from source, in a virtualenv with Python 3.9+:
maturin develop --release
falsa --help
To install from GitHub with pip, in a virtualenv with Python 3.9+:
pip install git+https://github.com/mrpowers-io/falsa.git@main
falsa --help
At the moment the following output formats are supported:
- CSV
- Parquet
- Delta*
*There is a problem with Delta at the moment: writing to Delta requires materializing all pyarrow
batches first, so it may be slow and is prone to OOM-like errors. We are working on it and will provide a patched version soon.
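
The difference between writing batch by batch and materializing everything first can be illustrated with a minimal standard-library sketch. This is not falsa's code: `generate_batches`, `write_streaming`, and `write_materialized` are hypothetical names, the three-column rows are made up for illustration, and plain CSV stands in for the real pyarrow and Delta writers.

```python
import csv
import io
from typing import Iterator, List, Tuple

# Hypothetical row shape, chosen only for this illustration.
Row = Tuple[int, str, float]


def generate_batches(n_rows: int, batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for a batch generator: yields lists of rows lazily."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        yield [(i, f"id{i % 100:03d}", i * 0.5) for i in range(start, stop)]


def write_streaming(batches: Iterator[List[Row]], sink) -> int:
    """Write each batch as soon as it is produced; only one batch is held at a time."""
    writer = csv.writer(sink)
    writer.writerow(["id", "key", "value"])
    written = 0
    for batch in batches:
        writer.writerows(batch)
        written += len(batch)
    return written


def write_materialized(batches: Iterator[List[Row]], sink) -> int:
    """Collect every batch first, then write once; peak memory grows with the dataset."""
    all_rows = [row for batch in batches for row in batch]
    writer = csv.writer(sink)
    writer.writerow(["id", "key", "value"])
    writer.writerows(all_rows)
    return len(all_rows)


if __name__ == "__main__":
    streamed, materialized = io.StringIO(), io.StringIO()
    write_streaming(generate_batches(10, 4), streamed)
    write_materialized(generate_batches(10, 4), materialized)
    # Both strategies produce identical output; only peak memory use differs.
    print(streamed.getvalue() == materialized.getvalue())
```

Both functions produce the same file. The streaming variant keeps a single batch in memory, while the materialized variant holds the whole dataset before writing anything, which is the pattern that makes the Delta output slow and memory-hungry for large datasets.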