- Clone this repository.
git clone https://github.com/JayjeetAtGithub/skyhook-perf-experiments /tmp/skyperf
cd /tmp/skyperf
git lfs pull
- Install Ceph by running deploy.sh.
/tmp/skyperf/deployment_scripts/deploy.sh
- Install SkyhookDM by running skyhook.sh. For example, to install the SkyhookDM libraries on node1 through node4, run:
/tmp/skyperf/deployment_scripts/skyhook.sh 1 4
- Populate data by running populate.sh. The stripe sizes to use for each file size are listed at the end of the script. For example, to write 460 files of 64 MB each:
mkdir -p /mnt/cephfs/dataset
/tmp/skyperf/deployment_scripts/populate.sh /tmp/skyperf/datasets/64MB.parquet /mnt/cephfs/dataset/64MB.parquet 0 460 67108864
This writes 460 files named 64MB.parquet.0, 64MB.parquet.1, and so on, with a stripe unit of 67108864 bytes.
- Run benchmarks. With C++,
g++ /tmp/skyperf/benchmark_scripts/bench.cc -larrow -larrow_dataset -lparquet -o bench
./bench [format(pq/rpq)] [selectivity] file:///[/path/to/dataset]
For example,
./bench rpq 100 file:///mnt/cephfs/dataset
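
The stripe unit passed to populate.sh in the 64 MB example is the file size in bytes (64 × 1024 × 1024 = 67108864). The sketch below recomputes that value and previews the first and last file names you would expect after populating. The variable names are illustrative, and the assumption that files are numbered consecutively from 0 (so the last of 460 files ends in .459) is inferred from the naming pattern described above, not taken from populate.sh itself.

```shell
#!/usr/bin/env bash
# Sketch only: variable names are hypothetical, not part of the repository scripts.
# Assumption: populate.sh numbers files consecutively starting at 0.

size_mb=64                                  # size of each Parquet file in MB
count=460                                   # number of files to write
prefix=/mnt/cephfs/dataset/64MB.parquet     # destination prefix from the example

# Stripe unit in bytes: MB -> bytes.
stripe_unit=$(( size_mb * 1024 * 1024 ))
echo "stripe unit: ${stripe_unit} bytes"

# First and last expected file names under the numbering assumption.
first_file="${prefix}.0"
last_file="${prefix}.$(( count - 1 ))"
echo "first file: ${first_file}"
echo "last file:  ${last_file}"
```

For other file sizes, check the stripe sizes listed at the end of populate.sh rather than relying on this calculation, since the script is the authoritative source for those values.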