POCache is a parity-only caching design that provides robust straggler tolerance. Its prototype is built atop Hadoop 3.1 HDFS and preserves the performance and functionality of normal HDFS operations. To limit the erasure coding overhead, POCache slices blocks into smaller subblocks and parallelizes the coding operations at the subblock level. It also uses a straggler-aware cache algorithm that considers both file access popularity and straggler estimation when deciding which parity blocks to cache.
Environment:
- Ubuntu 16.04
- JDK 1.8.0_151
$ sudo apt-get install maven
Download and install ISA-L (Intel Intelligent Storage Acceleration Library) following the instructions at https://github.com/01org/isa-l.
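A minimal sketch of the autotools build described in the ISA-L README, assuming git, autoconf, automake, libtool and nasm are already installed. The `install_isal` function, the `run` helper and the `DRY_RUN` switch are hypothetical conveniences, not part of POCache or ISA-L; with `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: build and install ISA-L from source (autotools flow from its README).
# Assumes git, autoconf, automake, libtool and nasm are installed.
# DRY_RUN=1 is a hypothetical switch that prints the commands instead.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The body runs in a subshell ( ... ), so `cd` does not leak to the caller
# and `set -e` stops at the first failing step.
install_isal() (
  set -e
  run git clone https://github.com/01org/isa-l
  run cd isa-l
  run ./autogen.sh
  run ./configure
  run make
  run sudo make install
  # Refresh the dynamic linker cache so libisal can be found at runtime.
  run sudo ldconfig
)
```

Call `install_isal` from a scratch directory, or `DRY_RUN=1 install_isal` to preview the steps. Once Hadoop is deployed, `hadoop checknative` should list ISA-L as loaded.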
The Hadoop build requires protoc 2.5.0, so compile it yourself:
$ wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
$ tar zxvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./autogen.sh
$ ./configure
$ make
Add the directory containing the built protoc binary (protobuf-2.5.0/src) to your PATH so that the Hadoop build can find it.
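The PATH step can look like the following sketch, assuming it is run from inside protobuf-2.5.0/ right after `make`. The version check is an illustrative addition; `libprotoc 2.5.0` is what protoc 2.5.0 prints for `--version`.

```shell
# Sketch: run from inside protobuf-2.5.0/ after `make` has finished.
# Prepend the build output directory so `protoc` resolves to the 2.5.0 build.
export PATH="$PWD/src:$PATH"

# Hadoop's build rejects other protoc versions, so check which one wins.
if [ "$(protoc --version 2>/dev/null)" = "libprotoc 2.5.0" ]; then
  echo "protoc 2.5.0 is first on PATH"
else
  echo "warning: protoc 2.5.0 is not the first protoc on PATH" >&2
fi
```

`export` only lasts for the current shell, so add the line (with an absolute path) to ~/.bashrc to keep it. Alternatively, `sudo make install` places protoc under /usr/local/bin; on Ubuntu you may then also need `sudo ldconfig` so that the libprotoc shared library is found.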
Build POCache from the top-level source directory:
$ mvn package -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.isal -Pdist,native -DskipShade -e
To build pocache-dfs-perf against the modified HDFS client, first install the POCache HDFS packages into your local Maven repository:
$ mvn install -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.isal -Pdist,native -DskipShade -e
$ cd pocache-dfs-perf
$ mvn install
Documentation for pocache-dfs-perf is under pocache-dfs-perf/docs.
- Distribute the POCache and pocache-dfs-perf releases across the cluster, making sure to rsync the release to all client nodes as well.
- Edit the configuration files in pocache_conf and distribute them to the Hadoop configuration directory on each node.
- Start Hadoop following the official Hadoop documentation.
- Benchmark POCache with pocache-dfs-perf following the pocache-dfs-perf tutorial.
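The first two deployment steps can be scripted roughly as below. This is a sketch, not a tool shipped with POCache: `distribute_release`, the `run` helper, the `DRY_RUN` switch, the hosts-file format and the directory arguments are all hypothetical, and it assumes passwordless SSH plus an identical directory layout on every node.

```shell
#!/usr/bin/env bash
# Sketch of distributing the release and configuration with rsync.
# Hypothetical arguments:
#   $1 hosts file    - one hostname per line (server and client nodes)
#   $2 release dir   - local directory with the POCache/pocache-dfs-perf releases
#   $3 conf dir      - local copy of the edited pocache_conf files
#   $4 Hadoop conf   - Hadoop configuration directory on every node
# Set DRY_RUN=1 to print the rsync commands instead of running them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

distribute_release() {
  local hosts_file=$1 release_dir=$2 conf_dir=$3 hadoop_conf=$4
  local host
  # Read hosts on fd 3 so the ssh spawned by rsync cannot consume the list
  # from stdin; `|| [ -n "$host" ]` keeps a last line without a newline.
  while IFS= read -r host <&3 || [ -n "$host" ]; do
    # Skip blank lines and comments in the hosts file.
    case "$host" in ''|'#'*) continue ;; esac
    # -a preserves permissions, symlinks and timestamps.
    run rsync -a "$release_dir/" "$host:$release_dir/"
    # Copy the POCache configuration into Hadoop's config directory.
    run rsync -a "$conf_dir/" "$host:$hadoop_conf/"
  done 3< "$hosts_file"
}
```

For example: `distribute_release hosts.txt "$HOME/pocache-release" "$HOME/pocache_conf" "$HADOOP_HOME/etc/hadoop"`. The trailing slash on each source directory makes rsync copy the directory's contents rather than the directory itself.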
Note that pocache-dfs-perf uses vmtouch to effectively disable the page cache (by clearing it before each read request) to ensure a fair comparison.
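To reproduce the same effect outside pocache-dfs-perf (for example, before reading with another client), a minimal sketch follows. It is not pocache-dfs-perf's own code; `evict_page_cache`, `run` and `DRY_RUN` are hypothetical names, and which directories to evict (for instance a DataNode's dfs.datanode.data.dir) is left to you.

```shell
#!/usr/bin/env bash
# Sketch: evict files from the OS page cache with `vmtouch -e` before a read.
# Directory arguments are hypothetical examples (e.g. DataNode data dirs).
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

evict_page_cache() {
  local dir
  for dir in "$@"; do
    # vmtouch crawls directories recursively; -e evicts their cached pages.
    run vmtouch -e "$dir"
  done
}
```

Running `vmtouch <dir>` without flags afterwards reports how many of its pages are still resident, which is a quick way to confirm the eviction took effect.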
If you have any questions, please email Mi Zhang (mzhang@cse.cuhk.edu.hk).
Mi Zhang, Qiuping Wang, Zhirong Shen, and Patrick P. C. Lee.
"Parity-Only Caching for Robust Straggler Tolerance", MSST 2019