This program lets you mount an HDFS filesystem via FUSE so that it appears as a local filesystem.
Unlike other FUSE HDFS implementations, this one does not use libhdfs or otherwise start a JVM; it constructs and sends the protocol buffer messages itself. It also supports random file writes.
Dependencies for Ubuntu (22.04)
sudo apt-get install -y pkgconf libfuse-dev libprotobuf-c-dev libprotobuf-dev protobuf-c-compiler uncrustify
Compile the program:
make && make install
This compiles the native-hdfs-fuse binary and installs it to /usr/bin. The build needs the protoc-c protobuf compiler to be available and uses pkg-config to find the fuse and libprotobuf-c shared libraries it links against.
You can make a debug build using make debug; this adds debug symbols to the binary and compiles in verbose logging statements.
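Before running make, it can help to confirm that the tools and libraries mentioned above are visible. The following is a minimal sketch, not part of the project: the have_tool helper is hypothetical, and the pkg-config module names "fuse" and "libprotobuf-c" are assumptions based on the library names above.

```shell
# Return 0 if the named program is on PATH, non-zero otherwise.
# (Hypothetical helper, not part of native-hdfs-fuse.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

missing=""
for tool in make pkg-config protoc-c; do
    have_tool "$tool" || missing="$missing $tool"
done

# Library checks only make sense once pkg-config itself is present.
# The module names "fuse" and "libprotobuf-c" are assumptions.
if have_tool pkg-config; then
    for module in fuse libprotobuf-c; do
        pkg-config --exists "$module" || missing="$missing $module(pkg-config)"
    done
fi

if [ -n "$missing" ]; then
    echo "Missing build prerequisites:$missing"
else
    echo "All build prerequisites found"
fi
```

The script only reports what it finds; it does not install anything or stop the build.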
Usage:
native-hdfs-fuse <namenode host> <namenode port, usually 8020> <other FUSE arguments, including mount directory>
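As an illustration of the argument order, here is a dry-run sketch that assembles a mount command and the matching unmount command without executing either. The host name namenode.example.com and the mount point /mnt/hdfs are hypothetical placeholders; fusermount -u is the standard FUSE unmount command on Linux rather than something specific to this program.

```shell
# Hypothetical values; replace them with your own cluster and mount point.
NAMENODE_HOST="namenode.example.com"
NAMENODE_PORT=8020
MOUNT_DIR="/mnt/hdfs"

# Everything after the port is a FUSE argument; here that is only the mount directory.
mount_cmd="native-hdfs-fuse $NAMENODE_HOST $NAMENODE_PORT $MOUNT_DIR"

# fusermount -u is the standard FUSE unmount command on Linux.
unmount_cmd="fusermount -u $MOUNT_DIR"

# Dry run: print the commands instead of executing them.
echo "mount:   $mount_cmd"
echo "unmount: $unmount_cmd"
```

Run the printed commands by hand once the values match your environment. The mount directory must already exist.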
Tested using fsx against a Hadoop Minicluster. To exercise the multi-block logic, set the minicluster block size to a small value (e.g. pass -D dfs.block.size=4194304, which is 4 MiB, as an argument to minicluster) and have fsx use a large file (e.g. pass -F 134217728, which is 128 MiB).
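To see why these two values exercise multi-block behaviour, the arithmetic can be sketched as follows. This only illustrates how a file of that size divides into blocks of that size; the sample offset is hypothetical, and nothing here describes how the program itself maps offsets to blocks.

```shell
BLOCK_SIZE=4194304     # dfs.block.size from the example above (4 MiB)
FILE_SIZE=134217728    # fsx -F value from the example above (128 MiB)

# Number of blocks needed to hold the file, rounding up.
blocks=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))

# Map a hypothetical byte offset to a block index and an offset within that block.
OFFSET=10000000
block_index=$(( OFFSET / BLOCK_SIZE ))
block_offset=$(( OFFSET % BLOCK_SIZE ))

echo "blocks=$blocks block_index=$block_index block_offset=$block_offset"
# prints: blocks=32 block_index=2 block_offset=1611392
```

With these settings a full-size test file spans 32 blocks, so random reads and writes from fsx regularly cross block boundaries.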
Contributions are welcome; please submit GitHub pull requests.
Some missing features:
- Support for HDFS encrypted transport.
- Using HDFS's shared memory shortcut (short-circuit local reads) when reading blocks locally.
- CRC32 checksumming (CRC32C is supported; plain CRC32 is not).
- Checksum validation when reading packets.
- DataNode pipeline recovery.