/native-hdfs-fuse

C HDFS FUSE implementation, no libhdfs

Primary language: C. License: Apache License 2.0 (Apache-2.0).

Native HDFS Fuse Implementation

This program lets you mount an HDFS filesystem via FUSE so that it appears as a local filesystem.

Unlike other FUSE HDFS implementations, this one doesn't use libhdfs or otherwise start a JVM; it constructs and sends the protocol buffer messages itself. It also supports random file writes.
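To give a feel for what "constructing the messages itself" involves, here is a minimal sketch of the framing side of the job. It is not taken from this project's source: the function names are hypothetical, and the 7-byte preamble layout ("hrpc", version 9, service class, auth protocol) is an assumption based on the Hadoop RPC wire format used by Hadoop 2.x and later. The varint encoding is the standard protocol buffers base-128 scheme.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode v as a protobuf base-128 varint.
 * Returns the number of bytes written (between 1 and 10). */
size_t put_varint(uint8_t *out, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)((v & 0x7F) | 0x80); /* low 7 bits, continuation bit set */
        v >>= 7;
    }
    out[n++] = (uint8_t)v; /* final byte, continuation bit clear */
    return n;
}

/* Write the connection preamble (layout is an assumption, see above). */
size_t put_rpc_preamble(uint8_t *out)
{
    memcpy(out, "hrpc", 4);
    out[4] = 9; /* RPC version */
    out[5] = 0; /* service class: default */
    out[6] = 0; /* auth protocol: none */
    return 7;
}

/* Frame an already-serialized message: varint length, then the bytes. */
size_t put_delimited(uint8_t *out, const uint8_t *msg, size_t len)
{
    size_t n = put_varint(out, len);
    memcpy(out + n, msg, len);
    return n + len;
}
```

The message bodies themselves would come from the code that protoc-c generates; this sketch only covers the length prefixes that go around them.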

Usage

Compiling

Dependencies for Ubuntu 22.04:
sudo apt-get install -y pkgconf libfuse-dev libprotobuf-c-dev libprotobuf-dev protobuf-c-compiler uncrustify

Compile the program:

make && make install

This compiles the native-hdfs-fuse binary and installs it to /usr/bin. The build needs the protoc-c protobuf compiler to be available and uses pkg-config to find the fuse and libprotobuf-c shared libraries it links against.

You can make a debug build using make debug; this adds debug symbols to the binary and compiles in verbose logging statements.

Running

native-hdfs-fuse <namenode host> <namenode port, usually 8020> <other FUSE arguments, including mount directory>

Testing

Tested using fsx on a Hadoop Minicluster. To exercise the multi-block logic, set the minicluster block size to a small value (e.g. pass -D dfs.block.size=4194304, i.e. 4 MiB, as an argument to minicluster) and have fsx use a large file (e.g. pass -F 134217728, i.e. 128 MiB).
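The arithmetic behind those two values can be sketched as follows. The helper names are hypothetical and are not this project's functions; they only illustrate why a 134217728-byte file with a 4194304-byte block size forces fsx's random reads and writes across block boundaries.

```c
#include <stdint.h>

/* Index of the block that contains the given byte offset. */
uint64_t block_index(uint64_t offset, uint64_t block_size)
{
    return offset / block_size;
}

/* Number of blocks needed to hold file_size bytes
 * (the last block may be only partly filled). */
uint64_t block_count(uint64_t file_size, uint64_t block_size)
{
    return (file_size + block_size - 1) / block_size;
}

/* How many bytes of an I/O request at [offset, offset + len)
 * land in the first block it touches; the rest spills into
 * the following block(s). */
uint64_t bytes_in_first_block(uint64_t offset, uint64_t len,
                              uint64_t block_size)
{
    uint64_t room = block_size - (offset % block_size);
    return len < room ? len : room;
}
```

With the example settings, block_count(134217728, 4194304) is 32, so a test file spans 32 blocks, and a 10-byte write at offset 4194300 puts 4 bytes in block 0 and 6 bytes in block 1.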

Contributing

Please contribute by submitting GitHub pull requests.

Some missing features:

  • Support for HDFS encrypted transport.
  • Using HDFS's shared-memory short-circuit path when reading blocks locally.
  • CRC32 (not CRC32C) checksumming.
  • Checksum validation when reading packets.
  • DataNode pipeline recovery.
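For anyone picking up the checksum items, note that CRC32 and CRC32C share the same algorithm and differ only in the generator polynomial. The list above suggests that only CRC32C is handled today. The following is a minimal bitwise sketch, not this project's code (a real implementation would more likely use a lookup table or hardware instructions); the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED  0xEDB88320u /* CRC-32 (IEEE 802.3, zlib) */
#define CRC32C_POLY_REFLECTED 0x82F63B78u /* CRC-32C (Castagnoli) */

/* Bit-at-a-time reflected CRC with initial value and final XOR of
 * 0xFFFFFFFF, parameterized by the reflected polynomial. */
uint32_t crc_reflected(uint32_t poly, const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, else zero. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (poly & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Over the standard check string "123456789", the CRC-32 polynomial yields 0xCBF43926 and the CRC-32C polynomial yields 0xE3069283, which is a convenient way to verify either variant.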