py-hdfs-mount

Mount HDFS with FUSE. Works with Kerberos!

Requirements

Python 3

Install

sudo apt-get install fuse libfuse2
pip3 install -r requirements.txt

If you will be using Kerberos, also install libkrb5-dev:

sudo apt-get install libkrb5-dev

Configuration

cp example.config.yaml config.yaml
$EDITOR config.yaml

Running

If you are using Kerberos, run kinit first:

kinit -kt $USER $USER@REALM

In all cases, you then have to create a new, empty directory that will be the mount point:

mkdir /mnt/dest_mount

And finally you can run py-hdfs-mount:

python3 hdfs_mount.py [--loglevel LEVEL] config.yaml

Have fun!

Note: if anything goes wrong and you have to kill py-hdfs-mount, you will probably have to run these commands on the mount point to release it:

fusermount -u /mnt/dest_mount
umount -l /mnt/dest_mount
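
If you want to script this cleanup, the two commands above could be wrapped roughly as follows. This is only a sketch: the `unmount` helper and its `run` parameter are hypothetical names, not part of py-hdfs-mount, and treating `umount -l` as a fallback for `fusermount -u` is an assumption about how the two commands are meant to be combined.

```python
import subprocess


def unmount(mount_point, run=subprocess.run):
    """Try a regular FUSE unmount, then fall back to a lazy unmount.

    Hypothetical helper, not part of py-hdfs-mount. Returns the command
    that succeeded, or None if neither command worked.
    """
    commands = [
        ["fusermount", "-u", mount_point],  # normal FUSE unmount
        ["umount", "-l", mount_point],      # lazy unmount as a fallback
    ]
    for cmd in commands:
        try:
            result = run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            continue  # this binary is not installed, try the next one
        if result.returncode == 0:
            return cmd
    return None


if __name__ == "__main__":
    print(unmount("/mnt/dest_mount"))
```

The `run` parameter exists only so the helper can be exercised without touching a real mount point.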

Tested with

  • Vim (open file, edit randomly, save and close)
  • cp/mv

Functionalities

  • Cached writes (HDFS files are immutable, so a write is a delete followed by an insert)
  • Random writes (slow, because of the immutability of HDFS, but working!)
  • Very fast ls (cached directory metadata)
  • Directory stored as a zip file in HDFS (to solve the small-files problem)
  • Directory stored as an Avro file in HDFS (to solve the small-files problem)
  • CRC32 checksum
  • Load options from configuration file
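
To illustrate the idea behind the zip-backed directories and the CRC32 checksum, here is a minimal sketch using only the Python standard library. It does not reproduce the actual py-hdfs-mount code: `pack_directory` and `verify_archive` are hypothetical names, and the archive is built in memory instead of being sent to HDFS.

```python
import io
import zipfile
import zlib


def pack_directory(files):
    """Pack a {name: bytes} mapping into a single in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return buf.getvalue()


def verify_archive(archive, files):
    """Compare the CRC32 stored for each zip entry with the expected content."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.CRC != zlib.crc32(files[info.filename]):
                return False
    return True


if __name__ == "__main__":
    small_files = {"a.txt": b"alpha", "b.txt": b"beta"}
    archive = pack_directory(small_files)
    print(len(archive), verify_archive(archive, small_files))
```

Storing many small files as one archive means HDFS tracks a single large file instead of many tiny ones, which is the usual motivation for this kind of packing.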

Implemented FUSE methods

Basic

  • access
  • chmod
  • chown
  • getattr
  • readdir
  • readlink
  • mknod
  • rmdir
  • mkdir
  • statfs
  • unlink
  • symlink
  • rename
  • link
  • utimens

File methods

  • open
  • create
  • read
  • write (caching is done in memory)
  • truncate
  • flush (writes the chunks cached in memory to a temporary file in the local FS, in the right order, and calls fsync)
  • fsync (sends the temporary file to HDFS)
  • release
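
The write, flush and fsync steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `WriteCache` class is hypothetical, chunks are applied in arrival order so that later writes win, and the final upload of the temporary file to HDFS is left out.

```python
import os
import tempfile


class WriteCache:
    """Hypothetical in-memory write cache for one open file."""

    def __init__(self):
        self.chunks = []  # (offset, data) pairs, in arrival order

    def write(self, offset, data):
        """Remember the chunk in memory; nothing touches the disk yet."""
        self.chunks.append((offset, bytes(data)))
        return len(data)

    def flush(self, base=b""):
        """Apply the cached chunks over `base`, write the result to a
        temporary local file, fsync it, and return its path."""
        buf = bytearray(base)
        for offset, data in self.chunks:
            end = offset + len(data)
            if end > len(buf):
                buf.extend(b"\0" * (end - len(buf)))  # zero-fill any gap
            buf[offset:end] = data
        fd, path = tempfile.mkstemp(prefix="hdfs_mount_")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(buf)
            tmp.flush()
            os.fsync(tmp.fileno())
        self.chunks.clear()
        return path  # a real fsync step would now send this file to HDFS


if __name__ == "__main__":
    cache = WriteCache()
    cache.write(6, b"world")
    cache.write(0, b"hello ")
    path = cache.flush()
    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)
```

Rebuilding the whole file locally before upload fits the constraint that HDFS files cannot be modified in place, which is also why random writes are slow.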