/hadoofus

HDFS client library for C

Primary LanguageCMIT LicenseMIT

hadoofus Build Status (travis) Build Status (cirrus)

JRE-free multi-version HDFS client library

The hadoofus project is an HDFS (Hadoop Distributed File System) client library. Like Snakebite, it does not require Java. Unlike Snakebite, hadoofus is implemented in C. It supports RPC pipelining and out-of-order execution.

It provides a C API for directly calling Namenode RPCs and performing Datanode block read and write operations.

Unlike libhdfs, Hadoofus speaks multiple versions of the HDFS protocol. At your option, you may speak with Hadoop 0.20.203 through 1.x.y (HDFS_NN_v1 / HDFS_DATANODE_AP_1_0), Hadoop 2.0.x (HDFS_NN_v2 / HDFS_DATANODE_AP_2_0), or Hadoop 2.2.x and higher (HDFS_NN_v2_2 / HDFS_DATANODE_AP_2_0).

Using libhadoofus

#include <hadoofus/highlevel.h>

int64_t res;
const char *err = NULL;
struct hdfs_namenode *h;
struct hdfs_object *exception = NULL;

h = hdfs_namenode_new("host.bar.com", "8020", "hdfs", &err);
if (!h)
        ... ;

res = hdfs_getProtocolVersion(h, HADOOFUS_CLIENT_PROTOCOL_STR, 61L, &exception);
if (exception) {
        // fprintf(, "...%s...", hdfs_exception_get_message(exception));
        hdfs_object_free(exception);
        ... ;
}

if (res != 61)
        ... ;

hdfs_namenode_delete(h);

Caveats

Some less common RPCs provided by the Hadoop ClientProtocol interface in v2.x of the protocol are not yet implemented (see Issue #29).

Note: Hadoop has been known to change semantics slightly between different versions of the software (especially before v2 was released). The v1 protocol has no spec; we do the best we can.

Some RPCs that exist in HDFSv1 do not exist in HDFSv2+ — e.g. getProtocolVersion does not exist in v2.

The Datanode API is somewhat fragile.

HDFS Semantics

HDFS attempts to be a restricted subset of a POSIX filesystem.

Files can only have one writer at a time and do not support random writes. They can be appended but not overwritten in place.

Generally, the Namenode acts as an RPC server for querying and manipulating file system metadata. It points clients at Datanode(s) to read/write file data.

For more information, see wikipedia's article on Hadoop or the HDFS Architecture Guide.

Installing

See INSTALLING.md.

Issues

Found a bug? Please file it on github. Thanks!

Contributors

License

Unless otherwise noted, files in this source distribution are released under the terms of the MIT license. Some files used for CRC32C support come from elsewhere have non-MIT, but similarly permissive licenses (namely the BSD 2-clause license and Mark Adler's license, which he also uses for zlib), which are clearly specified at the top of their files (Some files which are not compiled into installed binaries or otherwise installed by this package's Makefiles come from the Apache Hadoop sources and have different licenses. These licenses are clearly specified at the beginning of the files.) For the full text of the MIT license, see the file LICENSE included with this source distribution.