Apache DataFu

Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by the need for stable, well-tested libraries for data mining and statistics.

It consists of two libraries:

Apache DataFu Pig: a collection of user-defined functions for Apache Pig
Apache DataFu Hourglass: an incremental processing framework for MapReduce on Apache Hadoop

DataFu is currently undergoing incubation at the Apache Software Foundation. A mirror of the official Git repository is available on GitHub at https://github.com/apache/incubator-datafu.

For more information please visit the website: