Anduin

Processing Large RDF Graphs on Hadoop

Anduin is a lightweight and concise tool for processing data in the RDF/N-Quads and RDF/N-Triples formats on Hadoop. Anduin is written in Scala and built on top of Scalding, a library from Twitter.

Current Version

0.3.1

Features

Support for the RDF/N-Quads and RDF/N-Triples formats
Tolerance of ill-formed RDF data
Gathering entity type statistics
Building adjacency matrices
Aggregating entity descriptions (e.g. for entity search)
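
To make the features above more concrete, here is a minimal sketch in plain Scala of what tolerant parsing, type statistics and adjacency extraction can look like on a handful of lines. It is not Anduin's code and does not use the Scalding API: the names RdfSketch, Statement, parse, typeCounts and edges are hypothetical, the regular expression covers only a simplified subset of the N-Triples/N-Quads grammar, and the data is held in memory rather than processed on Hadoop. Lines that do not match, including lines with blank nodes, are skipped instead of failing the whole run.

```scala
// Illustrative sketch only: plain Scala collections, not Anduin's or Scalding's API.
object RdfSketch {
  val RdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

  // Hypothetical record for one statement; `context` is None for N-Triples lines.
  // `obj` keeps its surface form: "<iri>" for IRIs, a quoted string for literals.
  case class Statement(subject: String, predicate: String, obj: String, context: Option[String])

  // Simplified line grammar: subject IRI, predicate IRI, object (IRI or literal),
  // optional context IRI, terminating dot. Blank nodes deliberately do not match.
  private val Line =
    """^\s*<([^>\s]+)>\s+<([^>\s]+)>\s+(<[^>\s]+>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>\s]+>)?)\s*(?:<([^>\s]+)>\s*)?\.\s*$""".r

  // Tolerant parsing: an ill-formed line yields None instead of an exception.
  def parse(line: String): Option[Statement] = line match {
    case Line(s, p, o, c) => Some(Statement(s, p, o, Option(c)))
    case _                => None
  }

  // Strips the angle brackets from an IRI in surface form.
  private def iri(term: String): String = term.substring(1, term.length - 1)

  // Counts rdf:type statements per type IRI.
  def typeCounts(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(l => parse(l).toList)
      .collect { case Statement(_, RdfType, o, _) if o.startsWith("<") => iri(o) }
      .groupBy(identity)
      .map { case (t, occurrences) => (t, occurrences.size) }

  // Sparse adjacency: (subject, object) pairs for statements that link two IRIs.
  def edges(lines: Seq[String]): Set[(String, String)] =
    lines.flatMap(l => parse(l).toList)
      .collect { case Statement(s, p, o, _) if p != RdfType && o.startsWith("<") => (s, iri(o)) }
      .toSet
}
```

Calling RdfSketch.typeCounts or RdfSketch.edges on a sequence of raw lines returns the aggregated result for the well-formed statements only. In a Hadoop job the same per-line parsing would run in the map phase and the grouping in the reduce phase.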

Known Issues

Blank nodes are not supported at the moment.

Prerequisites

Java 1.6+
Scala 2.9.2+
Hadoop (tested on Apache Hadoop 1.1 as well as Amazon Web Services Elastic MapReduce)

Mailing list

Have a question or a suggestion? Please join our mailing list.

anduin@googlegroups.com

Development and Contribution

Anduin has been developed by Nikita Zhiltsov. To add new functionality or fix existing bugs, feel free to submit patches as pull requests against the develop branch.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
