
A Scala library to process RDF data on Hadoop


Anduin

Processing Large RDF Graphs on Hadoop

Anduin is a lightweight, concise tool for processing RDF data in the N-Quads and N-Triples formats on Hadoop. It is written in Scala and built atop Scalding, a Scala library from Twitter for writing Hadoop jobs.
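To make the two input formats concrete, here is a minimal sketch, in plain Scala rather than Scalding, of a line-level parser that tolerates ill-formed input by skipping any line it cannot parse. The object name `TolerantNQuads` and its regular expressions are hypothetical illustrations, not Anduin's API. Like Anduin, the sketch does not handle blank nodes: it only accepts IRIs as subjects and contexts.

```scala
// Hypothetical sketch, not Anduin's implementation: parse N-Quads / N-Triples
// lines, returning None for anything ill-formed instead of failing the job.
object TolerantNQuads {
  /** One parsed statement; `context` is None for N-Triples input. */
  final case class Quad(subject: String, predicate: String, obj: String, context: Option[String])

  // An IRI in angle brackets (blank nodes such as _:b1 are deliberately not accepted).
  private val Iri = "<[^<>\"\\s]*>"
  // A quoted literal with optional language tag (@en) or datatype (^^<...>).
  private val Literal =
    "\"(?:[^\"\\\\]|\\\\.)*\"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\\^\\^<[^<>\"\\s]*>)?"
  // subject predicate object [context] .   (exactly four capturing groups)
  private val Statement =
    ("\\s*(" + Iri + ")\\s+(" + Iri + ")\\s+(" + Iri + "|" + Literal + ")(?:\\s+(" + Iri + "))?\\s*\\.\\s*").r

  /** Parses one line; an unmatched optional context group is null, hence Option(c). */
  def parseLine(line: String): Option[Quad] = line match {
    case Statement(s, p, o, c) => Some(Quad(s, p, o, Option(c)))
    case _                     => None
  }

  /** Lazily parses many lines, silently dropping the ill-formed ones. */
  def parseAll(lines: Iterator[String]): Iterator[Quad] =
    lines.flatMap(l => parseLine(l).iterator)
}
```

Skipping bad lines rather than throwing keeps a single malformed statement from aborting a job over a large crawl; a real pipeline might also count the rejected lines.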

Current Version

0.3.1

Features

  • Support for the RDF/N-Quads and RDF/N-Triples formats
  • Tolerance of ill-formed RDF data
  • Gathering entity type statistics
  • Building adjacency matrices
  • Aggregating entity descriptions (e.g. for entity search)
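The statistics listed above can be illustrated with ordinary Scala collections. The following is a hypothetical in-memory sketch (`GraphStatsSketch` and its method names are illustrative, not Anduin's API); on Hadoop, the same group-and-count steps would typically be expressed as Scalding pipelines instead. It assumes the triples are already parsed into (subject, predicate, object) strings in N-Triples syntax, and it represents the adjacency matrix sparsely as one set of neighbours per entity.

```scala
// Hypothetical sketch, not Anduin's implementation.
object GraphStatsSketch {
  val RdfType = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

  /** (subject, predicate, object), each term kept in N-Triples syntax. */
  type Triple = (String, String, String)

  private def isIri(term: String): Boolean = term.startsWith("<") && term.endsWith(">")

  /** Entity type statistics: number of distinct entities per rdf:type class. */
  def typeCounts(triples: Seq[Triple]): Map[String, Int] =
    triples.collect { case (s, p, o) if p == RdfType => (o, s) }
      .distinct
      .groupBy(_._1)
      .map { case (cls, pairs) => cls -> pairs.size }

  /** Sparse adjacency matrix: entity -> entities it links to (rdf:type edges excluded). */
  def adjacency(triples: Seq[Triple]): Map[String, Set[String]] =
    triples.collect { case (s, p, o) if p != RdfType && isIri(o) => (s, o) }
      .groupBy(_._1)
      .map { case (s, edges) => s -> edges.map(_._2).toSet }

  /** Entity descriptions: all literal values attached to each subject. */
  def descriptions(triples: Seq[Triple]): Map[String, Seq[String]] =
    triples.collect { case (s, _, o) if o.startsWith("\"") => (s, o) }
      .groupBy(_._1)
      .map { case (s, lits) => s -> lits.map(_._2) }
}
```

Deduplicating (class, entity) pairs before counting means an entity typed twice with the same class is counted once.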

Known Issues

Blank nodes are not supported at the moment.

Prerequisites

  • Java 1.6+
  • Scala 2.9.2+
  • Tested on Apache Hadoop 1.1 and Amazon Web Services Elastic MapReduce

Mailing list

Have a question or a suggestion? Please join our mailing list.

anduin@googlegroups.com

Development and Contribution

Anduin has been developed by Nikita Zhiltsov. To add new functionality or fix existing bugs, feel free to submit patches as pull requests against the develop branch.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0