/rdf-exp

Primary Language: C++

A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data

Overview:

The versatility of the Resource Description Framework (RDF) has allowed many web services to publish very large datasets that are impractical to process on a single computer. Therefore, many distributed SPARQL engines on shared-nothing computer clusters have emerged. Some utilize distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive pre-processing for data partitioning. These systems exhibit a variety of trade-offs that are not well understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 21 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing, categorize them by several characteristics, and explain their similarities and differences. Then, we select 11 representative systems and perform an extensive experimental evaluation with respect to pre-processing cost, query performance, scalability, and workload adaptability, using a variety of synthetic and real large datasets with up to 4.2B triples. Our results provide valuable insights for practitioners to understand the trade-offs for their usage scenarios. Finally, we publish our evaluation framework online, including all datasets and workloads, so that researchers can compare their novel systems against the existing ones.

Please see our technical report for details.

Dataset Statistics:

Download Links

Benchmark Queries

All queries used in our experimental evaluation are in the queries folder, including both the individual benchmark queries and the query workloads.

Individual Queries

Workloads

Tested Systems:

System Download
AdPart https://github.com/razen-alharbi/AdPart
TriAD Contact Author: gurajada@mpi-inf.mpg.de
gStoreD https://github.com/bnu05pp/gStoreD
SHAPE https://sites.google.com/site/gtshape/
DREAM https://github.com/CMU-Q/DREAM
H2RDF+ https://github.com/zcourts/h2rdf/tree/master/H2RDF%2Bv0.2
S2RDF http://dbis.informatik.uni-freiburg.de/forschung/projekte/DiPoS/S2RDF.html
S2X http://dbis.informatik.uni-freiburg.de/forschung/projekte/DiPoS/S2X.html
CliqueSquare https://team.inria.fr/oak/projects/cliquesquare/
SHARD https://svn.code.sf.net/p/shard-3store/code/
H-RDF-3X Contact Author: jiewen.huang@yale.edu

Utilities

RDF Data Encoder

QueryLoad Encoder