R2-RML-Toolkit

A modular R2RML suite built on Apache Jena. It features a complete domain API built on Jena's polymorphism system, SHACL validation, an R2RML processor based on Jena's ARQ with 100% standard conformance, and the common tooling every R2RML project needs.

A Jena-based (R2)RML API

R2RML Model API

The core of this project is a set of R2RML model classes that integrate directly with Apache Jena's polymorphism system. This means you can write your application logic against ordinary Java interfaces, while under the hood their getters and setters read from and write to an underlying RDF model.
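The idea of an interface whose accessors operate on underlying data can be illustrated without Jena. The following is a minimal JDK-only sketch, not Jena's actual mechanism: the names ViewSketch, ObjectMapView and viewOf are hypothetical, and a plain Map stands in for the RDF model.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an RDF-backed domain view; this is NOT how Jena implements it. */
final class ViewSketch {

    /** Hypothetical domain interface, loosely modelled on an R2RML object map. */
    interface ObjectMapView {
        String getColumn();
        ObjectMapView setColumn(String column);
        String getLanguage();
        ObjectMapView setLanguage(String language);
    }

    /** Creates a view whose accessors operate directly on the given property map. */
    static ObjectMapView viewOf(Map<String, String> properties) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (method.getDeclaringClass() == Object.class) {
                // toString/hashCode/equals are answered by the backing map
                return method.invoke(properties, args);
            }
            if (name.startsWith("get")) {
                return properties.get(propertyName(name));
            }
            if (name.startsWith("set")) {
                properties.put(propertyName(name), (String) args[0]);
                return proxy; // return the view itself to allow fluent chaining
            }
            throw new UnsupportedOperationException(name);
        };
        return (ObjectMapView) Proxy.newProxyInstance(
                ObjectMapView.class.getClassLoader(),
                new Class<?>[] { ObjectMapView.class },
                handler);
    }

    /** Maps an accessor name such as "getColumn" to a property key such as "rr:column". */
    private static String propertyName(String accessor) {
        String bare = accessor.substring(3);
        return "rr:" + Character.toLowerCase(bare.charAt(0)) + bare.substring(1);
    }

    public static void main(String[] args) {
        Map<String, String> backing = new HashMap<>();
        ObjectMapView view = viewOf(backing);

        // The setters write into the backing map ...
        view.setColumn("labels").setLanguage("en");
        System.out.println(backing.get("rr:column"));

        // ... and the getters see changes made directly to the backing map.
        backing.put("rr:language", "de");
        System.out.println(view.getLanguage());
    }
}
```

The view holds no state of its own, which is why the same data can be inspected or changed through either the interface or the underlying store.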

<dependency>
  <groupId>org.aksw.r2rml</groupId>
  <artifactId>r2rml-jena-plugin</artifactId>
  <version><!-- Check the link below --></version>
</dependency>

List versions published on Maven Central

The following Java snippet demonstrates usage of the API:

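import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;
// TriplesMap and RR ship with this toolkit (r2rml-jena-plugin); import them from there.

public class R2rmlApiExample {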
	public static void main(String[] args) {
		Model model = ModelFactory.createDefaultModel();
		model.setNsPrefix("rdfs", RDFS.uri);
		model.setNsPrefix("rr", RR.uri);
		
		TriplesMap triplesMap = model.createResource().as(TriplesMap.class); 
		triplesMap
			.setSubjectIri("urn:s")
			.addNewPredicateObjectMap()
				.addPredicate("urn:p")
				.addNewObjectMap()
					.setColumn("labels")
					.setLanguage("en");
		
		// All domain classes of the R2RML API *ARE* Jena Resources.
		// Hence, any information - such as types or custom attributes - can be freely attached:
		triplesMap
			.addProperty(RDF.type, RR.TriplesMap)
			.addProperty(RDFS.label, "My R2RML Mapping");
		
		RDFDataMgr.write(System.out, model, RDFFormat.TURTLE_PRETTY);
	}
}

The output in Turtle syntax is shown below. Any of the other serialization formats supported by Jena could be used instead.

@prefix rr:    <http://www.w3.org/ns/r2rml#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

[ a                      rr:TriplesMap ;
  rdfs:label             "My R2RML Mapping" ;
  rr:predicateObjectMap  [ rr:objectMap  [ rr:column    "labels" ;
                                           rr:language  "en"
                                         ] ;
                           rr:predicate  <urn:p>
                         ] ;
  rr:subject             <urn:s>
] .

SPARQL Extensions in RML

The RML toolkit provides extensions that use SPARQL expressions to:

  • add computed 'reference names' (columns) using norse:rml.bind
  • filter RDF terms using norse:rml.filter

A norse:rml.bind expression can override an existing column once. The existing column is then 'shadowed' by the new value, so every other reference to that column resolves to the new value.

Prefixes inside of these SPARQL expressions are specified using the SHACL vocabulary.

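PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX rml: <http://semweb.mmlab.be/ns/rml#>
PREFIX ql: <http://semweb.mmlab.be/ns/ql#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX norse: <https://w3id.org/aksw/norse#>
# The coy: prefix used below also needs a declaration; its namespace is not given here.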

_:prefixes
  sh:declare [ sh:prefix "xsd"  ; sh:namespace "http://www.w3.org/2001/XMLSchema#" ] ;
  sh:declare [ sh:prefix "geo"  ; sh:namespace "http://www.opengis.net/ont/geosparql#" ] ;
  sh:declare [ sh:prefix "geof" ; sh:namespace "http://www.opengis.net/def/function/geosparql/" ] ;
  .

<#AssetEmission>
  a rr:TriplesMap;
    rml:logicalSource [
      rml:source "asset_shipping_emissions_year.csv";
      rml:referenceFormulation ql:CSV ;
      sh:prefixes _:prefixes ;

      # 'Shadow' the references of ?start_time based on the expression below.
      # All rml:reference instances will refer to the shadowed value
      norse:rml.bind "xsd:dateTime(replace(?start_time, ' ', 'T')) AS ?start_time" ;
      norse:rml.bind "xsd:dateTime(replace(?end_time, ' ', 'T')) AS ?end_time" ;
      norse:rml.bind "geof:simplifyDp(strdt(?st_astext, geo:wktLiteral), 0.0001) AS ?st_astext" ;

      # Compute a new column
      norse:rml.bind "xsd:gYear(?start_time) AS ?year" ;
    ] ;
    rr:subjectMap [
      rr:template "https://data.coypu.org/ClimateTrace/{asset_id}-{iso3_country}-{gas}-{year}";
      rr:class coy:AssetEmission
    ] ;
    rr:predicateObjectMap [
      rr:predicate coy:hasAssetId;
      rr:objectMap [
        rml:reference "asset_id";
        rr:datatype xsd:string ;
        # Omit generation of this term (and thus the corresponding triples)
        # if the condition evaluates to boolean false.
        norse:rml.filter "?asset_id != ''" ;
      ]
    ] ;
    # ...
    .
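The shadowing and filtering behaviour described above can be sketched on a single row in plain Java. This is a toy model, not the toolkit's implementation: the names BindFilterSketch, Bind, applyBinds and term are hypothetical, the bindings are assumed to be applied in declaration order, and the rule that a column may be overridden only once is not enforced.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

/** Toy model of bind/filter semantics on one CSV row; not the toolkit's implementation. */
final class BindFilterSketch {

    /** One "expr AS ?target" binding: computes a value from the row and stores it under target. */
    static final class Bind {
        final String target;
        final Function<Map<String, String>, String> expr;

        Bind(String target, Function<Map<String, String>, String> expr) {
            this.target = target;
            this.expr = expr;
        }
    }

    /** Applies the bindings in order (an assumption); an existing target column is shadowed. */
    static Map<String, String> applyBinds(Map<String, String> row, List<Bind> binds) {
        Map<String, String> result = new LinkedHashMap<>(row); // leave the input row untouched
        for (Bind bind : binds) {
            result.put(bind.target, bind.expr.apply(result));
        }
        return result;
    }

    /** Returns the referenced value, or nothing at all if the filter evaluates to false. */
    static Optional<String> term(Map<String, String> row, String reference,
                                 Predicate<Map<String, String>> filter) {
        return filter.test(row) ? Optional.ofNullable(row.get(reference)) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("asset_id", "");
        row.put("start_time", "2021-01-01 00:00:00");

        List<Bind> binds = Arrays.asList(
                // Shadows start_time, mirroring replace(?start_time, ' ', 'T')
                new Bind("start_time", r -> r.get("start_time").replace(' ', 'T')),
                // Adds a new column derived from the already shadowed start_time
                new Bind("year", r -> r.get("start_time").substring(0, 4)));

        Map<String, String> bound = applyBinds(row, binds);
        System.out.println(bound.get("start_time")); // 2021-01-01T00:00:00
        System.out.println(bound.get("year"));       // 2021

        // Mirrors the filter ?asset_id != '': the empty asset_id produces no term.
        Optional<String> assetId = term(bound, "asset_id", r -> !r.get("asset_id").isEmpty());
        System.out.println(assetId.isPresent());     // false
    }
}
```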

RML to SPARQL Conversion

Since version 5.0.0, the RmlToSparqlRewriteBuilder is available for translating RML to SPARQL.

<dependency>
  <groupId>org.aksw.rmltk</groupId>
  <artifactId>rml-jena-arq</artifactId>
</dependency>

RmlToSparqlRewriteBuilder builder = new RmlToSparqlRewriteBuilder()
  .setCache(cache)
  .addFnmlFiles(fnmlFiles)
  .addRmlFiles(inputFiles)
  .setDenormalize(denormalize)
  .setMerge(merge)
  ;

List<Entry<Query, String>> labeledQueries = builder.generate();

Jena Compatibility

r2rml-api    jena
0.9.0        3.17.0
0.9.1        4.4.0
0.9.2        4.4.0
0.9.3        4.5.0
4.8.0-X      4.8.0

Starting with Jena 4.8.0, we aligned the version of this project with Jena's to make compatibility easier to determine. For example, r2rml-jena-api version 4.8.0-2 indicates the second release developed against Jena 4.8.0.
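As a small worked example of this scheme, the sketch below splits an aligned version string into its Jena version and release counter. The class AlignedVersion is hypothetical and not part of the toolkit, and it assumes the plain `<jena-version>-<release>` form only (no qualifiers such as SNAPSHOT).

```java
/** Splits an aligned version such as "4.8.0-2" into the Jena version and the release counter. */
final class AlignedVersion {
    final String jenaVersion;
    final int release;

    private AlignedVersion(String jenaVersion, int release) {
        this.jenaVersion = jenaVersion;
        this.release = release;
    }

    /** Parses "<jena-version>-<release>"; rejects strings without a release suffix. */
    static AlignedVersion parse(String version) {
        int dash = version.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("Expected <jena-version>-<release>: " + version);
        }
        String jena = version.substring(0, dash);
        int release = Integer.parseInt(version.substring(dash + 1));
        return new AlignedVersion(jena, release);
    }

    public static void main(String[] args) {
        AlignedVersion v = parse("4.8.0-2");
        System.out.println(v.jenaVersion + " / release " + v.release); // 4.8.0 / release 2
    }
}
```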

Usage of the CLI Tool

Conversion of RML to SPARQL Construct Queries

rmltk rml to sparql mapping.rml.ttl > mapping.raw.rq
rmltk optimize workload mapping.raw.rq --no-order > mapping.rq

How to Execute the Mapping

The RDF Processing Toolkit (RPT) supports execution of the generated mapping. RPT itself builds on this repository.

Using the single-threaded Jena engine:

rpt integrate mapping.rq

Using RPT's parallel Spark-based executor:

rpt sansa query mapping.rq

Modules

How to Cite this Work

@inproceedings{kgcw2023sbmm,
  title={Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark},
  author={Stadler, Claus and B{\"u}hmann, Lorenz and Meyer, Lars-Peter and Martin, Michael},
  booktitle={KGCW2023, the 4th International Workshop on Knowledge Graph Construction},
  year={2023}
}

License

The source code and SHACL specification of this repo are published under the Apache License, Version 2.0.