Apache Jena and Fuseki Playground

This repository serves as a playground for checking out Apache Jena and Apache Fuseki. The repository provides:

  • Dockerfiles for running an Apache Fuseki server
  • Dockerfiles for Apache Jena RDF I/O (RIOT)
  • Example data
  • Command line examples and Java code showing how to use Apache Jena and how to interact with Apache Fuseki

Setup

  1. Clone this repository

  2. Install Docker and Docker Compose

    # Install Docker Compose
    sudo apt-get update
    sudo apt-get install docker-compose-plugin
    docker compose version
    
  3. Build Apache Fuseki Docker container

    cd docker/jena-fuseki-docker-4.9.0
    sudo docker compose build --build-arg JENA_VERSION=4.9.0
    # Test the build
    # Start Fuseki with an in-memory, updatable dataset at http://host:3030/ds
    sudo docker compose run --rm --service-ports fuseki --mem /ds
    

    For more information on how to use the container, check: https://jena.apache.org/documentation/fuseki2/fuseki-docker.html

  4. Install Java

    These commands work on Ubuntu 20.04:

    sudo apt install openjdk-17-jre-headless
    export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/
    java --version
    # Should return:
    # openjdk 17.0.8 2023-07-18
    
  5. Install Apache Jena and Fuseki binaries

    Among other tools, this installs riot and sparql:

    cd /opt
    wget https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.9.0.tar.gz
    wget https://dlcdn.apache.org/jena/binaries/apache-jena-4.9.0.tar.gz
    tar xzf apache-jena-fuseki-4.9.0.tar.gz
    tar xzf apache-jena-4.9.0.tar.gz
    

    Add the binaries to PATH:

    export PATH=/opt/apache-jena-4.9.0/bin/:/opt/apache-jena-fuseki-4.9.0/bin:$PATH
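
    Verify that the tools are found on PATH; both commands should print the installed Jena version:

    riot --version
    sparql --version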
    

Resources

Working with RDF serialization formats

Jena provides tools for working with several RDF serialization formats:

  • riot: a general parser that guesses the syntax from the file extension
  • special parsers for particular languages (e.g. turtle for Turtle)

An overview can be found in the Jena documentation.

Validate

riot can be used to validate RDF serializations, including those generated by other programs, such as the ProvToolbox examples at example_datasets/ProvToolbox_playground. The code that generated these examples is available in this repository.

riot --validate example_datasets/fmri_snakemake.jsonld

riot guesses the RDF format from the file extension. If the extension isn't recognized by riot, set the format explicitly via --syntax=FORMAT:

# riot expects .rdf or .owl for the RDF/XML Format, but we have .xml:
riot --syntax=RDF/XML --validate example_datasets/ProvToolbox_playground/fmri_provenance.xml

Converting

riot can convert between RDF serialization formats. This command generates N-Triples from a Turtle file:

riot --output=N-Triples example_datasets/ProvToolbox_playground/fmri_provenance.ttl
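
To keep the result, redirect the output to a file. The same works for other input formats, e.g. converting the JSON-LD example from above to Turtle:

riot --output=Turtle example_datasets/fmri_snakemake.jsonld > fmri_snakemake.ttl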

Inference

riot supports creating inferred triples during the parsing process. The output contains the base data plus triples inferred from RDFS subclass, subproperty, domain and range declarations.

We take the PROV ontology and our example data from ProvToolbox as an example. The ontology has to be in RDF/XML format, and the filename has to end with .rdf:

wget http://www.w3.org/ns/prov.owl
mv prov.owl prov.rdf

Now we can use the ontology for inference:

riot --rdfs=example_datasets/onthologies/prov.rdf example_datasets/ProvToolbox_playground/fmri_provenance.ttl

These statements are present in fmri_provenance.ttl:

fmri:warp1 a prov:Entity ;
	rdfs:label "warp1" .
fmri:align-warp4 a prov:Activity ;
	rdfs:label "align_warp4" .
fmri:warp4 prov:wasGeneratedBy fmri:align-warp4 .

The PROV ontology defines the property wasGeneratedBy as:

<owl:ObjectProperty rdf:about="http://www.w3.org/ns/prov#wasGeneratedBy">
        <rdfs:label>wasGeneratedBy</rdfs:label>
        <inverse>generated</inverse>
[...]
        <rdfs:subPropertyOf rdf:resource="http://www.w3.org/ns/prov#wasInfluencedBy"/>
[...]
    </owl:ObjectProperty>

The above riot command outputs the inferred triples:

<https://example.com/warp4> <http://www.w3.org/ns/prov#wasInfluencedBy> <https://example.com/align-warp4> .

riot infers subproperty relations: because wasGeneratedBy is a subproperty of wasInfluencedBy, this triple is inferred. Note that the inverse property generated is not inferred, since riot's RDFS processing does not handle inverse properties; other inference engines can derive more triples.

Working with Apache Fuseki

Load a TDB2 database, and expose, read-only, via Docker:

mkdir -p docker/jena-fuseki-docker-4.9.0/databases/DB2
sudo docker run \
  -it \
  -v "$(pwd)"/example_datasets:/input \
  -v "$(pwd)"/docker/jena-fuseki-docker-4.9.0/databases:/databases \
  jena:latest \
  tdb2.tdbloader --loc /databases/DB2 /input/ProvToolbox_playground/fmri_provenance.ttl

04:45:31 INFO  loader          :: Loader = LoaderPhased
04:45:31 INFO  loader          :: Start: /input/ProvToolbox_playground/fmri_provenance.ttl
04:45:31 INFO  loader          :: Finished: /input/ProvToolbox_playground/fmri_provenance.ttl: 141 tuples in 0.21s (Avg: 658)
04:45:31 INFO  loader          :: Finish - index SPO
04:45:31 INFO  loader          :: Start replay index SPO
04:45:31 INFO  loader          :: Index set:  SPO => SPO->POS, SPO->OSP
04:45:31 INFO  loader          :: Index set:  SPO => SPO->POS, SPO->OSP [141 items, 0.0 seconds]
04:45:31 INFO  loader          :: Finish - index OSP
04:45:31 INFO  loader          :: Finish - index POS

Start the Fuseki server and expose the database read-only:

cd docker/jena-fuseki-docker-4.9.0
# Without changing permissions fuseki won't start. This is ugly, but in the end we just want to test things ...
sudo chmod -R 777 databases/

#sudo docker compose run --rm --name MyServer --service-ports fuseki --loc databases/DB2 /ds
sudo docker compose up
[2023-12-06 05:31:28] INFO  Server          :: Apache Jena Fuseki 4.9.0
[2023-12-06 05:31:29] INFO  Server          :: Database: TDB2 dataset: location=databases/DB2
[2023-12-06 05:31:29] INFO  Server          :: Path = /ds
[2023-12-06 05:31:29] INFO  Server          ::   Memory: 2.0 GiB
[2023-12-06 05:31:29] INFO  Server          ::   Java:   17.0.9
[2023-12-06 05:31:29] INFO  Server          ::   OS:     Linux 5.15.0-89-generic amd64
[2023-12-06 05:31:29] INFO  Server          ::   PID:    1
[2023-12-06 05:31:29] INFO  Server          :: Start Fuseki (http=3030)

For writing to the database, start the server with the --update flag.
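
For example, with the Docker setup from above (this makes the dataset writable to anyone who can reach the port, so only do this for testing):

cd docker/jena-fuseki-docker-4.9.0
sudo docker compose run --rm --service-ports fuseki --update --loc databases/DB2 /ds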

Interact with the Fuseki server

Retrieve data

s-get http://localhost:3030/ds/data default
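
The companion SOH scripts can also write data when the server runs with --update. For example, s-put replaces the default graph with the contents of a local file:

s-put http://localhost:3030/ds/data default example_datasets/ProvToolbox_playground/fmri_provenance.ttl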

Make Queries

This returns every stored triple:

s-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}'
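
s-query is a thin wrapper around the SPARQL 1.1 protocol, so the same query also works with plain curl:

curl --get --data-urlencode 'query=SELECT * {?s ?p ?o}' \
  -H 'Accept: application/sparql-results+json' \
  http://localhost:3030/ds/query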

More interesting Queries on the fMRI provenance data

An initial set of provenance-related queries is given below.

  1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
  2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
  3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
  4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
  5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
  6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
  7. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
  8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
  9. A user has annotated some atlas graphics with a key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.

Provenance Queries

  1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.

The URI of the Atlas X Graphic in the example is fmri:convert-x. Let's get some information about that file. The next query shows every triple where fmri:convert-x is the subject:

s-query --service http://localhost:3030/ds/query \
     'PREFIX schema:  <https://schema.org/>
      PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
      PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX prov: <http://www.w3.org/ns/prov#>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX fmri: <https://example.com/>
      PREFIX scoro: <http://purl.org/spar/scoro/>
      SELECT ?p ?o
      WHERE {fmri:convert-x ?p ?o .}'

{ "head": {
    "vars": [ "p" , "o" ]
  } ,
  "results": {
    "bindings": [
      { 
        "p": { "type": "uri" , "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" } ,
        "o": { "type": "uri" , "value": "http://purl.dataone.org/provone/2015/01/15/ontology#Data" }
      } ,
      { 
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#label" } ,
        "o": { "type": "literal" , "value": "atlas_x.gif" }
      } ,
      { 
        "p": { "type": "uri" , "value": "https://schema.org/location" } ,
        "o": { "type": "literal" , "value": "~/github/fMRI_snakemake/results/slices/atlas_x.gif" }
      } ,
      { 
        "p": { "type": "uri" , "value": "https://schema.org/sha256" } ,
        "o": { "type": "literal" , "value": "882fe0286e2fae0c5c1e9f3420bb7da6004a6900c96ef8f018c2c44a7c8b0a1a" }
      } ,
      { 
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#qualifiedGeneration" } ,
        "o": { "type": "bnode" , "value": "b0" }
      }
    ]
  }
}

We can see that fmri:convert-x is the subject of a prov:qualifiedGeneration triple. We can get more information about the Generation of fmri:convert-x by looking at the object of that triple:

s-query --service http://localhost:3030/ds/query \
    'PREFIX rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX schema:  <https://schema.org/>
     PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
     PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
     PREFIX dcterms: <http://purl.org/dc/terms/>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     PREFIX fmri: <https://example.com/>
     PREFIX scoro: <http://purl.org/spar/scoro/>
     SELECT ?generation ?p ?o
     WHERE {fmri:convert-x prov:qualifiedGeneration ?generation .
            ?generation ?p ?o .}'

{ "head": {
    "vars": [ "generation" , "p" , "o" ]
  } ,
  "results": {
    "bindings": [
      { 
        "generation": { "type": "bnode" , "value": "b0" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" } ,
        "o": { "type": "uri" , "value": "http://www.w3.org/ns/prov#Generation" }
      } ,
      { 
        "generation": { "type": "bnode" , "value": "b0" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#activity" } ,
        "o": { "type": "uri" , "value": "https://example.com/convert-exe" }
      }
    ]
  }
}

Now we have the activity convert-exe that created our file fmri:convert-x. Let's get details about the program used in that activity:

s-query --service http://localhost:3030/ds/query \
    'PREFIX rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX schema:  <https://schema.org/>
     PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
     PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
     PREFIX dcterms: <http://purl.org/dc/terms/>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     PREFIX fmri: <https://example.com/>
     PREFIX scoro: <http://purl.org/spar/scoro/>
     SELECT ?program ?p ?o
     WHERE {fmri:convert-x prov:qualifiedGeneration ?generation .
            ?generation prov:activity ?execution .
            ?execution prov:qualifiedAssociation ?association .
            ?association prov:hadPlan ?program .
            ?program ?p ?o}'

{ "head": {
    "vars": [ "program" , "p" , "o" ]
  } ,
  "results": {
    "bindings": [
      { 
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" } ,
        "o": { "type": "uri" , "value": "http://purl.dataone.org/provone/2015/01/15/ontology#Program" }
      } ,
      { 
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#label" } ,
        "o": { "type": "literal" , "value": "convert" }
      } ,
      { 
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/applicationSuite" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/Text" , "value": "ImageMagick" }
      } ,
      { 
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/citation" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/URL" , "value": "ImageMagick Studio LLC. (2023). ImageMagick. Retrieved from https://imagemagick.org" }
      } ,
      { 
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/downloadUrl" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/URL" , "value": "https://legacy.imagemagick.org/script/download.php" }
      } ,
      { 
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/softwareVersion" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/Text" , "value": "6.9.10-23 Q16 x86_64 20190101" }
      }
    ]
  }
}

This shorter query constructs the whole subgraph reachable from fmri:convert-x, which contains all information about its generation. The property path (prov:label|!prov:label)* matches any sequence of predicates (every predicate either is prov:label or is not), so the query follows all outgoing edges transitively.

From: https://stackoverflow.com/questions/37186530/how-do-i-construct-get-the-whole-sub-graph-from-a-given-resource-in-rdf-graph

s-query --service http://localhost:3030/ds/query \
    'PREFIX schema:  <https://schema.org/>
     PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
     PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
     PREFIX dcterms: <http://purl.org/dc/terms/>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     PREFIX fmri: <https://example.com/>
     PREFIX scoro: <http://purl.org/spar/scoro/>
     CONSTRUCT { ?s ?p ?o }
     WHERE {fmri:convert-x (prov:label|!prov:label)* ?s . ?s ?p ?o .}'

SPARQL Queries on RDF

SPARQL queries can be performed from the command line:

# Example: https://jena.apache.org/tutorials/sparql_query1.html
sparql --data=vc-db-1.rdf --query=q1.rq

--------------------------------
| x                            |
================================
| <http://somewhere/JohnSmith> |
--------------------------------
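
The same query can also be run against a remote endpoint such as the Fuseki server from above with rsparql (assuming the queried data was loaded there):

rsparql --service=http://localhost:3030/ds/query --query=q1.rq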