This repository serves as a playground for checking out Apache Jena and Apache Fuseki. The repository provides:
- Dockerfiles for running an Apache Fuseki server
- Dockerfiles for Apache Jena RDF I/O (RIOT)
- Example data
- Command line examples and Java code showing how to use Apache Jena and how to interact with Apache Fuseki
- Clone this repository
- Install Docker and Docker Compose:

  ```shell
  # Install Docker Compose
  sudo apt-get update
  sudo apt-get install docker-compose-plugin
  docker compose version
  ```
- Build the Apache Fuseki Docker container:

  ```shell
  cd docker/jena-fuseki-docker-4.9.0
  sudo docker compose build --build-arg JENA_VERSION=4.9.0

  # Test the build
  # Start Fuseki with an in-memory, updatable dataset at http://host:3030/ds
  sudo docker compose run --rm --service-ports fuseki --mem /ds
  ```

  For more information on how to use the container, see: https://jena.apache.org/documentation/fuseki2/fuseki-docker.html
- Install Java (these commands work on Ubuntu 20.04):

  ```shell
  sudo apt install openjdk-17-jre-headless
  export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/
  java --version
  # Should return:
  # openjdk 17.0.8 2023-07-18
  ```
- Install the Apache Jena and Fuseki binaries. Among others, this installs `riot` and `sparql`:

  ```shell
  cd /opt
  wget https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.9.0.tar.gz
  wget https://dlcdn.apache.org/jena/binaries/apache-jena-4.9.0.tar.gz
  tar xzf apache-jena-fuseki-4.9.0.tar.gz
  tar xzf apache-jena-4.9.0.tar.gz
  ```

  Add the binaries to `PATH`:

  ```shell
  export PATH=/opt/apache-jena-4.9.0/bin/:/opt/apache-jena-fuseki-4.9.0/bin:$PATH
  ```
Jena provides tools for working with several RDF serialization formats:

- `riot`: parses RDF, guessing the syntax from the file extension
- special parsers for particular languages (e.g. `turtle` for Turtle)

An overview can be found in the Jena documentation.

`riot` can be used to validate RDF serializations, including serializations generated by other programs, like the ProvToolbox examples at `example_datasets/ProvToolbox_playground`. The code that generated these examples is available in this repository.

```shell
riot --validate example_datasets/fmri_snakemake.jsonld
```
`riot` guesses the RDF format based on the file extension. If the extension isn't supported by `riot`, the format can be set via `--syntax=FORMAT`:

```shell
# riot expects .rdf or .owl for the RDF/XML format, but we have .xml:
riot --syntax=RDF/XML --validate example_datasets/ProvToolbox_playground/fmri_provenance.xml
```
`riot` can convert between RDF serialization formats. This command generates N-Triples from a Turtle file:

```shell
riot --output=N-Triples example_datasets/ProvToolbox_playground/fmri_provenance.ttl
```
`riot` supports creation of inferred triples during the parsing process. The output will contain the base data plus triples inferred from RDF subclass, subproperty, domain and range declarations. We take the PROV ontology and our example data from ProvToolbox as an example. The ontology has to be in RDF/XML format, and the filename has to end with `.rdf`:

```shell
wget http://www.w3.org/ns/prov.owl
mv prov.owl prov.rdf
```

Now we can use the ontology for inference:

```shell
riot --rdfs=example_datasets/onthologies/prov.rdf example_datasets/ProvToolbox_playground/fmri_provenance.ttl
```
These statements are present in `fmri_provenance.ttl`:

```turtle
fmri:warp1 a prov:Entity ;
    rdfs:label "warp1" .

fmri:align-warp4 a prov:Activity ;
    rdfs:label "align_warp4" .

fmri:warp4 prov:wasGeneratedBy fmri:align-warp4 .
```
The PROV ontology defines the property `wasGeneratedBy` as:

```xml
<owl:ObjectProperty rdf:about="http://www.w3.org/ns/prov#wasGeneratedBy">
    <rdfs:label>wasGeneratedBy</rdfs:label>
    <inverse>generated</inverse>
    [...]
    <rdfs:subPropertyOf rdf:resource="http://www.w3.org/ns/prov#wasInfluencedBy"/>
    [...]
</owl:ObjectProperty>
```
The above `riot` command outputs the inferred triple:

```
<https://example.com/warp4> <http://www.w3.org/ns/prov#wasInfluencedBy> <https://example.com/align-warp4> .
```

`riot` infers subproperties: as `wasGeneratedBy` is a subproperty of `wasInfluencedBy`, this triple is inferred. There are other inference engines that can infer more triples. You may notice that the inverse property `generated` is not inferred, because `riot` does not infer inverse properties.
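The subproperty inference applied here can be sketched in a few lines of Python. This is a simplified illustration of the rule, not Jena's implementation; the triples and prefixed names are shortened forms of the example above:

```python
# Minimal sketch of RDFS subPropertyOf inference, as applied by riot --rdfs
# to the wasGeneratedBy example. Illustrative only, not Jena's code.

# Schema: wasGeneratedBy is a subproperty of wasInfluencedBy.
SUBPROPERTY = {"prov:wasGeneratedBy": {"prov:wasInfluencedBy"}}

# Base data: the statement from fmri_provenance.ttl.
data = [("fmri:warp4", "prov:wasGeneratedBy", "fmri:align-warp4")]

def infer_subproperties(triples, subproperty):
    """Return the base triples plus one triple per declared superproperty."""
    inferred = list(triples)
    for s, p, o in triples:
        for super_p in subproperty.get(p, ()):
            inferred.append((s, super_p, o))
    return inferred

for triple in infer_subproperties(data, SUBPROPERTY):
    print(triple)
```

Note that nothing in this rule looks at `inverse` declarations, which is why `generated` is not produced.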
Load a TDB2 database and expose it, read-only, via Docker:

```shell
#cd docker/jena-fuseki-docker-4.9.0
mkdir -p docker/jena-fuseki-docker-4.9.0/databases/DB2
sudo docker run \
  -it \
  -v "$(pwd)"/example_datasets:/input \
  -v "$(pwd)"/docker/jena-fuseki-docker-4.9.0/databases:/databases \
  jena:latest \
  tdb2.tdbloader --loc /databases/DB2 /input/ProvToolbox_playground/fmri_provenance.ttl
```
```
04:45:31 INFO  loader :: Loader = LoaderPhased
04:45:31 INFO  loader :: Start: /input/ProvToolbox_playground/fmri_provenance.ttl
04:45:31 INFO  loader :: Finished: /input/ProvToolbox_playground/fmri_provenance.ttl: 141 tuples in 0.21s (Avg: 658)
04:45:31 INFO  loader :: Finish - index SPO
04:45:31 INFO  loader :: Start replay index SPO
04:45:31 INFO  loader :: Index set: SPO => SPO->POS, SPO->OSP
04:45:31 INFO  loader :: Index set: SPO => SPO->POS, SPO->OSP [141 items, 0.0 seconds]
04:45:31 INFO  loader :: Finish - index OSP
04:45:31 INFO  loader :: Finish - index POS
```
Start the Fuseki server and expose the database read-only:

```shell
cd docker/jena-fuseki-docker-4.9.0

# Without changing permissions, Fuseki won't start. This is ugly, but in the end we just want to test things ...
sudo chmod -R 777 databases/

#sudo docker compose run --rm --name MyServer --service-ports fuseki --loc databases/DB2 /ds
sudo docker compose up
```
```
[2023-12-06 05:31:28] INFO  Server :: Apache Jena Fuseki 4.9.0
[2023-12-06 05:31:29] INFO  Server :: Database: TDB2 dataset: location=databases/DB2
[2023-12-06 05:31:29] INFO  Server :: Path = /ds
[2023-12-06 05:31:29] INFO  Server :: Memory: 2.0 GiB
[2023-12-06 05:31:29] INFO  Server :: Java: 17.0.9
[2023-12-06 05:31:29] INFO  Server :: OS: Linux 5.15.0-89-generic amd64
[2023-12-06 05:31:29] INFO  Server :: PID: 1
[2023-12-06 05:31:29] INFO  Server :: Start Fuseki (http=3030)
```
For writing to the database, use the `--update` flag.

Fetch the default graph:

```shell
s-get http://localhost:3030/ds/data default
```

This query returns every stored triple:

```shell
s-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}'
```
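`s-query` is a thin wrapper around the SPARQL 1.1 Protocol: the same query can be issued as a plain HTTP GET against the endpoint, with the query URL-encoded in the `query` parameter. A stdlib-only sketch of how that request URL is built (the endpoint and query are taken from the example above):

```python
from urllib.parse import urlencode

# Endpoint and query as used with s-query above.
endpoint = "http://localhost:3030/ds/query"
query = "SELECT * {?s ?p ?o}"

# The SPARQL 1.1 Protocol sends the query in a URL-encoded "query" parameter.
url = endpoint + "?" + urlencode({"query": query})
print(url)
```

Any HTTP client (e.g. `curl` with this URL) will then receive the same SPARQL JSON results that `s-query` prints.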
An initial set of provenance-related queries is given below.

- Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
- Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
- Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
- Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
- Find all Atlas Graphic images output from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
- Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
- A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
- A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
- A user has annotated some atlas graphics with a key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
- Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.

The URI of the Atlas X Graphic in the example is `fmri:convert-x`. Let's get some information about that file. The next query shows every triple where `fmri:convert-x` is the subject:
```shell
s-query --service http://localhost:3030/ds/query \
'PREFIX schema: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fmri: <https://example.com/>
PREFIX scoro: <http://purl.org/spar/scoro/>
SELECT ?s ?p ?o
WHERE {fmri:convert-x ?p ?o .}'
```
```json
{ "head": {
    "vars": [ "s" , "p" , "o" ]
  } ,
  "results": {
    "bindings": [
      {
        "p": { "type": "uri" , "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" } ,
        "o": { "type": "uri" , "value": "http://purl.dataone.org/provone/2015/01/15/ontology#Data" }
      } ,
      {
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#label" } ,
        "o": { "type": "literal" , "value": "atlas_x.gif" }
      } ,
      {
        "p": { "type": "uri" , "value": "https://schema.org/location" } ,
        "o": { "type": "literal" , "value": "~/github/fMRI_snakemake/results/slices/atlas_x.gif" }
      } ,
      {
        "p": { "type": "uri" , "value": "https://schema.org/sha256" } ,
        "o": { "type": "literal" , "value": "882fe0286e2fae0c5c1e9f3420bb7da6004a6900c96ef8f018c2c44a7c8b0a1a" }
      } ,
      {
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#qualifiedGeneration" } ,
        "o": { "type": "bnode" , "value": "b0" }
      }
    ]
  }
}
```
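Responses like the one above follow the SPARQL 1.1 Query Results JSON Format, so they are easy to post-process. A small stdlib-only sketch that extracts predicate/object value pairs from a result document of that shape (the embedded response is a trimmed-down copy of the output above):

```python
import json

# A trimmed-down SPARQL JSON result, shaped like the s-query response above.
response = '''
{ "head": { "vars": [ "s", "p", "o" ] },
  "results": { "bindings": [
    { "p": { "type": "uri", "value": "http://www.w3.org/ns/prov#label" },
      "o": { "type": "literal", "value": "atlas_x.gif" } }
  ] } }
'''

def predicate_object_pairs(doc: str):
    """Extract (predicate, object) value pairs from a SPARQL JSON result."""
    bindings = json.loads(doc)["results"]["bindings"]
    return [(b["p"]["value"], b["o"]["value"]) for b in bindings]

print(predicate_object_pairs(response))
```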
We can see that `fmri:convert-x` is the subject of a `prov:qualifiedGeneration` predicate. We can get more information about the generation of `fmri:convert-x` by looking at the object of that triple:
```shell
s-query --service http://localhost:3030/ds/query \
'PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fmri: <https://example.com/>
PREFIX scoro: <http://purl.org/spar/scoro/>
SELECT ?generation ?p ?o
WHERE {fmri:convert-x prov:qualifiedGeneration ?generation .
       ?generation ?p ?o .}'
```
```json
{ "head": {
    "vars": [ "generation" , "p" , "o" ]
  } ,
  "results": {
    "bindings": [
      {
        "generation": { "type": "bnode" , "value": "b0" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" } ,
        "o": { "type": "uri" , "value": "http://www.w3.org/ns/prov#Generation" }
      } ,
      {
        "generation": { "type": "bnode" , "value": "b0" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#activity" } ,
        "o": { "type": "uri" , "value": "https://example.com/convert-exe" }
      }
    ]
  }
}
```
Now we have the activity `convert-exe` that created our file `fmri:convert-x`. Let's get details about the program used in that activity:
```shell
s-query --service http://localhost:3030/ds/query \
'PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fmri: <https://example.com/>
PREFIX scoro: <http://purl.org/spar/scoro/>
SELECT ?program ?p ?o
WHERE {fmri:convert-x prov:qualifiedGeneration ?generation .
       ?generation prov:activity ?execution .
       ?execution prov:qualifiedAssociation ?association .
       ?association prov:hadPlan ?program .
       ?program ?p ?o}'
```
```json
{ "head": {
    "vars": [ "program" , "p" , "o" ]
  } ,
  "results": {
    "bindings": [
      {
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" } ,
        "o": { "type": "uri" , "value": "http://purl.dataone.org/provone/2015/01/15/ontology#Program" }
      } ,
      {
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "http://www.w3.org/ns/prov#label" } ,
        "o": { "type": "literal" , "value": "convert" }
      } ,
      {
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/applicationSuite" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/Text" , "value": "ImageMagick" }
      } ,
      {
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/citation" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/URL" , "value": "ImageMagick Studio LLC. (2023). ImageMagick. Retrieved from https://imagemagick.org" }
      } ,
      {
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/downloadUrl" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/URL" , "value": "https://legacy.imagemagick.org/script/download.php" }
      } ,
      {
        "program": { "type": "uri" , "value": "https://example.com/convert" } ,
        "p": { "type": "uri" , "value": "https://schema.org/softwareVersion" } ,
        "o": { "type": "literal" , "datatype": "https://schema.org/Text" , "value": "6.9.10-23 Q16 x86_64 20190101" }
      }
    ]
  }
}
```
This shorter query constructs a whole subgraph containing all information about the generation of `fmri:convert-x`:
```shell
s-query --service http://localhost:3030/ds/query \
'PREFIX schema: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX provone: <http://purl.dataone.org/provone/2015/01/15/ontology#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fmri: <https://example.com/>
PREFIX scoro: <http://purl.org/spar/scoro/>
CONSTRUCT { ?s ?p ?o }
WHERE {fmri:convert-x (prov:label|!prov:label)* ?s . ?s ?p ?o .}'
```
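The property path `(prov:label|!prov:label)*` matches any predicate, zero or more times, so the query effectively collects every triple reachable from `fmri:convert-x`. The idea can be sketched as plain graph reachability; the edge list below is an illustrative miniature of the example data, not the full dataset:

```python
from collections import deque

# Illustrative miniature of the example graph; "_:b0" stands for the
# blank node returned by the qualifiedGeneration query above.
edges = [
    ("fmri:convert-x", "prov:qualifiedGeneration", "_:b0"),
    ("_:b0", "prov:activity", "fmri:convert-exe"),
    ("fmri:warp1", "rdfs:label", "warp1"),  # not reachable from convert-x
]

def reachable_subgraph(triples, start):
    """Return all triples whose subject is reachable from start via any edge."""
    seen, queue, subgraph = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node:
                subgraph.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return subgraph

for triple in reachable_subgraph(edges, "fmri:convert-x"):
    print(triple)
```

Like the CONSTRUCT query, this walk follows edges forward only, so triples about unrelated subjects (here `fmri:warp1`) are left out.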
SPARQL queries can also be performed from the command line:

```shell
# Example: https://jena.apache.org/tutorials/sparql_query1.html
sparql --data=vc-db-1.rdf --query=q1.rq
```

```
--------------------------------
| x                            |
================================
| <http://somewhere/JohnSmith> |
--------------------------------
```