Semantic adapter component performs data transformation from non-semantic to semantic data. In PLATOON architecture context, semantic adapter would be connected to data connectors (ingestion point of data into PLATOON from various sources) to receive non semantic data, would do the data transformation to semantic data and pass on transformed semantic data to upstream components.
Main functionalities provided by the component
- Data ingestion with provenance for each data connector
- Data transformation rules definition for each source (through SparqlGenerate)
- Data transformation from non-semantic to semantic data as part of data flow
Semantic adapter design is based on NiFi that is opensource Apache project for data flow. NiFi’s fundamental design concepts closely relate to the main ideas of Flow Based Programming [fbp] and hence is a natural choice of extension for development of semantic adapter targetted in PLATOON reference architecture.
As a directed data flow graph, built-in processors can be used for various data cleaning/transformation purposes and in the pipeline, semantic adapter processor
can be used as deemed necessary with respective sparql-generate
query (sample data and flow is discussed in below sections with sparql-generate query)
Semantic adapter processor;
- Accepts
json
orcsv
or other formats as accepted data input - A
sparql generate
query with respect to data received in flowfile - Output format configuration - as of now three output formats are supported
JSON-LD
,TURTLE
andRDF/XML
in most cases under PLATOON context,JSON-LD
will be used
In this repository you will find source code of NiFi processor that can be built and installed in a running Apache NiFi instance; follow below steps if you are starting with fresh installation
-
Download Apache NiFi binary latest release
-
Run nifi to make sure that it works (requires respective version of Java installed)
<nifi-home-directory>/bin/nifi.sh start
-
Clone this repository and build
# cd <this-repository-directory> # mvn clean package
-
Copy processor generated
nar
filenifi-sparqlgenerate-nar-1.0.nar
to nifilib
folder# cp <this-repository-directory>/nifi-sparqlgenerate-nar/target/nifi-sparqlgenerate-nar-1.0.nar <nifi-home-directory>/lib //restart nifi so that new processor is loaded # <nifi-home-directory>/bin/nifi.sh restart //check logs to see if there are any errors while running the new processor # tail -f <nifi-home-directory>/logs/nifi-app.log
A sample flow with data and sparql generate query is made part of this repository and is available under resources/sample-flow
folder. Considering below array of data being received from source
[...
{
"venue": {
"latitude": "51.0500000",
"longitude": "3.7166700"
},
"location": {
"continent": " EU",
"country": "BE",
"city": "Paris"
}
}
...]
a sparql-generate query will need to be written to associate semantic ontologies with this source data;
base <http://example.com/>
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX fn: <http://w3id.org/sparql-generate/fn/>
PREFIX country:<http://loc.example.com/city/>
PREFIX schema: <http://schema.org/>
PREFIX wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX gn: <http://www.geonames.org/ontology#>
GENERATE {
?s a schema:City .
?s wgs84_pos:lat ?latitude .
?s wgs84_pos:long ?longitude .
?s gn:countryCode ?countryCode .
}
WHERE {
BIND(fn:JSONPath(?mainsourcemessage, "$.location.city" ) AS ?city)
BIND(fn:JSONPath(?mainsourcemessage, "$.venue.latitude" ) AS ?latitude)
BIND(fn:JSONPath(?mainsourcemessage, "$.venue.longitude" ) AS ?longitude)
BIND(fn:JSONPath(?mainsourcemessage, "$.location.country" ) AS ?countryCode)
BIND (URI(CONCAT("http://loc.example.com/city/",?city)) AS ?s)
}
and a directed data flow will be created that first splits JSON array to single data elements and then apply sparql-generate query on each to generate semantic data.
- Data flows are directed graph and its possible to use multiple semantic adapter processors in a single data flow depending on each aggregated or raw type data to be transformed
- Scaling of semantic adaptation depends inherently on clustering capacity of apache nifi and hence is able to handle high velocity and volume of data as part of data flow
- Even though semantic adapter based on nifi is suitable for most of the scenarios but for bulk data one time transformations, a separate
semantic adapter bactch
is being developed and can be extended and provided for use for interested partners