This repository contains basic settings for the Pilot 2A Knowledge Graph.
- mappings - contains RML mappings of the available Pilot 2A data sources using the PLATOON semantic data model v? .
- scripts - contains scripts used for transforming sources to RDF and loading them into the triple store (Virtuoso):
  - virtuoso-script.sh - used to connect remotely and load data using Virtuoso's isql-v command-line tool
  - load_to_virtuoso.py - used to load the transformed RDF data into Virtuoso using the virtuoso-script.sh script
  - transform_and_load.py - performs both the transformation of raw data to RDF and the loading into Virtuoso using the virtuoso-script.sh script
- config.ini - configuration file for materializing the Knowledge Graph using SDM-RDFizer.
- docker-compose.yml - Docker Compose setup for transforming data to RDF and loading it into the Virtuoso triple store.
Edit the config.ini file as follows:
Set the main directory in the [default] section:
[default]
main_directory: ./
The main directory for this setup is now the current folder.
[datasets]
number_of_datasets: 8
output_folder: ${default:main_directory}/rdf-dump
all_in_one_file: yes
remove_duplicate: yes
name: pilot2a-observation-data
dbtype: mysql
In the [datasets] section, you can set global parameters such as the number of datasets, the output folder, how the dump file should be created, the name of the dump file (if the dump is saved in one file), the database type, etc.
- number_of_datasets - how many datasets to transform (create an RDF dump for), e.g., 8
- output_folder - sets where the transformed dump is saved, e.g., ${default:main_directory}/rdf-dump
- all_in_one_file - takes yes or no, and sets whether to put all datasets in one file or in separate files. If set to yes, the name parameter is used as the name of the file, i.e., "$(name)".nt, which is stored in output_folder
- remove_duplicate - takes yes or no, and sets whether duplicates should be removed while generating RDF triples from a single source with different tables/files, or from multiple data sources that might yield duplicate values during the transformation.
- name - sets the name of the output RDF dump, if the all_in_one_file parameter is set to yes.
- dbtype - sets the data source type, e.g., csv, mysql, postgres, json, xml, etc.
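The ${default:main_directory} reference above has the ${section:option} form that Python's configparser resolves when extended interpolation is enabled. The sketch below assumes the file is read that way; this README does not state how SDM-RDFizer parses it, and load_global_settings is a hypothetical helper for illustration, not part of rdfizer.

```python
import configparser

# Copy of the [default] and [datasets] sections shown above.
CONFIG_TEXT = """
[default]
main_directory: ./

[datasets]
number_of_datasets: 8
output_folder: ${default:main_directory}/rdf-dump
all_in_one_file: yes
remove_duplicate: yes
name: pilot2a-observation-data
dbtype: mysql
"""


def load_global_settings(text):
    """Parse the global parameters (hypothetical helper, not part of rdfizer)."""
    # Assumption: ${section:option} references are resolved by ExtendedInterpolation.
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    datasets = parser["datasets"]
    return {
        "number_of_datasets": datasets.getint("number_of_datasets"),
        "output_folder": datasets["output_folder"],
        # getboolean accepts yes/no, true/false, on/off and 1/0.
        "all_in_one_file": datasets.getboolean("all_in_one_file"),
        "remove_duplicate": datasets.getboolean("remove_duplicate"),
        "name": datasets["name"],
        "dbtype": datasets["dbtype"],
    }
```

With main_directory set to ./, output_folder resolves to .//rdf-dump, which POSIX path resolution treats the same as ./rdf-dump.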
Once the [default] and [datasets] sections are configured, you need to add as many dataset-specific sections as the number_of_datasets specified in the [datasets] section.
For example, the first dataset, wind_farm_properties, is configured as follows:
[dataset1]
name: pilot2a_wind_farm_props
user:root
password:1234
host:192.168.0.2
port:3306
db:platoon_db
mapping: ${default:main_directory}/mappings/Wind-Farm/wind-farm.ttl
Note the dataset number: [dataset1] is dataset number 1 of the 8 datasets in this configuration. Each dataset has its own configuration parameter values. In the snippet above, we set the name of the dataset (this name is used if the all_in_one_file global parameter is set to no). Other settings include: user and password, the user name and password used to access the database; host, port, and db, the hostname, port, and database name of the dataset (dataset1->wind_farm_props); and finally mapping, which specifies where the RML mapping rules file is located. RML mapping rules need to conform to the RML Spec.
There need to be at least 8 uniquely named [dataset_n] sections (n being the dataset number, as in [dataset1] above), each specifying the parameters of one dataset in this configuration.
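A quick check before running the transformation can catch a mismatch between number_of_datasets and the dataset sections actually present. The sketch below assumes sections are named dataset1, dataset2, and so on, numbered consecutively from 1, as the [dataset1] example suggests; parse_config and missing_dataset_sections are hypothetical helpers, not part of rdfizer.

```python
import configparser

# Deliberately incomplete sample: two datasets announced, one configured.
SAMPLE_CONFIG = """
[default]
main_directory: ./

[datasets]
number_of_datasets: 2

[dataset1]
name: pilot2a_wind_farm_props
mapping: ${default:main_directory}/mappings/Wind-Farm/wind-farm.ttl
"""


def parse_config(text):
    """Read config text with ${section:option} interpolation enabled."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    return parser


def missing_dataset_sections(parser):
    """Return the expected [datasetN] section names that are absent."""
    expected = parser.getint("datasets", "number_of_datasets")
    # Assumption: sections are numbered consecutively starting at 1.
    wanted = [f"dataset{i}" for i in range(1, expected + 1)]
    return [name for name in wanted if not parser.has_section(name)]
```

For the sample above, missing_dataset_sections(parse_config(SAMPLE_CONFIG)) reports that dataset2 is missing.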
Run the rdfizer tool to create the RDF dump according to the above configuration and the mapping files it references.
- Install SDM-RDFizer:
python3 -m pip install rdfizer
- Then run the rdfizer script:
cd Pilot2A-Data-Integration/
python3 -m rdfizer -c config.ini
This will create the RDF dumps according to the configuration file, config.ini.
- Run the Docker Compose file included in this repository (prerequisites: Docker CE and Docker Compose):
docker-compose up -d
- Then run the rdfizer script and load the data into Virtuoso:
- Transform data
The Docker container created above from the docker-compose.yml file mounts this repository as a volume at /data. Running the rdfizer script as follows therefore yields the same result as running rdfizer directly (Option 1 above).
cd Pilot2A-Data-Integration/
docker exec -it sdmrdfizer python3 -m rdfizer -c /data/config.ini
This will create the RDF dumps according to the configuration file, config.ini, and store them in the /data/ volume, which in turn is "Pilot2A-Data-Integration/".
You can find the raw RDF file in .nt
serialization inside
- Load the RDF dump into Virtuoso
To load the RDF dump generated in step 2, we will use a script included in the /data/scripts/ folder as follows:
docker exec -it sdmrdfizer python3 /data/scripts/load_to_virtuoso.py
Or, to transform and load the data automatically, run the following:
docker exec -it sdmrdfizer python3 /data/scripts/transform_and_load.py -c /data/config.ini
The transform_and_load.py script performs the transformation step and then loads the result into Virtuoso.
Before running this, make sure you update the environment variables in the docker-compose.yml file as follows:
environment:
- SPARQL_ENDPOINT_IP=pilot2akg
- SPARQL_ENDPOINT_USER=dba
- SPARQL_ENDPOINT_PASSWD=dba
- SPARQL_ENDPOINT_PORT=1116
- SPARQL_ENDPOINT_GRAPH=http://platoon.eu/Pilot2A/KG
- RDF_DUMP_FOLDER_PATH=/data/rdf-dump
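As an illustration of how a script running inside the container could pick up these variables, here is a minimal sketch. It is not the code of load_to_virtuoso.py, which this README does not show; EndpointSettings and read_endpoint_settings are hypothetical names, and only the variable names are taken from the snippet above.

```python
import os
from dataclasses import dataclass

# Variable names as listed in the docker-compose.yml snippet above.
REQUIRED_VARIABLES = (
    "SPARQL_ENDPOINT_IP",
    "SPARQL_ENDPOINT_USER",
    "SPARQL_ENDPOINT_PASSWD",
    "SPARQL_ENDPOINT_PORT",
    "SPARQL_ENDPOINT_GRAPH",
    "RDF_DUMP_FOLDER_PATH",
)


@dataclass(frozen=True)
class EndpointSettings:
    """Connection settings collected from the environment (hypothetical type)."""
    host: str
    user: str
    password: str
    port: int
    graph: str
    dump_folder: str


def read_endpoint_settings(env=os.environ):
    """Build EndpointSettings, failing early if a variable is unset or empty."""
    missing = [key for key in REQUIRED_VARIABLES if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return EndpointSettings(
        host=env["SPARQL_ENDPOINT_IP"],
        user=env["SPARQL_ENDPOINT_USER"],
        password=env["SPARQL_ENDPOINT_PASSWD"],
        port=int(env["SPARQL_ENDPOINT_PORT"]),
        graph=env["SPARQL_ENDPOINT_GRAPH"],
        dump_folder=env["RDF_DUMP_FOLDER_PATH"],
    )
```

Failing early on a missing variable gives a clearer error than a connection failure later in the loading step.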
- Open http://localhost:8891/sparql in your browser
For example, write the following query to see the available classes (Concepts) in this endpoint:
SELECT DISTINCT ?Concept
WHERE {
GRAPH <http://platoon.eu/Pilot2A/KG> {
?s a ?Concept
}
} LIMIT 1000
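The same query can also be sent from a script instead of the browser form. The sketch below uses only the Python standard library and the standard SPARQL protocol (an HTTP GET with a query parameter); it assumes the endpoint at http://localhost:8891/sparql is reachable and returns SPARQL JSON results when asked to. build_query_url and list_concepts are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8891/sparql"

CLASSES_QUERY = """
SELECT DISTINCT ?Concept
WHERE {
  GRAPH <http://platoon.eu/Pilot2A/KG> {
    ?s a ?Concept
  }
} LIMIT 1000
"""


def build_query_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def list_concepts(endpoint=ENDPOINT, query=CLASSES_QUERY, timeout=30):
    """Run the query and return the value of ?Concept for every result row."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        # Ask for the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [row["Concept"]["value"] for row in payload["results"]["bindings"]]
```

Calling list_concepts() once the containers are up and the data is loaded should return the class IRIs found in the graph.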