json23plet

A command-line tool for generating RDF triplets from JSON input

What is this?

json23plet is a Linux command-line tool that generates RDF triplets in TURTLE (.ttl) format from json files.
You can write your own generators for your specific json file formats, according to your own ontology, and run them through json23plet.

Background

This tool was developed as part of the JBS (Jewish Book Shelf) project in the TDK lab.

Getting started

Prerequisites

Install Maven on your machine.

apt-get install maven

Installation

  1. Clone the jbs-json23plet repository into a directory.

     git clone git@github.com:TechnionTDK/jbs-json23plet.git
    
  2. Go into jbs-json23plet/ and build the project.

     ./json23plet.sh -b
    

    This command will call mvn install.

  3. Tell git to ignore local changes to the configuration file.

     git update-index --assume-unchanged config.json
    

    This command makes git ignore local changes to config.json, so your local settings are not committed to the GitHub repository.

Using the tool

Json23plet configuration

Json23plet's configuration is defined in the config.json file.
The default configuration is:

  {
    "setting": {
      "globalSetting": {
        "genOutputDir": "output", // the output root dir
        "errorLevel": "low",
        "genInputDir": "" 
      },
      "generators": [] // configuration scheme to run multiple generators
    }
  }

Command line commands

Build the tool

  ./json23plet.sh -b

This command will call mvn install.

Initialize the project directories

The command initializes the directory tree of the project.

  ./json23plet.sh -init

This command creates the ontologies/json, ontologies/ttl and src/test/testsFiles directories. While running, the tool assumes these directories exist.

Configure the output root directory

The command sets the output root directory and saves it in the config.json file.
The input directory tree is mirrored under this directory, so the output has an identical directory structure.

  ./json23plet.sh -config outputDir=myOutputDir
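
To make the mirroring concrete, here is a minimal sketch of how an input file path could be mapped to its location under the output root. The class OutputPathMirror and its mirror method are hypothetical and not part of json23plet, and replacing the .json extension with .ttl is an assumption made only for this illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of json23plet: mirrors an input path under the output root. */
final class OutputPathMirror {

    /**
     * Maps inputFile (located under inputRoot) to the matching path under outputRoot.
     * Swapping ".json" for ".ttl" is an assumption made for this sketch.
     */
    static Path mirror(Path inputRoot, Path outputRoot, Path inputFile) {
        // Path of the file relative to the input root, e.g. "tanach/book1.json".
        Path relative = inputRoot.relativize(inputFile);
        // The same relative path, re-rooted under the output directory.
        Path mirrored = outputRoot.resolve(relative);
        String name = mirrored.getFileName().toString();
        if (name.endsWith(".json")) {
            name = name.substring(0, name.length() - ".json".length()) + ".ttl";
        }
        return mirrored.resolveSibling(name);
    }

    public static void main(String[] args) {
        Path out = mirror(Paths.get("data"), Paths.get("myOutputDir"),
                Paths.get("data/tanach/book1.json"));
        System.out.println(out);
    }
}
```

The sub-directory tanach survives the mapping, which is what keeps the output tree identical in shape to the input tree.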

Configure the default input directory

The command sets up the default input directory and saves it in config.json file.

  ./json23plet.sh -config genInputDir=myInputDir

Run a single generator

The command runs a specific generator on a specific input directory (or a specific single file) recursively.

  ./json23plet.sh -generate generatorName dataInputRootDir

or

  ./json23plet.sh -generate generatorName inputFile.json

If you are using the basic generator, specify basic as the generator name.

  ./json23plet.sh -generate basic dataInputRootDir

This command can also be used without specifying the input directory.

  ./json23plet.sh -generate generatorName

In this case it runs generatorName on the default input directory that was configured.
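
The recursive traversal of an input directory (or a single file) can be sketched with the standard library as follows. JsonInputWalker is a hypothetical name, and filtering on the .json extension is an assumption; the tool's own traversal may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of json23plet: collects the json files under an input root. */
final class JsonInputWalker {

    /** Returns every regular file ending in ".json" under root; root may itself be a single file. */
    static List<Path> findJsonFiles(Path root) {
        // Files.walk visits root and, if it is a directory, everything below it.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".json"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because Files.walk also accepts a plain file, the same method covers both forms of the -generate command shown above.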

Run multiple generators

Json23plet enables you to configure a scheme for running multiple generators on different directories (or files).
To do so, you need to configure your scheme and run this command:

  ./json23plet.sh -generateAll

Create and update an ontology

json23plet uses the same logic to generate ontology .ttl files. Therefore, to generate a new ontology, you need to define a json file with the ontology and run the json23plet ontology generator.

  1. Define the ontology in myOntology.json by using this format.

  2. To create the ontology run:

         ./json23plet.sh -ontology myOntology
    

    A myOntology.ttl file will be created in ontologies/ttl.
    A generated class myOntology.java will be created in src/main/java/json23plet/ontologies. This class represents the ontology as a java object which enables you to use it in a convenient way.

  3. Rebuild using:

         ./json23plet.sh -b 
    

Note: After changing an existing ontology, some generators may still use properties of the old ontology.java class. Watch for compilation errors when you rebuild.

Add and edit configuration for a generator

When running the generateAll command, the tool will run all the configured generators in the config.json file.
For example, to add (or remove) a generator named myGen in the generateAll scheme, use:

    ./json23plet.sh -config genName=myGen inputPath=myGenInputPath active=true(false)

This command will add the following lines to the config.json file:

  {
    "generators": [
      {
        "genName": "myGen",
        "inputPath": "myGenInputPath",
        "active": "true"("false")
      }
    ]
  }

Be aware that it is possible to update the scheme manually. For more information, see setting.
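
As a rough illustration of how generateAll could select what to run, the sketch below keeps only the entries whose active flag is "true". GenerateAllSketch and GeneratorEntry are hypothetical names; only the field names genName, inputPath and active come from config.json, and the entries are built in memory rather than parsed from the file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not part of json23plet: picks the active entries of the scheme. */
final class GenerateAllSketch {

    /** One entry of the "generators" array; field names follow the config.json keys. */
    static final class GeneratorEntry {
        final String genName;
        final String inputPath;
        final String active; // kept as the string "true" or "false", as in config.json

        GeneratorEntry(String genName, String inputPath, String active) {
            this.genName = genName;
            this.inputPath = inputPath;
            this.active = active;
        }
    }

    /** Returns "genName inputPath" for every entry whose active flag is "true". */
    static List<String> plannedRuns(List<GeneratorEntry> entries) {
        List<String> runs = new ArrayList<>();
        for (GeneratorEntry entry : entries) {
            if (Boolean.parseBoolean(entry.active)) {
                runs.add(entry.genName + " " + entry.inputPath);
            }
        }
        return runs;
    }
}
```

Each returned pair corresponds to one invocation of the form ./json23plet.sh -generate genName inputPath.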

Edit configuration of global setting

json23plet uses some settings while running.

  • To configure output dir, run:

          ./json23plet.sh -config outputDir=myOutputDir
    
  • To configure error level, run:

          ./json23plet.sh -config errorLevel=level
    

Testing

json23plet has a simple and efficient testing framework. To test your generator:

  1. Add myGenerator.input.json to src/test/testsFiles.
  2. Add myGenerator.expected.ttl to src/test/testsFiles.

To run the tests, rebuild the project using:

  ./json23plet.sh -b 

Maven will run the tests.
It is also possible to run the tests without rebuilding, as explained in this guide.

Developer guide

Development environment

It's preferred to develop and use the tool on a Linux machine, where you can run the tool through the command line.
Working on Windows requires an IDE.

Developing with an IDE

It's recommended to use IntelliJ IDEA, and our guide will focus on it.

Getting started with IntelliJ

  • Clone the repository
    Go to File->New->Project From Version Control->GitHub. In the repository field, enter git@github.com:TechnionTDK/jbs-json23plet.git and click Clone.

  • Configure tool arguments

    1. Right click src/main/java->Mark Directory as->Sources Root.
    2. Build the project
    3. Right click src/main/java/Json23plet.java->Create 'Json23plet.main()' and configure the tool arguments.
    4. Run or debug the tool.

Development and maintenance

In this section, we will review the code components for future maintenance.
See the code documentation.

Overview

json23plet is an engine which runs user defined generators.
It uses reflection and the Apache Jena library, and loads files statically.
When you specify a generator name, the tool uses reflection to activate that generator on the input directory.
The tool contains some modules which simplify working with it.
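
The reflection step can be sketched as follows: a class is looked up by name, instantiated, and its generate method is called. This is only an illustration of the mechanism. ReflectionSketch, SketchGenerator and HelloGenerator are hypothetical names, and the way the real engine maps a generator name to a class is not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the json23plet engine: runs a generator chosen by name. */
final class ReflectionSketch {

    /** Simplified stand-in for the tool's Generator base class. */
    public abstract static class SketchGenerator {
        public final List<String> emitted = new ArrayList<>();

        public abstract void generate();
    }

    /** A trivial generator that records one value when it runs. */
    public static class HelloGenerator extends SketchGenerator {
        @Override
        public void generate() {
            emitted.add("hello");
        }
    }

    /** Loads the nested class with the given simple name, instantiates it and runs it. */
    static SketchGenerator run(String generatorName) {
        // Nested classes have binary names of the form Outer$Inner.
        String className = ReflectionSketch.class.getName() + "$" + generatorName;
        try {
            Class<?> type = Class.forName(className);
            SketchGenerator generator =
                    (SketchGenerator) type.getDeclaredConstructor().newInstance();
            generator.generate();
            return generator;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Unknown generator: " + generatorName, e);
        }
    }
}
```

With this approach a new generator becomes runnable as soon as its class is compiled, which is consistent with the rebuild step required after adding one.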

Ontologies

Source code:

  src/main/java/json23plet/generators/ontologyGenerator/OntologyClassGenerator.java
  src/main/java/json23plet/generators/ontologyGenerator/OntologyGenerator.java
  src/main/java/json23plet/generators/ontologyGenerator/OntologyTTLGenerator.java 

As mentioned before, json23plet generates ontologies in the same way as regular json data. The tool contains a special ontology generator which has two purposes:

  1. Create the ttl files of the ontology.
  2. Create a java class which will enable you to reference your ontology definitions as java objects.

The generator requires the ontology json to be in a very specific format:

{
    "prefixes" : [
        {
          "prefix" : "jbo",
          "uri" : "http://jbs.technion.ac.il/ontology/"
        },
        {
          "prefix" : "jbr",
          "uri" : "http://jbs.technion.ac.il/resource/"
        },
        ...
      ],
      "metadata" : [
        {
          "uri" : "jbo:Tanach", // <-- required predicate
          "rdf:type" : "owl:Class", // <-- required predicate
          "rdfs:label" : "Tanach",
          "rdfs:subClassOf" : "owl:Thing"
          ...
        },
        ...
    ]
}

(The uri and rdf:type predicates are required).

Usage example

  • Drop your myOntology.json file in ontologies/json and run:

      ./json23plet.sh -ontology myOntology
    

    After generating, a myOntology.ttl file will be created in ontologies/ttl.
    Note: Do not remove this file, because json23plet uses it to load the ontology when generating new ttl files in the project.

    The myOntology.java class file will be created in src/main/java/json23plet/ontologies. This file contains the definitions of the ontology and can be used when writing a new generator.

    Note: After changing an existing ontology, some generators might still be using the old ontology definitions. If so, they need to be updated, or they might cause compilation errors.

  • Rebuild the project using:

      ./json23plet.sh -b
    

Json

Source code file:

  src/main/java/json23plet/modules/Json.java

Json is a module for parsing json files.
The json23plet engine assumes that you use this module, but it is not mandatory.
The engine statically loads (per thread) the current working json file, which the generator can access as parsed json through the Json module.

Usage example

Json
.json() // return the current working file already parsed.
.getAsSomeObject("property") // read the code documentation for more details.
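
Per-thread static state of this kind is usually built on ThreadLocal. The sketch below shows the idea with a plain String standing in for the parsed json; CurrentJsonSketch and its methods are hypothetical and are not the Json module's implementation.

```java
/** Hypothetical sketch, not the Json module: a static slot that holds one value per thread. */
final class CurrentJsonSketch {

    // Each thread gets its own copy of the slot, so workers do not see each other's file.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Stores the current working document for the calling thread. */
    static void load(String parsedJson) {
        CURRENT.set(parsedJson);
    }

    /** Returns the calling thread's current document, or null if none was loaded. */
    static String json() {
        return CURRENT.get();
    }

    /** Clears the calling thread's slot. */
    static void clear() {
        CURRENT.remove();
    }

    /** Reads the slot from a fresh thread and returns what that thread saw. */
    static String readFromOtherThread() {
        String[] seen = new String[1];
        Thread reader = new Thread(() -> seen[0] = json());
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

A value loaded in one thread is invisible to another, which is what lets a generator call a static accessor without passing the file around.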

Triplet

Source code file:

  src/main/java/json23plet/modules/Triplet.java

The Triplet module is a simple wrapper for the Apache Jena library.
The json23plet engine loads an RDF model before calling the generator, and the Triplet module lets you add statements to that model.

Usage example

Triplet
.triplet()
.subject("subjectUri")
.predicate(Predicate) // taken from the Ontology.java class; may also be a uri.
.object(Resource) // taken from the Ontology.java class; may also be a uri.

Regex

Source code file:

  src/main/java/json23plet/modules/Regex.java

A simple wrapper around Java regex.

Usage example

  regex("a.*") // load the regex
  .match("abc") // check whether the string matches the regex.

DataPublisher

Source code file:

  src/main/java/json23plet/modules/DataPublisher.java

A simple module for publishing an RDF model to a ttl file.

As described, the output directory reflects the input directory. The tool initiates the DataPublisher before calling the generators.

Generators

As mentioned, json23plet runs a specific generator given by the generator name. This enables generating new RDF triplet files from json files.

Create and install a new generator

json23plet enables you to write and deploy your own generator for unique json formats.
To do so:

  • Create your generator and drop it in src/main/java/json23plet/generators/customGenerators directory.

  • For regexGenerator, drop it in src/main/java/json23plet/generators/regexGenerators directory.

  • Rebuild the project using:

      ./json23plet.sh -b
    
  • Now you can run your generator.

Usage example

  1. Write your own generator. A generator needs to extend the Generator class and implement the generate method.
    The generate code typically looks like this:

         for (Json j : json().getAsArray(propertyOfJsonArray)) {
               triplet()
                     .subject(j.value("uri"))
                     .predicate(j.value("key"))
                     .object(s); // s is an object in ontology.java class
         }
    

    (In this example, we load the parsed json file and, for each json object in the list, create one triplet.)

  2. Drop the generator.java in src/main/java/json23plet/generators

  3. Rebuild the tool by using:

         ./json23plet.sh -b
    

The basic generator

Source code file:

  src/main/java/json23plet/generators/customGenerators/BasicJsonGenerator.java

To simplify using the tool and to avoid creating a new generator for each type of json file, the tool has a BasicJsonGenerator.
This generator assumes the json file has a specific format. When run on such a file, it generates triplets generically, without any further knowledge of the file's contents.

Basic Json files format

To use the basic generator, you need to create the json file in the following format:

{
    "subjects" : [
        { "uri" : subjectUri,
          Property1 : Object1, // the object can be also list of objects : [object1, object2, ...] ,
          Property2 : Object2, // the object can be also list of objects : [object1, object2, ...] 
          ...
        },
        ....
    ]
}

Usage example

Simply run:

  ./json23plet.sh -generate basic inputDir
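
To show what generating triplets generically from this format amounts to, here is a minimal sketch that turns one entry of the subjects array into statements. It works on an in-memory Map instead of a parsed file and produces plain text lines rather than an RDF model. BasicFormatSketch is a hypothetical name and this is not the BasicJsonGenerator code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not BasicJsonGenerator: one statement per (property, object) pair. */
final class BasicFormatSketch {

    /** Turns one subject entry into "subject predicate object ." lines. */
    static List<String> toStatements(Map<String, Object> subject) {
        List<String> statements = new ArrayList<>();
        String uri = (String) subject.get("uri");
        for (Map.Entry<String, Object> property : subject.entrySet()) {
            if (property.getKey().equals("uri")) {
                continue; // the uri names the subject itself, it is not a property
            }
            Object value = property.getValue();
            // A property may hold a single object or a list of objects.
            List<?> objects = (value instanceof List)
                    ? (List<?>) value
                    : Collections.singletonList(value);
            for (Object object : objects) {
                statements.add(uri + " " + property.getKey() + " " + object + " .");
            }
        }
        return statements;
    }
}
```

A property whose value is a list yields one statement per element, which matches the note in the format above that an object can also be a list of objects.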

RegexGenerator

Source code files:

  src/main/java/json23plet/generators/regexGenerators/BaseRegexGenerator.java
  src/main/java/json23plet/generators/regexGenerators/IRegExGenerator.java

The basic generator is a powerful and generic tool for any data you wish to generate, but sometimes some jsons need special treatment.
One possible solution is to write a special generator that will handle those cases, but the tool has an easier way to do it by using this module.
Let's start with an example:
Assume we want every json object whose uri starts with jbr:tanach to get a triplet with the predicate rdf:type and the object jbo:Tanach. The regexGenerator framework enables you to do this easily.
A regex generator checks every json object in the input against the rule and, if it matches, adds the appropriate triplet to the model.
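
The rule from this example can be sketched with java.util.regex as follows. RegexRuleSketch is a hypothetical class that works on a list of uris and returns text lines. It does not use BaseRegexGenerator, whose abstract methods are described in the code documentation, and the exact pattern is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch, not a BaseRegexGenerator subclass: applies one uri rule. */
final class RegexRuleSketch {

    // Rule: the uri starts with "jbr:tanach" (the pattern itself is an assumption).
    private static final Pattern TANACH_URI = Pattern.compile("jbr:tanach.*");

    /** Returns an rdf:type statement for every uri that matches the rule. */
    static List<String> typeStatements(List<String> subjectUris) {
        List<String> statements = new ArrayList<>();
        for (String uri : subjectUris) {
            if (TANACH_URI.matcher(uri).matches()) {
                statements.add(uri + " rdf:type jbo:Tanach .");
            }
        }
        return statements;
    }
}
```

Uris that do not match the rule are skipped, so the rule adds triplets without affecting the rest of the input.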

Build a regex generator

  1. Build a myRegexGenerator class. The class needs to extend the BaseRegexGenerator class.

  2. Implement the abstract methods as described in the code documentation.

  3. Drop your generator in src/main/java/json23plet/generators/regexGenerators and rebuild the tool by:

         ./json23plet.sh -b
    
  4. Run the generator in the regular way.

JsonValidator

Source code files:

  src/main/java/json23plet/JsonValidators/JsonValidator.java
  src/main/java/json23plet/JsonValidators/IJsonValidator.java

To validate the json input files before generating triplets from them, the tool contains a framework that enables defining and implementing any validation in a simple way. Its mechanism is much like the regexGenerators.

Build a jsonValidator

  1. Create a myValidator class. The class has to extend the JsonValidator class.
  2. Implement the abstract methods as described in the code documentation.

Error level

The action the tool takes when it detects an error depends on the errorLevel value (which is defined in config.json):

  • low - ignore

  • medium - display the error

  • high - stop running

To set the error level, see here.
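
The three levels can be pictured as a small dispatch on the configured value. The sketch below is an illustration only: ErrorLevelSketch is a hypothetical class, and stopping by throwing an exception is an assumption about how the high level could be implemented.

```java
import java.util.Locale;

/** Hypothetical sketch, not the tool's code: reacts to an error according to errorLevel. */
final class ErrorLevelSketch {

    enum ErrorLevel { LOW, MEDIUM, HIGH }

    /** Parses the errorLevel string from config.json, e.g. "low". */
    static ErrorLevel parse(String configured) {
        return ErrorLevel.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    /** Ignores the error, displays it, or stops by throwing, depending on the level. */
    static void handle(ErrorLevel level, String message) {
        switch (level) {
            case LOW:
                break; // ignore
            case MEDIUM:
                System.err.println(message); // display the error and continue
                break;
            case HIGH:
                throw new IllegalStateException(message); // stop running
        }
    }
}
```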

Usage example

A typical usage looks like this:

  public void generate() {
    JsonValidator v = new PsukimTagsValidator();
    v.registerValidators();
    try {
        v.validateSingleJson(Json.json());
    } catch (JsonValidator.JsonValidatorException e) {
        e.printStackTrace();
    }
    for (Json j : json().getAsArray("subjects")) {
        String subjectUri = j.value(URI);
        for (Json tag : j.getAsArray("tags")) {
            triplet()
                    .subject(subjectUri)
                    .predicate(JBO_P_MENTIONS)
                    .object(tag.value(URI));
        }
    }
    DataPublisher.publish("", "." + getID() + ".ttl", "TURTLE");
  }

In the example, we validate our json file before generating triplets from it.
As described in the code documentation, you can also validate a single json object at a time.