Software Engineering Project Uni Leipzig WS 2017/18
DBpedia is one of the largest open and freely accessible knowledge graphs about entities of our everyday life (such as movies, actors, athletes, politicians, places, ...). Information contained in Wikipedia articles is extracted into DBpedia as RDF, using mappings between Wikipedia infoboxes and the DBpedia Ontology, so that it becomes accessible for complex requests. Accessing DBpedia's knowledge requires the use of SPARQL, the data query language for RDF databases.
This API provides a RESTful interface. It is intended for web developers who want easier access to DBpedia. The API uses the DBpedia Ontology and transforms HTTP requests into SPARQL queries, which are then sent to the DBpedia endpoint. The results can be returned in a range of formats (such as JSON, JSON-LD and TSV) and styles (e.g. nested JSON).
Additional features of the API:
- Versioning (e.g. after an update of the ontology or if labels of DBpedia resources changed)
- Easy access control through API keys
- Caching
- Logging
- Windowing
- Documentation through Swagger UI
This README provides the administrator of the API with a brief overview of and introduction to the software and its usage.
Further documentation:
- Research report for a basic grasp about discussed topics and technologies (DBpedia, RDF, Spring, ...)
- Requirements specification for a more precise specification of all features
- Software design description for an understanding of the source code
Maven and at least OpenJDK 8 are required. Other Java versions, such as JDK 9, may work as well.
If the curl client is intended to be used for testing purposes, curl and a shell are also required.
To install the project straight from the git repository, you'll also need git.
Arch Linux:
pacman -S jdk8-openjdk maven curl git
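On Debian or Ubuntu, the rough equivalent would be the following (package names may vary between releases):
apt-get install openjdk-8-jdk maven curl git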
The jar package is already included in the release package, so this step can be skipped and you can continue with step 4.
Otherwise, you'll need to clone the repo and build the project with maven.
git clone https://github.com/dbpedia/ontology-driven-api dbpedia-rest-api
cd dbpedia-rest-api
mvn clean package
This creates a jar file named target/dbpedia-api-1.0.0.jar, which includes all dependencies needed for execution. It can therefore be copied wherever you wish, but keep in mind to also copy the folder config, where external resources and config files are saved by default. (The path to those files can also be configured; skip to the section Configuration for more information.)
To start the API, type
java -jar target/dbpedia-api-1.0.0.jar
into a shell. The API will run on port 8080, which can be verified on the info page at http://localhost:8080/api/info.
You can get an overview of the functionality with Swagger-UI at http://localhost:8080/swagger-ui.html.
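A quick way to check from the shell that the API is running (port and path follow the defaults above):
curl http://localhost:8080/api/info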
Swagger is also used as external documentation for users of the API.
A shell script is available at dev/curl-em-all.sh for more request examples and to generate and run all types of requests. You will need a shell to run it.
To display an overview of its functionality, view the help page with ./curl-em-all.sh -h.
>>> Curl client for dbpedia rest api V1.0.0 <<<
Usage:
-f [TSV|RDF|JSONLDF] for format
-p [PREFIX|SHORT|NESTED] for prettyfication
-s to walk you through every api-call (s stands for slow)
-v [version] for setting the API version
-k [key] sets the API Key
For example, ./curl-em-all.sh -s is suitable for a step-by-step presentation of all functionality specified in the requirements specification.
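Flags can also be combined; for instance, to run the requests with nested prettyfication and a personal key (the key value below is a placeholder):
./curl-em-all.sh -p NESTED -k my-api-key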
Configuration files and external resources (such as the DBpedia ontology) are placed in the folder target/config by default.
This folder contains the following files:
config
├── application.properties
├── dbpedia_2016-10.owl
├── keys.txt
├── mapped_properties_per_class.json
├── prefixes.json
└── versions
└── 1_0_0.version.json
The file target/config/application.properties is used to configure various parameters of the API, including:
- server.port: Port to access the API
- dbpedia.sparqlEndpoint: SPARQL endpoint to send requests to
- ontology.file: Path to the DBpedia ontology; this has to be replaced upon a DBpedia update
- window.maxWindowLimit: Maximum value of the windowing query parameter
- prefixes.file: Path to the file containing namespace prefixes
- versions.dir: Directory with version files
- uri.path: URI path to access the API, i.e. localhost:8080/[uri.path]/
- spring.cache.type=NONE (optional): Switches off the cache
- keys.usingKeys: Toggles usage of API keys
- keys.file: Path to the file containing the API keys (see below)
- keys.startQuotaDay, keys.startQuotaHour and keys.startQuotaMinute: Set the usage quotas for all users
All file paths need to be given relative to the .jar file. Parameters can also be provided as command line arguments:
java -jar target/dbpedia-api-1.0.0.jar --server.port=4200
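For illustration, a minimal application.properties could look like the following; all values are examples, not the shipped defaults:
# example values, not shipped defaults
server.port=8080
dbpedia.sparqlEndpoint=https://dbpedia.org/sparql
ontology.file=config/dbpedia_2016-10.owl
window.maxWindowLimit=1000
prefixes.file=config/prefixes.json
versions.dir=config/versions
uri.path=api
keys.usingKeys=true
keys.file=config/keys.txt
keys.startQuotaDay=1000
keys.startQuotaHour=100
keys.startQuotaMinute=10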
Access to the API is restricted by API keys in order to limit the number of requests per user during specific time intervals (minute, hour, day). The list of all allowed keys is set up in config/keys.txt. For every key (= user), a statistic is created that keeps track of the quotas. This data does not persist across restarts.
If a key ends with _ADMIN, the user gets unlimited (MAX_INT) quotas.
The key is provided in the URI, using the required query parameter &key=.
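Assuming one key per line (the exact file format is not documented here), the contents of config/keys.txt might look like this:
some-user-key
another-user-key
maintenance_ADMIN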
The file config/prefixes.json contains all supported prefixes and assigns namespaces to them. By default, prefixes from prefix.cc are used. Names of prefixes and namespaces can be changed, but it is recommended to create a new API version afterwards (see below). Important: it is strongly recommended not to change the prefixes dbo, dbp, dbr, rdf and rdfs.
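The exact layout of the file is not reproduced here; a plausible shape, mapping prefix names to their namespace URIs, would be:
{
  "dbo": "http://dbpedia.org/ontology/",
  "dbr": "http://dbpedia.org/resource/",
  "foaf": "http://xmlns.com/foaf/0.1/"
}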
The file config/mapped_properties_per_class.json contains important properties of various classes for an RML request. Classes are saved as an array, together with all associated properties. Properties have to be of the format "prefix:propertyName". The identifier for class and property names within the JSON file is "@id".
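One plausible reading of that description is sketched below; only "@id" is documented, the "properties" key and the entries are illustrative assumptions:
[
  {
    "@id": "dbo:Person",
    "properties": [
      { "@id": "dbo:birthDate" },
      { "@id": "foaf:name" }
    ]
  }
]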
If the DBpedia ontology changes, it is likely that an API update is needed, because entity names ("identifiers") might have changed. To make requests from older versions possible, patch files are used. They define the replacement of old identifiers in requests.
First, the new ontology has to be downloaded from DBpedia and the field ontology.file in application.properties must be updated.
Optionally, a new patch file can be created in config/versions. All files in this folder that end with version.json are loaded on startup.
An example patch file config/versions/v1_1_0.version.json looks like this:
{
  "major": 1,
  "minor": 1,
  "patch": 0,
  "resourceReplacements": [
    {
      "prefixBefore": "dbp",
      "identifierBefore": "numOfEmployees",
      "prefixNow": "dbp",
      "identifierNow": "numberOfEmployees"
    },
    {
      "prefixBefore": "dbp",
      "identifierBefore": "some-property-that-does-not-work-anymore",
      "prefixNow": "foaf",
      "identifierNow": "surname"
    }
  ],
  "prefixReplacements": {
    "foaff": "foaf",
    "old-dbo": "dbo"
  }
}
Explanation:
- major, minor and patch define the version number
- resourceReplacements: An array, which contains replacements of resources within DBpedia. With the file above, dbp:numOfEmployees is replaced with dbp:numberOfEmployees and dbp:some-property-that-does-not-work-anymore with foaf:surname.
- prefixReplacements: An object that contains prefix replacements which are applied to all resources with this prefix. foaff is replaced with foaf and old-dbo with dbo.
The API follows the principle of Semantic Versioning, i.e. the major version should only be changed if there are incompatible changes in the ontology.
Requests to incompatible versions have to contain &oldVersion=true in the URI; otherwise, error code 400 is returned.
The API logs every request to the directory logs. The logfile is a simple text file that is archived daily. It may look like this:
logs
├── archive
│ ├── rollingfile.log.2018-03-22.gz
│ ├── rollingfile.log.2018-03-24.gz
│ ├── rollingfile.log.2018-03-25.gz
│ ├── rollingfile.log.2018-03-28.gz
│ ├── rollingfile.log.2018-04-03.gz
│ ├── rollingfile.log.2018-04-04.gz
│ └── rollingfile.log.2018-04-07.gz
└── logfile.log
The following information is logged in TSV:
- the SPARQL query of the request
- quotas of keys
- length of the response (number of characters)
- duration of the response
- errors that occurred
ehcache2 is used to cache responses from the SPARQL endpoint. The cache is held in the local swap only, so it is rebuilt after a restart.
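To run without the cache, the spring.cache.type parameter mentioned in the Configuration section can be overridden on the command line:
java -jar target/dbpedia-api-1.0.0.jar --spring.cache.type=NONE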