Primary language: JavaScript. License: MIT.

Themis export service

Microservice that exports data from Kaleidos to be published on Themis. An export can be triggered manually or scheduled via a publication-activity in Kaleidos.

Getting started

Add the export service to your stack

Add the following snippet to your docker-compose.yml:

  themis-export:
    image: kanselarij-vlaanderen/themis-export-service
    links:
      - database:database
    volumes:
      - ./data/exports/:/share

The final result of the export is written to the volume mounted at /share.

How-to guides

How to trigger an export

An export can be triggered in 2 ways:

  1. By manually calling the API endpoint POST /meetings/:uuid/publication-activities for a specific meeting
  2. By creating a themis-publication-activity in the Kaleidos DB that has a (planned) start date and is related to a meeting

Reference

Configuration

The following environment variables can be configured:

  • MU_SPARQL_ENDPOINT (default: http://database:8890/sparql): mu-authorization based SPARQL endpoint of the internal triple store. Intermediate results are written through this endpoint so that they trigger delta notifications.
  • VIRTUOSO_SPARQL_ENDPOINT (default: http://triplestore:8890/sparql): SPARQL endpoint of the Virtuoso triple store, used to run fast queries that bypass mu-authorization and to extract the ttl files
  • EXPORT_BATCH_SIZE (default: 1000): number of triples exported per batch in the final dump
  • PUBLICATION_CRON_PATTERN (default: 0 * * * * *, i.e. every minute): how often to check Kaleidos for scheduled publications
  • PUBLICATION_WINDOW_MILLIS (default: 24h): maximum window for which publication-activities are fetched from Kaleidos. The window determines the period in which a scheduled publication will still be executed (with a delay) if this export service was not running at the moment the publication was originally planned.
  • NB_OF_VIRTUOSO_QUERY_RETRIES (default: 6): maximum number of times to retry a query to Virtuoso. Virtuoso is sometimes busy creating a checkpoint, and requests made during that time fail. To remedy this, failed requests are retried a number of times.
  • VIRTUOSO_QUERY_RETRY_MILLIS (default: 1000): delay in milliseconds between query retries
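
To make the defaults and the retry settings concrete, here is a minimal sketch of how these variables could be read and applied. It is not taken from the service's source: the names loadConfig and queryWithRetry are hypothetical, the 24h default is assumed to be stored as 86400000 milliseconds, and "6 retries" is assumed to mean up to 6 extra attempts after the first one.

```javascript
// Read the documented environment variables, falling back to the documented defaults.
// Assumption: numeric variables are plain integers; the 24h window is stored in milliseconds.
function loadConfig(env = process.env) {
  const int = (name, fallback) => {
    const parsed = Number.parseInt(env[name], 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    muSparqlEndpoint: env.MU_SPARQL_ENDPOINT || 'http://database:8890/sparql',
    virtuosoSparqlEndpoint: env.VIRTUOSO_SPARQL_ENDPOINT || 'http://triplestore:8890/sparql',
    exportBatchSize: int('EXPORT_BATCH_SIZE', 1000),
    publicationCronPattern: env.PUBLICATION_CRON_PATTERN || '0 * * * * *',
    publicationWindowMillis: int('PUBLICATION_WINDOW_MILLIS', 24 * 60 * 60 * 1000),
    nbOfVirtuosoQueryRetries: int('NB_OF_VIRTUOSO_QUERY_RETRIES', 6),
    virtuosoQueryRetryMillis: int('VIRTUOSO_QUERY_RETRY_MILLIS', 1000),
  };
}

// Hypothetical helper: run an async query and retry it when it fails,
// e.g. because Virtuoso is busy creating a checkpoint.
// Assumption: `retries` counts the extra attempts after the first one.
async function queryWithRetry(runQuery, retries, delayMillis) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await runQuery();
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the last error
      await new Promise((resolve) => setTimeout(resolve, delayMillis));
    }
  }
}
```

A caller would pass its own query function, for example queryWithRetry(() => runSelect(query), config.nbOfVirtuosoQueryRetries, config.virtuosoQueryRetryMillis), where runSelect stands in for whatever SPARQL client is used.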

Model

Used prefixes

| Prefix | URI |
|--------|-----|
| dct | http://purl.org/dc/terms/ |
| adms | http://www.w3.org/ns/adms# |
| prov | http://www.w3.org/ns/prov# |
| ext | http://mu.semte.ch/vocabularies/ext |

Public export job

Resource representing an export job. Jobs are executed one at a time, in FIFO order. When an export job fails, it is retried up to 5 times before being permanently marked as failed.
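
The FIFO and retry behaviour can be illustrated with a small synchronous sketch. It is not the service's implementation: runQueue and execute are hypothetical names, the status labels are placeholders for the real status URIs, and it assumes that a failed job is retried immediately before the next job starts, which the service may handle differently.

```javascript
const MAX_RETRIES = 5; // "retried up to 5 times", read here as 5 retries after the first attempt

// Run jobs strictly in arrival order, one at a time.
// `execute` is a hypothetical function that throws when a job fails.
function runQueue(jobs, execute) {
  const queue = [...jobs];
  const results = [];
  while (queue.length > 0) {
    const job = queue.shift(); // oldest job first (FIFO)
    let retries = 0;
    let status = 'failed'; // placeholder label, not the real status URI
    while (true) {
      try {
        execute(job);
        status = 'success'; // placeholder label, not the real status URI
        break;
      } catch (error) {
        if (retries >= MAX_RETRIES) break; // give up: job stays marked as failed
        retries++;
      }
    }
    results.push({ job, status, retries });
  }
  return results;
}
```

With this reading, a job that always fails is attempted 6 times in total (1 initial attempt plus 5 retries) before the queue moves on.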

Class

ext:PublicExportJob

Properties
| Name | Predicate | Range | Definition |
|------|-----------|-------|------------|
| status | adms:status | rdfs:Resource | Status of the export job, initially set to <http://data.kaleidos.vlaanderen.be/public-export-job-statuses/scheduled> |
| meeting | prov:used | rdfs:Resource | Meeting (in Kaleidos) the export job is executed for |
| created | dct:created | xsd:dateTime | Datetime of creation of the job |
| scope | ext:scope | xsd:string | Scope of the export job. Possible values are newsitems and documents. A job may contain multiple scopes. |
| results | prov:generated | rdfs:Resource | The resources generated by the export job |
| source | dct:source | rdfs:Resource | Source of the export job (e.g. a publication-activity in Kaleidos) |

Export job statuses

The status of the export job will be updated to reflect the progress of the job. The following statuses are known:

Themis publication activity

Resource representing a planned publication for a meeting. This resource resides in the Kaleidos DB.

Class

ext:ThemisPublicationActivity

Properties
| Name | Predicate | Range | Definition |
|------|-----------|-------|------------|
| meeting | prov:used | rdfs:Resource | Meeting (in Kaleidos) the publication is planned for |
| planned-start | prov:startedAtTime | xsd:dateTime | Datetime on which the publication is scheduled |
| scope | ext:scope | xsd:string | Scope of the publication. Possible values are newsitems and documents. A publication may contain multiple scopes. |
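
How the planned start interacts with PUBLICATION_WINDOW_MILLIS can be sketched as a simple check. This is an illustration only: isDue is a hypothetical name, the window boundaries are assumed to be inclusive, and the check for whether an activity has already been executed is left out.

```javascript
// Decide whether a scheduled publication should be executed now.
// Due means: the planned start has passed, but no longer ago than the window allows.
function isDue(plannedStart, now, windowMillis) {
  const delayMillis = now.getTime() - plannedStart.getTime();
  return delayMillis >= 0 && delayMillis <= windowMillis;
}

const DAY_MILLIS = 24 * 60 * 60 * 1000;
const now = new Date('2024-01-02T12:00:00Z');

console.log(isDue(new Date('2024-01-02T11:00:00Z'), now, DAY_MILLIS)); // true: planned 1 hour ago
console.log(isDue(new Date('2023-12-31T11:00:00Z'), now, DAY_MILLIS)); // false: older than the window
console.log(isDue(new Date('2024-01-02T13:00:00Z'), now, DAY_MILLIS)); // false: still in the future
```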

Exported data

The data model used for the exported data is documented on the Themis documentation website.

API

POST /meetings/:uuid/publication-activities

Trigger the publication of the Kaleidos meeting with the given :uuid. In case the meeting has already been published before, the new publication will be linked to the previous one on Themis.

Request

Example request body:

{
  "data": {
    "type": "publication-activity",
    "attributes": {
      "scope": ["newsitems", "documents"],
      "source": "http://themis.vlaanderen.be/publication-activity/933ea4cc-3786-4a5a-bace-8c99ce8c44aa"
    }
  }
}

The following attributes can be set on the publication-activity:

  • scope: determines the scope of the export. Supported values are "newsitems" and "documents". Documents can only be exported if the newsitems are exported as well. To unpublish a meeting, send an empty array as scope.
  • source (optional): URI of the publication-activity in Kaleidos that triggered the export

Response
  • 202 Accepted on successful trigger of an export. The Location response header contains the endpoint to monitor the progress of the job.
  • 400 Bad Request on invalid scope in the request body
  • 404 Not Found if a meeting with the given id cannot be found in Kaleidos
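
As an illustration of the request and responses described above, the following sketch builds the request on the client side and interprets the documented status codes. The helper names, the base URL and the application/vnd.api+json content type are assumptions and are not confirmed by this documentation.

```javascript
const SUPPORTED_SCOPES = ['newsitems', 'documents'];

// Mirror the documented scope rules on the client side.
// An empty array is valid: it unpublishes the meeting.
function validateScope(scope) {
  if (!Array.isArray(scope)) return false;
  if (!scope.every((value) => SUPPORTED_SCOPES.includes(value))) return false;
  // Documents can only be exported if the newsitems are exported as well.
  return !scope.includes('documents') || scope.includes('newsitems');
}

// Build the URL and fetch options for triggering a publication.
// `baseUrl` is a hypothetical address of the export service.
function buildPublicationRequest(baseUrl, meetingUuid, scope, source) {
  if (!validateScope(scope)) {
    throw new Error(`Invalid scope: ${JSON.stringify(scope)}`);
  }
  const attributes = { scope };
  if (source) attributes.source = source; // source is optional
  return {
    url: `${baseUrl}/meetings/${encodeURIComponent(meetingUuid)}/publication-activities`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/vnd.api+json' }, // assumed content type
      body: JSON.stringify({ data: { type: 'publication-activity', attributes } }),
    },
  };
}

// Map the documented status codes to a small result object.
function interpretResponse(status, location) {
  switch (status) {
    case 202:
      return { accepted: true, jobUrl: location }; // Location header points at the job
    case 400:
      return { accepted: false, reason: 'invalid scope' };
    case 404:
      return { accepted: false, reason: 'meeting not found' };
    default:
      return { accepted: false, reason: `unexpected status ${status}` };
  }
}
```

The result can be passed to fetch as fetch(url, options), after which interpretResponse(response.status, response.headers.get('Location')) yields the URL of the job to monitor.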

GET /public-export-jobs/:uuid

Get the details, including the status, of an export job.

Response
  • 200 OK with job details in the response body
  • 404 Not Found if a job with the given id cannot be found

GET /public-export-jobs/summary

Get a summary of the triggered export jobs. Contains the number of export jobs, grouped per status.

Response
  • 200 OK with the summary in the response body